10.3. Longest Common Subsequence

This section discusses recursive and dynamic programming ways of finding longest common subsequence.

A subsequence is a sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements.

Given a string "ABCDE"

  • Substring: {"A", "BC", "CDE", ...}

  • Subsequence: {all substrings, "AC", "ACE", ...}

  • Not subsequence: {"BA", "DAB", ...}

Longest common subsequence

  • The longest subsequence commonly shared by multiple strings.

  • e.g., “baal” is a LCS of “bilabial” and “balaclava”.

Can there be more than one longest common subsequence?

Application

  • Find the longest common subsequence in DNA (e.g., GAATGTCCTTTCTCTAAGTCCTAAG).

Abstraction

Let us create the abstract class LCS:

public abstract class LCS {
    /**
     * @param a the first string.
     * @param b the second string.
     * @return a longest common sequence of the two specific strings.
     */
    public String solve(String a, String b) {
        return solve(a.toCharArray(), b.toCharArray(), a.length() - 1, b.length() - 1);
    }

    /**
     * @param c the first array of characters.
     * @param d the second array of characters.
     * @param i the index of the character in {@code c} to be compared.
     * @param j the index of the character in {@code d} to be compared.
     * @return a longest common sequence of the two specific strings.
     */
    protected abstract String solve(char[] c, char[] d, int i, int j);
}

Recursion

Let us create the LCSRecursive inheriting LCS:

public class LCSRecursive extends LCS {
    @Override
    protected String solve(char[] c, char[] d, int i, int j) {
        if (i < 0 || j < 0) return "";
        if (c[i] == d[j]) return solve(c, d, i - 1, j - 1) + c[i];

        String c1 = solve(c, d, i - 1, j);
        String d1 = solve(c, d, i, j - 1);
        return (c1.length() > d1.length()) ? c1 : d1;
    }
}
  • L4: no character is left to compare for either string.

  • L5: when two characters match, move onto c[:i-1] and d[:j-1].

  • L7: gets the longest common subsequence between c[:i-1] and d[:j].

  • L8: gets the longest common subsequence between c[:i] and d[:j-1].

  • L9: returns the longest subsequence.

The followings demonstrate a recursive way of finding a LCS between two strings:

Dynamic Programming

Let us create the LCSDynamic class inheriting LCS.

public class LCSDynamic extends LCS {
    @Override
    protected String solve(char[] c, char[] d, int i, int j) {
        return solve(c, d, i, j, createTable(c, d));
    }

    /**
     * @param c the first string.
     * @param d the second string.
     * @return the dynamic table populated by estimating the # of LCSs in the grid of the two specific strings.
     */
    protected int[][] createTable(char[] c, char[] d) {
        final int N = c.length, M = d.length;
        int[][] table = new int[N][M];

        for (int i = 1; i < N; i++)
            for (int j = 1; j < M; j++)
                table[i][j] = (c[i] == d[j]) ? table[i - 1][j - 1] + 1 : Math.max(table[i - 1][j], table[i][j - 1]);

        return table;
    }
}
  • L4: creates a dynamic table and passes it to the solver.

  • L12: the dynamic table is pre-populated before any recursive calls.

The following show the dynamic table populated by the previous example:

Let us define the solve() method:

protected String solve(char[] c, char[] d, int i, int j, int[][] table) {
    if (i < 0 || j < 0) return "";
    if (c[i] == d[j])  return solve(c, d, i - 1, j - 1, table) + c[i];
    
    if (i == 0) return solve(c, d, i, j - 1, table);
    if (j == 0) return solve(c, d, i - 1, j, table);
    return (table[i - 1][j] > table[i][j - 1]) ? solve(c, d, i - 1, j, table) : solve(c, d, i, j - 1, table);
}
  • L2: no character is left to compare for either string.

  • L3: when two characters match, move onto c[:i-1] and d[:j-1].

  • L5: gets the longest common subsequence between c[:i] and d[:j-1].

  • L6: gets the longest common subsequence between c[:i-1] and d[:j].

  • L9: returns the longest subsequence by looking up the values in the dynamic table.

The followings demonstrate how to find a LCS using the dynamic table:

Last updated

©2023 Emory University - All rights reserved