10.3. Longest Common Subsequence
This section discusses recursive and dynamic programming ways of finding longest common subsequence.
A subsequence is a sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements.
Given a string "ABCDE"
Substring: {"A", "BC", "CDE", ...}
Subsequence: {all substrings, "AC", "ACE", ...}
Not subsequence: {"BA", "DAB", ...}
Longest common subsequence
The longest subsequence commonly shared by multiple strings.
e.g., “baal” is a LCS of “bilabial” and “balaclava”.
Can there be more than one longest common subsequence?
Application
Find the longest common subsequence in DNA (e.g.,
GAATGTCCTTTCTCTAAGTCCTAAG
).
Abstraction
Let us create the abstract class LCS
:
public abstract class LCS {
/**
* @param a the first string.
* @param b the second string.
* @return a longest common sequence of the two specific strings.
*/
public String solve(String a, String b) {
return solve(a.toCharArray(), b.toCharArray(), a.length() - 1, b.length() - 1);
}
/**
* @param c the first array of characters.
* @param d the second array of characters.
* @param i the index of the character in {@code c} to be compared.
* @param j the index of the character in {@code d} to be compared.
* @return a longest common sequence of the two specific strings.
*/
protected abstract String solve(char[] c, char[] d, int i, int j);
}
Recursion
Let us create the LCSRecursive
inheriting LCS
:
public class LCSRecursive extends LCS {
@Override
protected String solve(char[] c, char[] d, int i, int j) {
if (i < 0 || j < 0) return "";
if (c[i] == d[j]) return solve(c, d, i - 1, j - 1) + c[i];
String c1 = solve(c, d, i - 1, j);
String d1 = solve(c, d, i, j - 1);
return (c1.length() > d1.length()) ? c1 : d1;
}
}
L4
: no character is left to compare for either string.L5
: when two characters match, move ontoc[:i-1]
andd[:j-1]
.L7
: gets the longest common subsequence betweenc[:i-1]
andd[:j]
.L8
: gets the longest common subsequence betweenc[:i]
andd[:j-1]
.L9
: returns the longest subsequence.
The followings demonstrate a recursive way of finding a LCS between two strings:
Dynamic Programming
Let us create the LCSDynamic
class inheriting LCS
.
public class LCSDynamic extends LCS {
@Override
protected String solve(char[] c, char[] d, int i, int j) {
return solve(c, d, i, j, createTable(c, d));
}
/**
* @param c the first string.
* @param d the second string.
* @return the dynamic table populated by estimating the # of LCSs in the grid of the two specific strings.
*/
protected int[][] createTable(char[] c, char[] d) {
final int N = c.length, M = d.length;
int[][] table = new int[N][M];
for (int i = 1; i < N; i++)
for (int j = 1; j < M; j++)
table[i][j] = (c[i] == d[j]) ? table[i - 1][j - 1] + 1 : Math.max(table[i - 1][j], table[i][j - 1]);
return table;
}
}
L4
: creates a dynamic table and passes it to the solver.L12
: the dynamic table is pre-populated before any recursive calls.
The following show the dynamic table populated by the previous example:
Let us define the solve()
method:
protected String solve(char[] c, char[] d, int i, int j, int[][] table) {
if (i < 0 || j < 0) return "";
if (c[i] == d[j]) return solve(c, d, i - 1, j - 1, table) + c[i];
if (i == 0) return solve(c, d, i, j - 1, table);
if (j == 0) return solve(c, d, i - 1, j, table);
return (table[i - 1][j] > table[i][j - 1]) ? solve(c, d, i - 1, j, table) : solve(c, d, i, j - 1, table);
}
L2
: no character is left to compare for either string.L3
: when two characters match, move ontoc[:i-1]
andd[:j-1]
.L5
: gets the longest common subsequence betweenc[:i]
andd[:j-1]
.L6
: gets the longest common subsequence betweenc[:i-1]
andd[:j]
.L9
: returns the longest subsequence by looking up the values in the dynamic table.
The followings demonstrate how to find a LCS using the dynamic table:
Last updated
Was this helpful?