Search
Given an alphabet and a set of sequences composed of characters of that alphabet, the goal is to find a supersequence S, i.e. a sequence containing all of the original sequences, of a minimum length. Sequence r is contained in S if and only if all characters of r are present in S in the same order as they appear in r.
Input:
Output: Supersequence S of the minimum length.
s1: ca ag cca cc ta cat c a s2: c gag ccat ccgtaaa g tt g s3: aga acc tgc taaatgc t a ga ---------------------------- Supersequence S: cagagaccatgccgtaaatgcattacga
Supersequence quality S is calculated as
<latex> $f(S) = C(S) + L(S)$, </latex>
where <latex>$C(S)$</latex> is the total number of characters the supersequence S covers, and <latex>$L(S)$</latex> is a contribution for the length of S calculated as
<latex> $L(S) = \frac{SumL - Length(S)}{SumL}$, </latex>
where <latex>$SumL$</latex> is the total number of all characters in the original sequences.
The function is to be maximized.
If solved as a pure black-box optimization problem, one could use information about partial sequences only in the evaluation function. Here, it is allowed to use or analyze partial sequences elsewhere as well, e.g., in recombination operators.