====== Assignment 2: SEQUENCE ALIGNMENT ====== * 15 points (+2 extra points for gap extension penalty implementation (optional)) * Deadline: Wednesday, March 20th, 2019, 23:59 * Your score has to be at least 5 points (excluding the late submission penalty). * Late submission penalty: -1 point per day * Submit to [[https://cw.felk.cvut.cz/brute/|BRUTE]]. * Work individually. * Submit * your source codes in a language of your choice * 2 test cases * ''test1.in'' for global alignment; expected result in ''test1.out'' * ''test2.in'' and ''test2.out'' for local alignment * The test cases must be non-trivial (no empty strings, etc.) and must not use sequences provided in the sample solution. * Provide the sequences (and distance matrix) in your submission as well. If you modify the distance matrix, please use a different name. * bash script ''compile.sh'' that compiles your codes * a wrapper bash script ''alignment.sh'' that calculates the alignment * your source codes must compile either * on Ubuntu machines in the lab, or * on a freshly installed Ubuntu machine (if this is the case, provide file ''packages.txt'' with a list of packages that need to be installed) * An example of submission {{courses:bin:assignments:samplesolution.zip|may be found here}}. * There were some changes from the last year version. If you believe that I have introduced a mistake into the project specification, please email [[petr.rysavy@fel.cvut.cz|petr.rysavy@fel.cvut.cz]]. * You can assume that the input is correct; the algorithm won't encounter any unknown parameters. Therefore you do not need to check for those. ===== Needleman-Wunsch and Smith-Waterman algorithm ===== Implement the pairwise global alignment algorithm (Needleman-Wunch) **and** the pairwise local alignment algorithm (Smith-Waterman) in an arbitrary programming language (e.g., Python, Perl, Java). Required arguments are two input sequences in a FASTA format, a scoring matrix in CSV format (e.g., blosum62 file), and a gap penalization. Program ''alignment.sh'' should accept the following arguments: | argument | meaning | | ''-g''/''-l'' | global/local pairwise alignment | | ''-s1'' | path to the first sequence | | ''-s2'' | path to the second sequence | | ''-e'' | path to the score matrix in CSV format | | ''-p'' | gap penalization | | ''-pe'' | gap extension penalization (optional) | ===== Input and output ===== ./alignment.sh -g -s1 A0PQ23.fasta -s2 Q9CD83.fasta -e blosum62.csv -p 4 Runs the Needleman-Wunch algorithm with two input sequences ''A0PQ23.fasta'' and ''Q9CD83.fasta'', with the score matrix defined in the ''blosum62.csv'' file, and with the gap penalization equal $4$. ./alignment.sh -l -s1 EU078679.fasta -s2 CH954156.fasta -e blastmatrix.csv -p 4 Runs the Smith-Waterman algorithm with two input sequences ''EU078679.fasta'' and ''CH954156.fasta'', with the score matrix defined in the ''blastmatrix.csv'' file, and with the gap penalization equal $4$. The final alignment and the corresponding score print to STDOUT on three lines containing: * Alignment of the first sequence * Alignment of the second sequence * Score The result of the second call would be: TTGACAGTACATAG TTGA-­AGTTTGTAG 34 ===== Scoring ===== * 6 points for global alignment * 6 points for local alignment * 1 point for wrapper scripts * 1 point for global alignment test case * 1 point for local alignment test case * 2 extra points if you implement gap extension penalty Your program needs to run in quadratic time and must not allocate more than a quadratic amount of memory. Submitting a code that does not compile may result in significant point reduction. ===== External libraries ===== The alignment implementation must be solely your own work. However, you can you any external library of your choice for the following: * Command line arguments parsing. In Java, [[https://mvnrepository.com/artifact/commons-cli/commons-cli/1.3.1|apache commons cli]] or [[https://github.com/remkop/picocli|picocli]] may be useful. In Python, the [[https://docs.python.org/3.7/howto/argparse.html|argparse]] library is handy. * Scoring matrix parsing. For example, Java: [[https://commons.apache.org/proper/commons-csv/|Apache commons csv]], Python: [[https://docs.python.org/3/library/csv.html|csv]] * FASTA file reading. Java: [[http://jfasta.sourceforge.net/|jFasta]] or [[http://broadinstitute.github.io/picard/|Picard tools]], Python: [[https://biopython.org/wiki/SeqIO|biopython]]. * If you need anything else, ask me in advance. Don't forget that your code needs to compile!