====== Assignment 2: SEQUENCE ALIGNMENT ======
Please, do not solve before officially presented in class!
If you think that this homework is not for you (for example, you already implemented Levenshtein distance several times), you might submit [[courses:bin:assignments:hw2b|BLAST ALIGNMENT]] instead.
* 15 points (+2 extra points for gap extension penalty implementation (optional))
* Deadline: Wednesday, March 30th, 2022, 23:59
* Late submission penalty: -1 point per day but no more than 12 points
* Submit to [[https://cw.felk.cvut.cz/brute/|BRUTE]].
* Work individually.
* Submit
* your source codes in a language of your choice
* 2 test cases
* ''test1.in'' for global alignment; expected result in ''test1.out''
* ''test2.in'' and ''test2.out'' for local alignment
* The test cases must be non-trivial (no empty strings, etc.) and must not use sequences provided in the sample solution.
* Provide the sequences (and distance matrix) in your submission as well. If you modify the distance matrix, please use a different name.
* bash script ''compile.sh'' that compiles your codes
* a wrapper bash script ''alignment.sh'' that calculates the alignment
* your source codes must compile either
* on Ubuntu machines in the lab, or
* on a freshly installed Ubuntu machine (if this is the case, provide file ''packages.txt'' with a list of packages that need to be installed)
* An example of submission {{courses:bin:assignments:samplesolution.zip|may be found here (Java)}} and {{courses:bin:assignments:samplesubmission-python.zip|here (Python)}}.
* Should you have any questions, please email [[petr.rysavy@fel.cvut.cz|petr.rysavy@fel.cvut.cz]].
* You can assume that the input is correct; the algorithm won't encounter any unknown parameters. Therefore you do not need to check for those.
===== Needleman-Wunsch and Smith-Waterman algorithm =====
Implement the pairwise global alignment algorithm (Needleman-Wunch) **and** the pairwise local alignment
algorithm (Smith-Waterman) in an arbitrary programming language (e.g., Python, Perl, Java).
Required arguments are two input sequences in a FASTA format, a scoring matrix in CSV format (e.g.,
blosum62 file), and a gap penalization.
Program ''alignment.sh'' should accept the following arguments:
| argument | meaning |
| ''-g''/''-l'' | global/local pairwise alignment |
| ''-s1'' | path to the first sequence |
| ''-s2'' | path to the second sequence |
| ''-e'' | path to the score matrix in CSV format |
| ''-p'' | gap penalization |
| ''-pe'' | gap extension penalization (optional) |
===== Input and output =====
./alignment.sh -g -s1 A0PQ23.fasta -s2 Q9CD83.fasta -e blosum62.csv -p 4
Runs the Needleman-Wunch algorithm with two input sequences ''A0PQ23.fasta'' and ''Q9CD83.fasta'', with the score matrix defined in the ''blosum62.csv'' file, and with the gap penalization equal $4$.
./alignment.sh -l -s1 EU078679.fasta -s2 CH954156.fasta -e blastmatrix.csv -p 4
Runs the Smith-Waterman algorithm with two input sequences ''EU078679.fasta'' and ''CH954156.fasta'', with the score matrix
defined in the ''blastmatrix.csv'' file, and with the gap penalization equal $4$.
The final alignment and the corresponding score print to STDOUT on three lines containing:
* Alignment of the first sequence
* Alignment of the second sequence
* Score
The result of the second call would be:
TTGACAGTACATAG
TTGA-AGTTTGTAG
34
===== Scoring =====
* 6 points for global alignment
* 6 points for local alignment
* 1 point for wrapper scripts
* 1 point for global alignment test case
* 1 point for local alignment test case
* 2 extra points if you implement gap extension penalty
Your program needs to run in quadratic time and must not allocate more than a quadratic amount of memory. Submitting a code that does not compile may result in significant point reduction.
===== External libraries =====
The alignment implementation must be solely your own work. However, you can you any external library of your choice for the following:
* Command-line arguments parsing. In Java, [[https://mvnrepository.com/artifact/commons-cli/commons-cli/1.3.1|apache commons cli]] or [[https://github.com/remkop/picocli|picocli]] may be useful. In Python, the [[https://docs.python.org/3.7/howto/argparse.html|argparse]] library is handy.
* Scoring matrix parsing. For example, Java: [[https://commons.apache.org/proper/commons-csv/|Apache commons csv]], Python: [[https://docs.python.org/3/library/csv.html|csv]]
* FASTA file reading. Java: [[http://jfasta.sourceforge.net/|jFasta]] or [[http://broadinstitute.github.io/picard/|Picard tools]], Python: [[https://biopython.org/wiki/SeqIO|biopython]].
* Common libraries as [[https://numpy.org/|numpy]].
* If you need anything else, ask me in advance.
Don't forget that your code needs to compile!