Assignment 2: SEQUENCE ALIGNMENT

Please, do not solve before officially presented in class!
  • 15 points (+2 extra points for gap extension penalty implementation (optional))
  • Deadline: Wednesday, March 25th, 2020, 23:59
  • Late submission penalty: -1 point per day but no more than 12 points
  • Submit to BRUTE.
  • Work individually.
  • Submit
    • your source codes in a language of your choice
    • 2 test cases
      • test1.in for global alignment; expected result in test1.out
      • test2.in and test2.out for local alignment
      • The test cases must be non-trivial (no empty strings, etc.) and must not use sequences provided in the sample solution.
      • Provide the sequences (and distance matrix) in your submission as well. If you modify the distance matrix, please use a different name.
    • bash script compile.sh that compiles your codes
    • a wrapper bash script alignment.sh that calculates the alignment
    • your source codes must compile either
      • on Ubuntu machines in the lab, or
      • on a freshly installed Ubuntu machine (if this is the case, provide file packages.txt with a list of packages that need to be installed)
    • An example of submission may be found here (Java) and here (Python).
  • Should you have any questions, please email petr.rysavy@fel.cvut.cz.
  • You can assume that the input is correct; the algorithm won't encounter any unknown parameters. Therefore you do not need to check for those.

Needleman-Wunsch and Smith-Waterman algorithm

Implement the pairwise global alignment algorithm (Needleman-Wunch) and the pairwise local alignment algorithm (Smith-Waterman) in an arbitrary programming language (e.g., Python, Perl, Java). Required arguments are two input sequences in a FASTA format, a scoring matrix in CSV format (e.g., blosum62 file), and a gap penalization.

Program alignment.sh should accept the following arguments:

argument meaning
-g/-l global/local pairwise alignment
-s1 path to the first sequence
-s2 path to the second sequence
-e path to the score matrix in CSV format
-p gap penalization
-pe gap extension penalization (optional)

Input and output

bash

./alignment.sh -g -s1 A0PQ23.fasta -s2 Q9CD83.fasta -e blosum62.csv -p 4
Runs the Needleman-Wunch algorithm with two input sequences A0PQ23.fasta and Q9CD83.fasta, with the score matrix defined in the blosum62.csv file, and with the gap penalization equal $4$.

bash

./alignment.sh -l -s1 EU078679.fasta -s2 CH954156.fasta -e blastmatrix.csv -p 4
Runs the Smith-Waterman algorithm with two input sequences EU078679.fasta and CH954156.fasta, with the score matrix defined in the blastmatrix.csv file, and with the gap penalization equal $4$.

The final alignment and the corresponding score print to STDOUT on three lines containing:

  • Alignment of the first sequence
  • Alignment of the second sequence
  • Score

The result of the second call would be:

TTGACAGTACATAG
TTGA-­AGTTTGTAG
34

Scoring

  • 6 points for global alignment
  • 6 points for local alignment
  • 1 point for wrapper scripts
  • 1 point for global alignment test case
  • 1 point for local alignment test case
  • 2 extra points if you implement gap extension penalty

Your program needs to run in quadratic time and must not allocate more than a quadratic amount of memory. Submitting a code that does not compile may result in significant point reduction.

External libraries

The alignment implementation must be solely your own work. However, you can you any external library of your choice for the following:

Don't forget that your code needs to compile!

courses/bin/assignments/hw2.txt · Last modified: 2020/02/20 13:28 by rysavpe1