Assignment 2: SEQUENCE ALIGNMENT

  • 15 points (+2 extra points for gap extension penalty implementation (optional))
  • Deadline: Wednesday, March 20th, 2019, 23:59
  • Your score has to be at least 5 points (excluding the late submission penalty).
  • Late submission penalty: -1 point per day
  • Submit to BRUTE.
  • Work individually.
  • Submit
    • your source codes in a language of your choice
    • 2 test cases
      • test1.in for global alignment; expected result in test1.out
      • test2.in and test2.out for local alignment
      • The test cases must be non-trivial (no empty strings, etc.) and must not use sequences provided in the sample solution.
      • Provide the sequences (and distance matrix) in your submission as well. If you modify the distance matrix, please use a different name.
    • bash script compile.sh that compiles your codes
    • a wrapper bash script alignment.sh that calculates the alignment
    • your source codes must compile either
      • on Ubuntu machines in the lab, or
      • on a freshly installed Ubuntu machine (if this is the case, provide file packages.txt with a list of packages that need to be installed)
    • An example of submission may be found here.
  • There were some changes from the last year version. If you believe that I have introduced a mistake into the project specification, please email petr.rysavy@fel.cvut.cz.
  • You can assume that the input is correct; the algorithm won't encounter any unknown parameters. Therefore you do not need to check for those.

Needleman-Wunsch and Smith-Waterman algorithm

Implement the pairwise global alignment algorithm (Needleman-Wunch) and the pairwise local alignment algorithm (Smith-Waterman) in an arbitrary programming language (e.g., Python, Perl, Java). Required arguments are two input sequences in a FASTA format, a scoring matrix in CSV format (e.g., blosum62 file), and a gap penalization.

Program alignment.sh should accept the following arguments:

argument meaning
-g/-l global/local pairwise alignment
-s1 path to the first sequence
-s2 path to the second sequence
-e path to the score matrix in CSV format
-p gap penalization
-pe gap extension penalization (optional)

Input and output

bash

./alignment.sh -g -s1 A0PQ23.fasta -s2 Q9CD83.fasta -e blosum62.csv -p 4
Runs the Needleman-Wunch algorithm with two input sequences A0PQ23.fasta and Q9CD83.fasta, with the score matrix defined in the blosum62.csv file, and with the gap penalization equal $4$.

bash

./alignment.sh -l -s1 EU078679.fasta -s2 CH954156.fasta -e blastmatrix.csv -p 4
Runs the Smith-Waterman algorithm with two input sequences EU078679.fasta and CH954156.fasta, with the score matrix defined in the blastmatrix.csv file, and with the gap penalization equal $4$.

The final alignment and the corresponding score print to STDOUT on three lines containing:

  • Alignment of the first sequence
  • Alignment of the second sequence
  • Score

The result of the second call would be:

TTGACAGTACATAG
TTGA-­AGTTTGTAG
34

Scoring

  • 6 points for global alignment
  • 6 points for local alignment
  • 1 point for wrapper scripts
  • 1 point for global alignment test case
  • 1 point for local alignment test case
  • 2 extra points if you implement gap extension penalty

Your program needs to run in quadratic time and must not allocate more than a quadratic amount of memory. Submitting a code that does not compile may result in significant point reduction.

External libraries

The alignment implementation must be solely your own work. However, you can you any external library of your choice for the following:

Don't forget that your code needs to compile!

courses/bin/assignments/hw2.txt · Last modified: 2019/03/13 14:45 by rysavpe1