====== Tutorial 3 - Sequence alignment, second assignment: SEQUENCE ALIGNMENT ====== ===== Problem 1 - score of an alignment ===== Calculate the score of alignment ''PRT---------EINS''\\ ''YRNWPSEEN-'' Use [[http://rosalind.info/glossary/blosum62/|BLOSUM62 scoring matrix]] and affine gap penalty with gap opening cost 11 and gap extension cost 1. In some tools, the cost for a gap of size 1 is only the //gap opening penalty//, in others, it is //gap opening penalty + gap extension penalty//. In this tutorial, we will follow the notation from the lecture, which corresponds to the first case. [adapted from [[http://rosalind.info/problems/gaff/|http://rosalind.info/problems/gaff/]] ] ==== Problem 2 - local sequence alignment ==== Calculate a local alignment of sequences ''CGTGAATTCAT'' and ''GACTTAC''. Consider $\mathrm{gap\_penalty} = -4$, $\mathrm{mismatch\_penalty} = -3$, and $\mathrm{match\_premium} = 5$. ==== Problem 3 - BLOSUM matrix ==== Consider the following alignment of 5 sequences ''AAI''\\ ''SAL''\\ ''TAL''\\ ''TAV''\\ ''AAL'' Calculate the BLOSUM substitution matrix. Use scaling factor $\lambda = 0.5$. For an explanation how to calculate the matrix, you can visit [[http://www3.cs.stonybrook.edu/~rp/class/549f14/lectures/CSE549-Lec05.pdf | this lecture]]. [ the example is taken from [[http://www.cs.columbia.edu/4761/assignments/assignment1/reference1.pdf | http://www.cs.columbia.edu/4761/assignments/assignment1/reference1.pdf ]] ] ==== 4 - a practical example in R ==== In bioinformatics and statics, **R** programming language become popular. CLI can be downloaded from the [[ https://www.r-project.org/ | R-project site ]]. If you prefer GUI, you can install an open-source version of [[https://www.rstudio.com/products/rstudio/download/ | RStudio]]. Other sources are available as well. === A toy example === We will use this language to calculate a local and a global alignment of two sequences. First, we will start with validating results we got in the second problem. BLOSUM matrix and other tools are located in library Biostrings, which is part of Bioconductor. To install the library and load it, call source("https://bioconductor.org/biocLite.R") biocLite("Biostrings") library(Biostrings) We need a matrix, where a match has value $5$, mismatch $-3$, and gap is $4$. We create the matrix by sigma <- nucleotideSubstitutionMatrix(match = 5, mismatch = -3, baseOnly = TRUE) To visualize the matrix, call simply sigma The two sequences are s1 <- "CGTGAATTCAT" s2 <- "GACTTAC" You can calculate and visualize the local alignment by pairwiseAlignment(s1, s2, substitutionMatrix = sigma, gapOpening = 0, gapExtension = -4, scoreOnly = FALSE, type="local") Note that this function uses a slightly different definition of ''gapOpening'' and ''gapExtension'' than we do. To see help for the ''pairwiseAlignment'' function, use help(pairwiseAlignment) === Two real-world sequences === Now we will test on two real-world sequences. Download them from the UniProt database download.file("http://www.uniprot.org/uniprot/Q9CD83.fasta", "Q9CD83.fasta") download.file("http://www.uniprot.org/uniprot/A0PQ23.fasta", "A0PQ23.fasta") Now we need to load the sequences into R. Function ''read.fasta'' from library '''seqinr'' can be used. If you did not install this library yet, install it. install.packages("seqinr") Now we can load the library and the sequences. library("seqinr") leprae <- read.fasta(file = "Q9CD83.fasta") ulcerans <- read.fasta(file = "A0PQ23.fasta") You can visualize the sequences using getSequence(leprae) getAnnot(leprae) getName(leprae) Replace ''leprae'' with ''ulcerans'' to see the other bacteria. Again, we will need the ''Biostrings'' package. Blosum matrix can be loaded using data(BLOSUM50) and visualized by BLOSUM50 Global alignment can be calculated as pairwiseAlignment(toupper(c2s(leprae[[1]])), toupper(c2s(ulcerans[[1]])), substitutionMatrix = BLOSUM50, gapOpening = -2, gapExtension = -8, scoreOnly = FALSE) To see more examples, please, check {{../alignmentr.zip|a script from the last year}}. ===== 5 - Assignment 2 - implement pairwise alignment ===== Work individually on the {{../bin_assignment2.zip|second programming assignment}}. The deadline is before the sixth lecture on March 29, 2018. Upload in BRUTE.