Calculate the score of alignment
PRT———EINS
YRNWPSEEN-
Use BLOSUM62 scoring matrix and affine gap penalty with gap opening cost 11 and gap extension cost 1.
[adapted from http://rosalind.info/problems/gaff/ ]
Calculate a local alignment of sequences CGTGAATTCAT and GACTTAC. Consider $\mathrm{gap\_penalty} = -4$, $\mathrm{mismatch\_penalty} = -3$, and $\mathrm{match\_premium} = 5$.
Consider the following alignment of 5 sequences
AAI
SAL
TAL
TAV
AAL
Calculate the BLOSUM substitution matrix. Use scaling factor $\lambda = 0.5$. For an explanation how to calculate the matrix, you can visit this lecture (original source).
[ the example is taken from (original source, not working now) http://www.cs.columbia.edu/4761/assignments/assignment1/reference1.pdf ]
In bioinformatics and statistics, R programming language become popular. CLI can be downloaded from the R-project site . If you prefer GUI, you can install an open-source version of RStudio. Other sources are available as well.
We will use this language to calculate a local and a global alignment of two sequences. First, we will start with validating results we got in the second problem. BLOSUM matrix and other tools are located in library Biostrings, which is part of Bioconductor. To install the library and load it, call
R
source("https://bioconductor.org/biocLite.R")
biocLite("Biostrings")
library(Biostrings)
R
sigma <- nucleotideSubstitutionMatrix(match = 5, mismatch = -3, baseOnly = TRUE)
R
sigma
The two sequences are
R
s1 <- "CGTGAATTCAT" s2 <- "GACTTAC"
R
pairwiseAlignment(s1, s2, substitutionMatrix = sigma, gapOpening = 0, gapExtension = -4, scoreOnly = FALSE, type="local")
gapOpening and gapExtension than we do.
To see help for the pairwiseAlignment function, use
R
help(pairwiseAlignment)
Now we will test on two real-world sequences. Download them from the UniProt database
R
download.file("http://www.uniprot.org/uniprot/Q9CD83.fasta", "Q9CD83.fasta")
download.file("http://www.uniprot.org/uniprot/A0PQ23.fasta", "A0PQ23.fasta")
read.fasta from library 'seqinr can be used. If you did not install this library yet, install it.
R
install.packages("seqinr")
R
library("seqinr")
leprae <- read.fasta(file = "Q9CD83.fasta")
ulcerans <- read.fasta(file = "A0PQ23.fasta")
R
getSequence(leprae) getAnnot(leprae) getName(leprae)
leprae with ulcerans to see the other bacteria. Again, we will need the Biostrings package. Blosum matrix can be loaded using
R
data(BLOSUM50)
R
BLOSUM50
Global alignment can be calculated as
R
pairwiseAlignment(toupper(c2s(leprae[[1]])), toupper(c2s(ulcerans[[1]])), substitutionMatrix = BLOSUM50, gapOpening = -2, gapExtension = -8, scoreOnly = FALSE)
To see more examples, please, check a script from the last year.