Search
Calculate the score of alignment
PRT———EINS YRNWPSEEN-
PRT———EINS
YRNWPSEEN-
Use BLOSUM62 scoring matrix and affine gap penalty with gap opening cost 11 and gap extension cost 1.
[adapted from http://rosalind.info/problems/gaff/ ]
Calculate a local alignment of sequences CGTGAATTCAT and GACTTAC. Consider $\mathrm{gap\_penalty} = -4$, $\mathrm{mismatch\_penalty} = -3$, and $\mathrm{match\_premium} = 5$.
CGTGAATTCAT
GACTTAC
Consider the following alignment of 5 sequences
AAI SAL TAL TAV AAL
AAI
SAL
TAL
TAV
AAL
Calculate the BLOSUM substitution matrix. Use scaling factor $\lambda = 0.5$. For an explanation how to calculate the matrix, you can visit this lecture (original source).
[ the example is taken from (original source, not working now) http://www.cs.columbia.edu/4761/assignments/assignment1/reference1.pdf ]
In bioinformatics and statistics, R programming language become popular. CLI can be downloaded from the R-project site . If you prefer GUI, you can install an open-source version of RStudio. Other sources are available as well.
We will use this language to calculate a local and a global alignment of two sequences. First, we will start with validating results we got in the second problem. BLOSUM matrix and other tools are located in library Biostrings, which is part of Bioconductor. To install the library and load it, call
R
source("https://bioconductor.org/biocLite.R") biocLite("Biostrings") library(Biostrings)
sigma <- nucleotideSubstitutionMatrix(match = 5, mismatch = -3, baseOnly = TRUE)
sigma
The two sequences are
s1 <- "CGTGAATTCAT" s2 <- "GACTTAC"
pairwiseAlignment(s1, s2, substitutionMatrix = sigma, gapOpening = 0, gapExtension = -4, scoreOnly = FALSE, type="local")
gapOpening
gapExtension
To see help for the pairwiseAlignment function, use
pairwiseAlignment
help(pairwiseAlignment)
Now we will test on two real-world sequences. Download them from the UniProt database
download.file("http://www.uniprot.org/uniprot/Q9CD83.fasta", "Q9CD83.fasta") download.file("http://www.uniprot.org/uniprot/A0PQ23.fasta", "A0PQ23.fasta")
read.fasta
'seqinr
install.packages("seqinr")
library("seqinr") leprae <- read.fasta(file = "Q9CD83.fasta") ulcerans <- read.fasta(file = "A0PQ23.fasta")
getSequence(leprae) getAnnot(leprae) getName(leprae)
leprae
ulcerans
Biostrings
data(BLOSUM50)
BLOSUM50
Global alignment can be calculated as
pairwiseAlignment(toupper(c2s(leprae[[1]])), toupper(c2s(ulcerans[[1]])), substitutionMatrix = BLOSUM50, gapOpening = -2, gapExtension = -8, scoreOnly = FALSE)
To see more examples, please, check a script from the last year.