# Tutorial 4 - BLAST, Star Alignment, Clustal Omega

## Problem 1 - Multiple Sequence Alignment Score

Calculate the score of the following alignment

1. using sum-of-pairs method (match is $+4$, mismatch $-2$, indel $-1$ and $s(\_,\_)=0$);
2. using Shannon entropy method.

$$\begin{array}{l} \mathtt{MQPILL\_G} \\ \mathtt{MLR\_LL\_G} \\ \mathtt{MK\_ILLL\_} \\ \mathtt{MPPVLLI\_} \end{array}$$
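The two scoring schemes listed above can be checked column by column. The following base-R sketch is only an illustration: the helper names (`pair_score`, `column_sp`, `column_entropy`) are made up for this example, and the entropy part assumes that a gap counts as an ordinary symbol and that logarithms are taken to base 2. Your lecture may use a different convention.

```r
# Alignment rows from Problem 1; "_" denotes a gap.
aln <- c("MQPILL_G", "MLR_LL_G", "MK_ILLL_", "MPPVLLI_")
cols <- do.call(rbind, strsplit(aln, ""))  # one row per sequence, one column per position

# Score of one pair of symbols under the scheme stated in the problem.
pair_score <- function(a, b, match = 4, mismatch = -2, indel = -1, gap = "_") {
  if (a == gap && b == gap) return(0)
  if (a == gap || b == gap) return(indel)
  if (a == b) match else mismatch
}

# Sum-of-pairs score of one column: add the scores of all unordered pairs of rows.
column_sp <- function(col) {
  pairs <- combn(length(col), 2)
  sum(apply(pairs, 2, function(p) pair_score(col[p[1]], col[p[2]])))
}

# Shannon entropy of one column in bits.
# Assumption: gaps are counted as a symbol and the logarithm is base 2.
column_entropy <- function(col) {
  p <- table(col) / length(col)
  -sum(p * log2(p))
}

sp_per_column <- apply(cols, 2, column_sp)
entropy_per_column <- apply(cols, 2, column_entropy)

sp_total <- sum(sp_per_column)
entropy_total <- sum(entropy_per_column)

print(sp_per_column)
print(sp_total)
print(entropy_per_column)
print(entropy_total)
```

Comparing the per-column vectors with your hand calculation makes it easy to spot the column where a discrepancy arises.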

Calculate the consensus sequence.
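A consensus sequence is usually read off column by column as the most frequent symbol. The sketch below assumes two conventions that the problem does not fix: gaps are ignored unless a column contains nothing else, and ties are broken in favour of the residue that appears first from the top row. `consensus_symbol` is a hypothetical helper name.

```r
# Alignment rows from Problem 1; "_" denotes a gap.
aln <- c("MQPILL_G", "MLR_LL_G", "MK_ILLL_", "MPPVLLI_")
cols <- do.call(rbind, strsplit(aln, ""))

# Most frequent residue of one column.
# Assumptions: gaps are ignored, ties go to the residue seen first from the top.
consensus_symbol <- function(col, gap = "_") {
  residues <- col[col != gap]
  if (length(residues) == 0) return(gap)
  u <- unique(residues)                       # residues in order of first appearance
  counts <- tabulate(match(residues, u), nbins = length(u))
  u[which.max(counts)]                        # which.max() returns the first maximum
}

consensus <- paste(apply(cols, 2, consensus_symbol), collapse = "")
print(consensus)
```

Columns where every residue is different have no meaningful majority, so the symbol reported there depends entirely on the tie-breaking rule.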

## Problem 2 - STAR Alignment

Compute a multiple sequence alignment of the following sequences using the star approach.

$$\begin{aligned} s_1 &= \mathtt{CCTGCTGCAG} \\ s_2 &= \mathtt{GATGTGCCG} \\ s_3 &= \mathtt{GATGTGCAG} \\ s_4 &= \mathtt{CCGCTAGCAG} \\ s_5 &= \mathtt{CCTGTAGG} \end{aligned}$$

Score a match as $+1$ and each mismatch or indel as $-1$.

```r
# Install Biostrings once. biocLite() is defunct; BiocManager is its replacement.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Biostrings")
# On recent Bioconductor releases the alignment functions may live in the
# separate 'pwalign' package; install it as well if they are reported missing.

library(Biostrings)
s1 <- "CCTGCTGCAG"
s2 <- "GATGTGCCG"
s3 <- "GATGTGCAG"
s4 <- "CCGCTAGCAG"
s5 <- "CCTGTAGG"

# Match +1, mismatch -1.
submatrix <- nucleotideSubstitutionMatrix(match = 1, mismatch = -1, baseOnly = TRUE)

# Global alignment of every pair. No gap-opening cost and a cost of 1 per gap
# position, so each indel scores -1.
pairwiseAlignment(s1, s2, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s1, s3, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s1, s4, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s1, s5, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s2, s3, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s2, s4, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s2, s5, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s3, s4, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s3, s5, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s4, s5, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
```
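The pairwise scores decide which sequence becomes the center of the star. The following base-R sketch assumes one common rule: the center is the sequence with the highest total score against all others. `nw_score` is a hypothetical helper written for this illustration (a plain Needleman-Wunsch score with a linear gap cost); it is not part of Biostrings, and it returns only the score, not the alignment.

```r
# Global alignment score with match/mismatch/gap scoring (Needleman-Wunsch, linear gaps).
nw_score <- function(a, b, match = 1, mismatch = -1, gap = -1) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  D <- matrix(0, n + 1, m + 1)
  D[, 1] <- (0:n) * gap   # aligning a prefix of a against nothing
  D[1, ] <- (0:m) * gap   # aligning a prefix of b against nothing
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match else mismatch
      D[i + 1, j + 1] <- max(D[i, j] + s,        # (mis)match
                             D[i, j + 1] + gap,  # gap in b
                             D[i + 1, j] + gap)  # gap in a
    }
  }
  D[n + 1, m + 1]
}

seqs <- c(s1 = "CCTGCTGCAG", s2 = "GATGTGCCG", s3 = "GATGTGCAG",
          s4 = "CCGCTAGCAG", s5 = "CCTGTAGG")

# Symmetric matrix of pairwise scores; the diagonal is left at 0.
n <- length(seqs)
S <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    S[i, j] <- S[j, i] <- nw_score(seqs[i], seqs[j])
  }
}

totals <- rowSums(S)
center <- names(which.max(totals))  # assumption: ties go to the first sequence

print(S)
print(totals)
print(center)
```

The entries of `S` should agree with the scores reported by `pairwiseAlignment`, which is a useful cross-check before building the star.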

## Problem 3 - CLUSTAL

Align the group

$$\begin{aligned} s_1 &= \mathtt{ATTGCCATT\_\_} \\ s_2 &= \mathtt{ATC\_CAATTTT} \end{aligned}$$

with the group

$$\begin{aligned} s_3 &= \mathtt{ATGGCCATT} \\ s_4 &= \mathtt{ATCTTC\_TT} \end{aligned}$$

using the approach of the CLUSTALW algorithm. Guide the group alignment by the two most similar sequences (one from each group), scoring a match as $+1$ and each mismatch or gap as $-1$.

The following code may help you to decide which two sequences will guide the alignment.

```r
# Install Biostrings once. biocLite() is defunct; BiocManager is its replacement.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Biostrings")

library(Biostrings)
# Sequences of both groups with their existing gaps removed.
s1 <- "ATTGCCATT"
s2 <- "ATCCAATTTT"
s3 <- "ATGGCCATT"
s4 <- "ATCTTCTT"

# Match +1, mismatch -1.
submatrix <- nucleotideSubstitutionMatrix(match = 1, mismatch = -1, baseOnly = TRUE)

# Compare every sequence of the first group with every sequence of the second;
# each gap position scores -1.
pairwiseAlignment(s1, s3, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s1, s4, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s2, s3, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
pairwiseAlignment(s2, s4, substitutionMatrix = submatrix, gapOpening = 0, gapExtension = -1, scoreOnly = FALSE)
```
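Once the guiding pair is chosen, its pairwise alignment has to be carried over to the remaining rows of both groups. The sketch below shows one way to do this under the "once a gap, always a gap" rule. It is an illustration of the simplified guide-pair approach used in this problem, not of the full CLUSTALW profile alignment, and `merge_groups` with all of its arguments is a hypothetical helper. It expects gaps written as `_`, so the `-` characters printed by `pairwiseAlignment` would have to be converted first, for example with `chartr("-", "_", x)`. The data are a toy example rather than the groups from the problem.

```r
# Merge two aligned groups along the pairwise alignment of their guide sequences.
#   group_a, group_b : character vectors of equal-length aligned rows
#   guide_a, guide_b : row index of the guide sequence within each group
#   aln_a, aln_b     : the two rows of the pairwise alignment of the ungapped guides
merge_groups <- function(group_a, group_b, guide_a, guide_b, aln_a, aln_b, gap = "_") {
  A <- do.call(rbind, strsplit(group_a, ""))
  B <- do.call(rbind, strsplit(group_b, ""))
  pa <- strsplit(aln_a, "")[[1]]
  pb <- strsplit(aln_b, "")[[1]]
  # The pairwise alignment must contain exactly the guide residues, in order.
  stopifnot(length(pa) == length(pb),
            !any(pa == gap & pb == gap),
            identical(pa[pa != gap], unname(A[guide_a, A[guide_a, ] != gap])),
            identical(pb[pb != gap], unname(B[guide_b, B[guide_b, ] != gap])))
  gap_a <- rep(gap, nrow(A))
  gap_b <- rep(gap, nrow(B))
  ia <- 1; ib <- 1; k <- 1
  out <- list()
  while (ia <= ncol(A) || ib <= ncol(B)) {
    if (ia <= ncol(A) && A[guide_a, ia] == gap) {
      # Existing gap column of group A: keep it and pad group B.
      column <- c(A[, ia], gap_b); ia <- ia + 1
    } else if (ib <= ncol(B) && B[guide_b, ib] == gap) {
      # Existing gap column of group B: keep it and pad group A.
      column <- c(gap_a, B[, ib]); ib <- ib + 1
    } else if (pa[k] == gap) {
      # New gap in guide A: every row of group A receives a gap.
      column <- c(gap_a, B[, ib]); ib <- ib + 1; k <- k + 1
    } else if (pb[k] == gap) {
      # New gap in guide B: every row of group B receives a gap.
      column <- c(A[, ia], gap_b); ia <- ia + 1; k <- k + 1
    } else {
      # Guide residues are aligned: join the two columns.
      column <- c(A[, ia], B[, ib]); ia <- ia + 1; ib <- ib + 1; k <- k + 1
    }
    out[[length(out) + 1]] <- column
  }
  unname(apply(do.call(cbind, out), 1, paste, collapse = ""))
}

# Toy example: the guides are the first row of each group ("ACGT" and "AGT").
merged <- merge_groups(group_a = c("ACGT_", "AC_TT"), group_b = c("AGT", "A_T"),
                       guide_a = 1, guide_b = 1,
                       aln_a = "ACGT", aln_b = "A_GT")
print(merged)
```

Every row of a group receives the same inserted gap columns as its guide, so the alignment inside each group is never changed by the merge.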

## Problem 4 - BLAST

Use the BLAST algorithm to find a local alignment of the query sequence $$\mathtt{IHNWALN}$$ in the database $$\mathtt{AFGIAAAHDWALNW}.$$ Use word length $k=3$, a threshold $T=20$ for high-scoring words, and the BLOSUM62 scoring matrix.
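The seeding step can be sketched in base R. The code compares every query word of length $k$ with every database word and keeps the pairs that score at least $T$; for inputs this small, that gives the same seeds as enumerating the neighborhood words of the query first. Several things here are assumptions: the helper names `kmers` and `word_score` are made up, a word counts as high scoring when its score is greater than or equal to $T$, and the matrix holds only the BLOSUM62 entries for the residues occurring in this problem, transcribed by hand. Check the values against a reference copy of BLOSUM62 before relying on them.

```r
# BLOSUM62 entries for the residues that occur in the query and the database.
# Assumption: transcribed by hand for this example; verify against a reference copy.
aa <- c("A", "D", "F", "G", "H", "I", "L", "N", "W")
blosum62_sub <- matrix(c(
   4, -2, -2,  0, -2, -1, -1, -2, -3,   # A
  -2,  6, -3, -1, -1, -3, -4,  1, -4,   # D
  -2, -3,  6, -3, -1,  0,  0, -3,  1,   # F
   0, -1, -3,  6, -2, -4, -4,  0, -2,   # G
  -2, -1, -1, -2,  8, -3, -3,  1, -2,   # H
  -1, -3,  0, -4, -3,  4,  2, -3, -3,   # I
  -1, -4,  0, -4, -3,  2,  4, -3, -2,   # L
  -2,  1, -3,  0,  1, -3, -3,  6, -4,   # N
  -3, -4,  1, -2, -2, -3, -2, -4, 11    # W
), nrow = 9, byrow = TRUE, dimnames = list(aa, aa))

# All overlapping words of length k, in order of their start position.
kmers <- function(x, k) {
  n <- nchar(x)
  if (n < k) return(character(0))
  starts <- seq_len(n - k + 1)
  substring(x, starts, starts + k - 1)
}

# Ungapped score of two words of equal length under substitution matrix m.
word_score <- function(w1, w2, m) {
  a <- strsplit(w1, "")[[1]]
  b <- strsplit(w2, "")[[1]]
  sum(m[cbind(a, b)])   # look up each residue pair by name
}

query <- "IHNWALN"
db <- "AFGIAAAHDWALNW"
k <- 3
threshold <- 20

q_words <- kmers(query, k)
d_words <- kmers(db, k)

# Score every query word against every database word.
scores <- matrix(0, length(q_words), length(d_words))
for (i in seq_along(q_words)) {
  for (j in seq_along(d_words)) {
    scores[i, j] <- word_score(q_words[i], d_words[j], blosum62_sub)
  }
}

# Seeds: word pairs whose score reaches the threshold (assumption: >= T).
idx <- which(scores >= threshold, arr.ind = TRUE)
hits <- data.frame(query_pos = unname(idx[, "row"]),
                   db_pos = unname(idx[, "col"]),
                   query_word = q_words[idx[, "row"]],
                   db_word = d_words[idx[, "col"]],
                   score = scores[idx],
                   stringsAsFactors = FALSE)
print(hits)
```

Each row of `hits` is a seed that still has to be extended in both directions along its diagonal to obtain the local alignment.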

## Problem 5 - BLAST online

In the second tutorial, we assembled a sequence. If you did not obtain results in a reasonable time, you can use this file. Use the NCBI BLAST page to find out which species the sequence comes from. You do not need to search with the whole sequence; one contig is enough.