====== Tutorial 11 - Protein structure, the MODELLER software ======

===== Recap =====
Make sure you can answer the following questions:
  * Describe the levels of protein structure.
  * Explain the general idea of the "Branch and Bound" method.
  * Explain the meaning of the following terms as applied to genes: analog, homolog, paralog, ortholog and xenolog.
  * What is a protein ligand?

===== Homology modeling - protein structure prediction exercise =====
A simple, although not always reliable, way to discover the secondary structure of a peptide sequence is to look up a protein with a similar primary sequence in a database. Let us try this! The task is to obtain the secondary structure of the following peptide sequence: ''HYLCKYVINAIPPTLTAKIHFRPELPAERNQLIQRLA''
  - Go to [[https://blast.ncbi.nlm.nih.gov/Blast.cgi]] and click "Protein blast".
  - Enter the sequence and enter "Homo sapiens (taxid:9606)" as organism.
  - Click the blast button and wait. This may take up to several minutes. 
  - Look for the best matching protein. It should be: "monoamine oxidase A"
  - Enter this protein name into [[https://www.uniprot.org/uniprot/|UniProt]]. /********** Note: only the third match in the list has the correct length and can be used for the alignment **********/ 
  - Check whether the result has a secondary structure annotation and find the positions that correspond to the BLAST match.
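
The last step, mapping the BLAST match onto the annotation, can be sketched in a few lines of Python. This is only an illustration: the ''Feature'' type, the function ''features_in_match'' and all coordinates below are hypothetical and are not taken from any real UniProt record. The sketch assumes 1-based, inclusive positions in the coordinates of the full protein, which is how both BLAST and UniProt report them.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """One annotated secondary-structure element (hypothetical helper type)."""
    kind: str   # e.g. "helix", "strand", "turn"
    start: int  # 1-based, inclusive, full-protein coordinates
    end: int    # 1-based, inclusive


def features_in_match(features, match_start, match_end):
    """Return the features overlapping [match_start, match_end], clipped to that range."""
    hits = []
    for f in features:
        lo = max(f.start, match_start)
        hi = min(f.end, match_end)
        if lo <= hi:  # non-empty overlap
            hits.append(Feature(f.kind, lo, hi))
    return hits


# Made-up annotation and match range, for illustration only.
example_features = [
    Feature("helix", 10, 25),
    Feature("strand", 30, 36),
    Feature("helix", 50, 60),
]
example_hits = features_in_match(example_features, 20, 32)
for hit in example_hits:
    print(hit.kind, hit.start, hit.end)
```

To use it on the exercise, replace the made-up features with the ranges listed in the UniProt record and the match range with the subject start and end reported by BLAST.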

Use the procedure described above to learn as much as possible about the following peptide sequence: ''TEYAINKLRQLYVLRC''. 

<note tip> A hint: the sequence is a part of a frequent [[https://en.wikipedia.org/wiki/Protein_domain|protein domain]]. </note>

/**
 * It is a part of SH2 domain.
 *  The SH2 domain (Src homology 2 domain) is a structural domain found, to varying extents, in all eukaryotic organisms; it characteristically binds phosphorylated tyrosine (phosphotyrosine, pY). It is part of a wide range of proteins in the cell, mainly signalling proteins. It is also part of the Src oncogene, which can cause cancerous growth.
 * It was taken from: https://www.pnas.org/doi/10.1073/pnas.011577898, Fig.1 (alphaA and betaB, the first two proteins JAK1 and JAK2 merged).
 * BLASTP finds: Tyrosine-protein kinase JAK1, the total length 1154, match with positions 446-466.
 *  JAK1 belongs to the tyrosine kinases, i.e. enzymes from the protein kinase group that catalyse the transfer of a phosphate group (phosphorylation) from nucleoside triphosphates (mostly ATP) to the amino acid tyrosine in proteins. Phosphorylation is the most common post-translational modification of proteins and plays an important role in regulating many cellular signalling pathways.
 * Uniprot Tyrosine-protein kinase JAK1 record:
 *  Molecule processing: check that the length is the same,
 *  Secondary structure: 446 ... helix starts, 463 ... beta strand starts,
 *  Domains and Repeats: 439 – 544 ... SH2.
 */
===== Threading exercise =====
Recall the branch-and-bound threading algorithm from the lecture. 

Suppose we have three segments (i, j, k), each three amino acids long.
For a given sequence, each segment has three possible starting positions:
i ∈ {2,3,4}, j ∈ {8,9,10}, k ∈ {13,14,15}.
We will be using the simple lower bound:
{{:courses:bin:tutorials:lb.png?400}}

Suppose that you are given the following values for the scores
of the individual segments and the scores for segment interactions:

^ i ^ j ^ k ^
|g1(i,2) = 5 | g1(j,8) = 9| g1(k,13) = 3|
|g1(i,3) = 2 | g1(j,9) = 7| g1(k,14) = 4|
|g1(i,4) = 8 | g1(j,10) = 6| g1(k,15) = 1|

^i/j ^ j/k ^ i/k ^
|g2(i,j,2,8) = 1|g2(j,k,8,13) = 7|g2(i,k,2,13) = 1|
|g2(i,j,2,9) = 2|g2(j,k,8,14) = 8|g2(i,k,2,14) = 2|
|g2(i,j,2,10) = 2|g2(j,k,8,15) = 7|g2(i,k,2,15) = 5|
|g2(i,j,3,8) = 5|g2(j,k,9,13) = 1|g2(i,k,3,13) = 5|
|g2(i,j,3,9) = 6|g2(j,k,9,14) = 6|g2(i,k,3,14) = 6|
|g2(i,j,3,10) = 4|g2(j,k,9,15) = 8|g2(i,k,3,15) = 4|
|g2(i,j,4,8) = 7|g2(j,k,10,13) = 11|g2(i,k,4,13) = 1|
|g2(i,j,4,9) = 3|g2(j,k,10,14) = 12|g2(i,k,4,14) = 2|
|g2(i,j,4,10) = 4|g2(j,k,10,15) = 13|g2(i,k,4,15) = 4|

Using this information, **compute the optimal threading**.
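
Once you have worked through the branch-and-bound search by hand, you can check your result with a short script. The sketch below is not the branch-and-bound algorithm itself: it simply enumerates all 27 threadings. It assumes that the scores behave like energies, so the optimal threading is the one with the //lowest// total of all g1 and g2 terms. The function ''simple_lower_bound'' computes one simple, valid lower bound over the whole search space (for each segment, the best start position when every interaction with a later segment takes its most favourable value); it is an assumption that this has the same form as the bound in the figure above. All function and variable names are illustrative.

```python
from itertools import product

# Candidate start positions for each segment, copied from the exercise.
positions = {"i": (2, 3, 4), "j": (8, 9, 10), "k": (13, 14, 15)}

# Singleton scores: g1[segment][start]
g1 = {
    "i": {2: 5, 3: 2, 4: 8},
    "j": {8: 9, 9: 7, 10: 6},
    "k": {13: 3, 14: 4, 15: 1},
}

# Pairwise scores: g2[(seg_a, seg_b)][(start_a, start_b)]
g2 = {
    ("i", "j"): {(2, 8): 1, (2, 9): 2, (2, 10): 2,
                 (3, 8): 5, (3, 9): 6, (3, 10): 4,
                 (4, 8): 7, (4, 9): 3, (4, 10): 4},
    ("j", "k"): {(8, 13): 7, (8, 14): 8, (8, 15): 7,
                 (9, 13): 1, (9, 14): 6, (9, 15): 8,
                 (10, 13): 11, (10, 14): 12, (10, 15): 13},
    ("i", "k"): {(2, 13): 1, (2, 14): 2, (2, 15): 5,
                 (3, 13): 5, (3, 14): 6, (3, 15): 4,
                 (4, 13): 1, (4, 14): 2, (4, 15): 4},
}


def score(threading):
    """Total score of a threading given as {segment: start position}."""
    total = sum(g1[seg][threading[seg]] for seg in positions)
    total += sum(table[(threading[a], threading[b])]
                 for (a, b), table in g2.items())
    return total


def brute_force():
    """Enumerate every threading and return the one with the minimum score."""
    segments = list(positions)
    candidates = (dict(zip(segments, combo))
                  for combo in product(*positions.values()))
    best = min(candidates, key=score)
    return best, score(best)


def simple_lower_bound():
    """Sum over segments of the best start, taking each pair term at its minimum.

    Each pair (a, b) is counted once, with its first segment a.
    """
    bound = 0
    for seg, starts in positions.items():
        bound += min(
            g1[seg][x] + sum(
                min(v for (xa, _), v in table.items() if xa == x)
                for (a, _), table in g2.items() if a == seg
            )
            for x in starts
        )
    return bound


best_threading, best_score = brute_force()
print("best threading:", best_threading)
print("best score:", best_score)
print("lower bound on the full search space:", simple_lower_bound())
```

Brute force is feasible here only because the search space is tiny; the point of branch and bound is to reach the same optimum while discarding most of the space using the bound.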

=====  MODELLER overview =====

The purpose of this tutorial is to get hands-on experience with MODELLER, a program for comparative protein structure modelling. We will go through a basic tutorial for this software. At the end of this tutorial, you should have a better understanding of what such software is capable of. 

===== Installation =====

Download the MODELLER version for your operating system from: [[https://salilab.org/modeller/download_installation.html]] (Note: It is also available from official repositories of some GNU/Linux distributions.) 

You need to register to obtain a license: [[https://salilab.org/modeller/registration.html]]. Use your university e-mail address in the registration form.

===== The tutorial =====

We will follow the basic tutorial from the MODELLER website: [[https://salilab.org/modeller/tutorial/basic.html]].

If you want to learn more about MODELLER, advanced tutorials are also available: [[https://salilab.org/modeller/tutorial/]]


===== See also =====
If you are interested, you may also have a look at the [[https://zhanglab.ccmb.med.umich.edu/I-TASSER/|I-TASSER]] software.

There is also a [[http://www.bpc.uni-frankfurt.de/guentert/wiki/images/b/b1/180625_Tutorial_Modelling.pdf|tutorial]] on homology modelling from the University of Frankfurt.
===== References =====
Branch and bound threading example taken from [[https://www.biostat.wisc.edu/bmi776/spring-17/lectures/threading.pdf]] {{ :courses:bin:tutorials:threading.pdf |}}