====== Tutorial 11 - Protein structure, the MODELLER software ======

===== Recap =====

Make sure you can answer the following questions:

  * Describe the levels of protein structure.
  * Explain the general idea of the "Branch and Bound" method.
  * Explain the meaning of the following terms when applied to genes: analog, homolog, paralog, ortholog and xenolog. /* analog - arises by convergent evolution from unrelated ancestors, i.e. by homoplasy */
  * What is a protein ligand? /* typically a small molecule that forms a complex with a biomolecule, typically a protein; often a signalling molecule that binds to the binding site of the target protein through weak molecular interactions, which makes ligand binding mostly reversible; binding of a ligand to a receptor protein usually changes its conformation and thereby determines the biological function of the protein */

===== Homology modeling - protein structure prediction exercise =====

A simple, although not always reliable, way to discover the secondary structure of a peptide sequence is to look up a protein with a similar primary sequence in a database. Let us try this!

The task is to obtain the secondary structure of the following peptide sequence: ''HYLCKYVINAIPPTLTAKIHFRPELPAERNQLIQRLA''

  - Go to [[https://blast.ncbi.nlm.nih.gov/Blast.cgi]] and click "Protein blast".
  - Enter the sequence and set the organism to "Homo sapiens (taxid:9606)".
  - Click the BLAST button and wait. This may take up to several minutes.
  - Look for the best matching protein. It should be "monoamine oxidase A".
  - Enter this protein name into [[https://www.uniprot.org/uniprot/|UniProt]]. /********** note: only the third match in the ranking has the correct length and can be used for the alignment **********/
  - Check whether the result has a secondary structure annotation and find the positions corresponding to the BLAST match.

Use the above-described procedure to learn as much as possible about the following peptide sequence: ''TEYAINKLRQLYVLRC''.
Hint: the sequence is part of a common [[https://en.wikipedia.org/wiki/Protein_domain|protein domain]].

/**
  * It is a part of the SH2 domain.
  * The SH2 domain (Src homology 2 domain) is a structural domain found to varying extents in all eukaryotic organisms; it is characterised by binding to phosphorylated tyrosine (phosphotyrosine, pY). It is part of a wide range of proteins in the cell, mainly signalling proteins. It is also part of the Src oncogene, which can cause cancerous growth.
  * It was taken from: https://www.pnas.org/doi/10.1073/pnas.011577898, Fig. 1 (alphaA and betaB, the first two proteins JAK1 and JAK2 merged).
  * BLASTP finds: Tyrosine-protein kinase JAK1, total length 1154, match with positions 446-466.
  * JAK1 belongs to the tyrosine kinases, i.e. enzymes from the protein kinase group that catalyse the transfer of a phosphate group (phosphorylation) from nucleoside triphosphates (mostly ATP) to the amino acid tyrosine in proteins. Phosphorylation is the most common post-translational modification of proteins and plays an important role in the regulation of many cellular signalling pathways.
  * UniProt Tyrosine-protein kinase JAK1 record:
    * Molecule processing: check that the length is the same,
    * Secondary structure: 446 ... helix starts, 463 ... beta sheet starts,
    * Domains and Repeats: 439 – 544 ... SH2.
*/

===== Threading exercise =====

Recall the branch-and-bound threading algorithm from the lecture. Suppose we have three segments (i, j, k), each consisting of three amino acids. For a given sequence, there are three possible starting positions for each segment.
(i ∈ {2,3,4}, j ∈ {8,9,10}, k ∈ {13,14,15})

We will be using the simple lower bound:

{{:courses:bin:tutorials:lb.png?400}}

Suppose that you are given the following scores for the individual segments and for the segment interactions:

^ i ^ j ^ k ^
| g1(i,2) = 5 | g1(j,8) = 9 | g1(k,13) = 3 |
| g1(i,3) = 2 | g1(j,9) = 7 | g1(k,14) = 4 |
| g1(i,4) = 8 | g1(j,10) = 6 | g1(k,15) = 1 |

^ i/j ^ j/k ^ i/k ^
| g2(i,j,2,8) = 1 | g2(j,k,8,13) = 7 | g2(i,k,2,13) = 1 |
| g2(i,j,2,9) = 2 | g2(j,k,8,14) = 8 | g2(i,k,2,14) = 2 |
| g2(i,j,2,10) = 2 | g2(j,k,8,15) = 7 | g2(i,k,2,15) = 5 |
| g2(i,j,3,8) = 5 | g2(j,k,9,13) = 1 | g2(i,k,3,13) = 5 |
| g2(i,j,3,9) = 6 | g2(j,k,9,14) = 6 | g2(i,k,3,14) = 6 |
| g2(i,j,3,10) = 4 | g2(j,k,9,15) = 8 | g2(i,k,3,15) = 4 |
| g2(i,j,4,8) = 7 | g2(j,k,10,13) = 11 | g2(i,k,4,13) = 1 |
| g2(i,j,4,9) = 3 | g2(j,k,10,14) = 12 | g2(i,k,4,14) = 2 |
| g2(i,j,4,10) = 4 | g2(j,k,10,15) = 13 | g2(i,k,4,15) = 4 |

Using this information, **compute the optimal threading**.

===== MODELLER overview =====

The purpose of this tutorial is to get hands-on experience with MODELLER, a software package for comparative protein structure modelling. We will go through a basic tutorial for this software. At the end of this tutorial, you should have a better understanding of what such software is capable of.

===== Installation =====

Download the MODELLER version for your operating system from: [[https://salilab.org/modeller/download_installation.html]]

(Note: It is also available from the official repositories of some GNU/Linux distributions.)

You need to register in order to obtain a license here: [[https://salilab.org/modeller/registration.html]]

You should provide your university e-mail address in the registration form.

===== The tutorial =====

We will follow this tutorial from the MODELLER webpages: [[https://salilab.org/modeller/tutorial/basic.html]].
For those who are interested in learning more about MODELLER, there are also advanced tutorials: [[https://salilab.org/modeller/tutorial/]]

===== See also =====

If you are interested, you may also have a look at the [[https://zhanglab.ccmb.med.umich.edu/I-TASSER/|I-TASSER]] software.

A [[http://www.bpc.uni-frankfurt.de/guentert/wiki/images/b/b1/180625_Tutorial_Modelling.pdf|tutorial]] on homology modelling from the University of Frankfurt.

===== References =====

Branch and bound threading example taken from [[https://www.biostat.wisc.edu/bmi776/spring-17/lectures/threading.pdf]] {{ :courses:bin:tutorials:threading.pdf |}}
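
===== Checking the threading exercise =====

A hand-computed answer to the threading exercise can be cross-checked with a few lines of Python. The sketch below does not implement branch and bound; it simply enumerates all 27 placements. It rests on two assumptions that are not stated explicitly on this page: the total score of a threading is the sum of all g1 terms and all pairwise g2 terms, and a lower score is better (as the use of a lower bound suggests). The names ''G1'', ''G2'', ''score'' and ''best_threading'' are illustrative only.

```python
from itertools import product

# Segment scores g1(segment, start), copied from the exercise table.
G1 = {
    "i": {2: 5, 3: 2, 4: 8},
    "j": {8: 9, 9: 7, 10: 6},
    "k": {13: 3, 14: 4, 15: 1},
}

# Pairwise interaction scores g2(seg_a, seg_b, start_a, start_b).
G2 = {
    ("i", "j"): {(2, 8): 1, (2, 9): 2, (2, 10): 2,
                 (3, 8): 5, (3, 9): 6, (3, 10): 4,
                 (4, 8): 7, (4, 9): 3, (4, 10): 4},
    ("j", "k"): {(8, 13): 7, (8, 14): 8, (8, 15): 7,
                 (9, 13): 1, (9, 14): 6, (9, 15): 8,
                 (10, 13): 11, (10, 14): 12, (10, 15): 13},
    ("i", "k"): {(2, 13): 1, (2, 14): 2, (2, 15): 5,
                 (3, 13): 5, (3, 14): 6, (3, 15): 4,
                 (4, 13): 1, (4, 14): 2, (4, 15): 4},
}

SEGMENTS = ("i", "j", "k")


def score(threading):
    """Total score of one threading, given as a dict segment -> start.

    Assumption: the score is the sum of all g1 and all pairwise g2 terms.
    """
    total = sum(G1[seg][threading[seg]] for seg in SEGMENTS)
    total += sum(table[(threading[a], threading[b])]
                 for (a, b), table in G2.items())
    return total


def best_threading():
    """Score every placement exhaustively; return (minimum score, threading).

    Assumption: lower scores are better.
    """
    candidates = (dict(zip(SEGMENTS, starts))
                  for starts in product(*(G1[seg] for seg in SEGMENTS)))
    best = min(candidates, key=score)
    return score(best), best


if __name__ == "__main__":
    print(best_threading())
```

Calling ''score'' on a single threading, for example ''score({"i": 4, "j": 10, "k": 15})'', is also a convenient way to verify individual leaf values while working through the branch-and-bound tree by hand.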