HMMER

This tutorial follows the HMMER User’s Guide written by Sean R. Eddy, Travis J. Wheeler and the HMMER development team. Thanks.

Problem 1 - Install HMMER

First, we download HMMER v3.1b2 from http://hmmer.org and unpack it.

bash

wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2.tar.gz
tar xfv hmmer-3.1b2.tar.gz
cd hmmer-3.1b2
The HMMER source code is now downloaded and unpacked, so it is time to compile it.

bash

./configure
make
make check
We probably do not have root privileges, so instead of installing system-wide we add the directory with the compiled binaries to the PATH variable.

bash

export PATH=/path/to/hmmer-3.1b2/src:$PATH
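
To confirm that the export worked, a small check like the one below can help. It is only a sketch: the function name check_tools is made up for this tutorial, and the program list is taken from the overview in the next section.

```shell
# Report which HMMER programs are reachable through PATH.
# Prints one line per program; the return value is the number of missing ones.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

check_tools hmmbuild hmmalign phmmer jackhmmer hmmsearch hmmscan hmmpress \
    || echo "Some programs are not on PATH; re-check the export line."
```

If any program is reported as missing, the path in the export line most likely does not point at the src directory of the unpacked source tree.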

The programs in HMMER

- Build models and align sequences (DNA or protein)

  1. hmmbuild - Build a profile HMM from an input multiple alignment.
  2. hmmalign - Make a multiple alignment of many sequences to a common profile HMM.

- Search protein queries against a protein database

  1. phmmer - Search a single protein sequence against a protein sequence database. (BLASTP-like)
  2. jackhmmer - Iteratively search a protein sequence against a protein sequence database. (PSIBLAST-like)
  3. hmmsearch - Search a protein profile HMM against a protein sequence database.
  4. hmmscan - Search a protein sequence against a protein profile HMM database.
  5. hmmpgmd - Search daemon used for the hmmer.org website.

And many others.

Searching a protein sequence database with a single protein profile HMM

Step 1: build a profile HMM with hmmbuild

A common use of HMMER is to search a sequence database for homologues of a protein family of interest.

Look at the file tutorial/globins4.sto, a multiple alignment in Stockholm format.

What can you see there?
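
One thing worth checking is how many sequences the alignment contains. The sketch below counts them with awk, relying only on the basic Stockholm layout: lines starting with # are markup, // ends the alignment, and every other non-blank line starts with a sequence name. The function name count_sto_seqs and the toy alignment are invented for illustration and are not part of the HMMER tutorial files.

```shell
# Count the aligned sequences in a Stockholm file: skip blank lines, '#' markup
# lines and the '//' terminator, then count the distinct sequence names
# (names repeat when the alignment is split into several blocks).
count_sto_seqs() {
    awk 'NF && $1 !~ /^#/ && $1 != "//" { seen[$1] = 1 }
         END { n = 0; for (name in seen) n++; print n }' "$1"
}

# Toy alignment (invented for illustration only).
toy=$(mktemp)
cat > "$toy" <<'EOF'
# STOCKHOLM 1.0
seqA  ACDEFGH-KL
seqB  ACDQFGHIKL
seqC  AC-EFGHIKL
//
EOF
count_sto_seqs "$toy"      # prints 3

# With HMMER unpacked, the same function applies to the tutorial file:
# count_sto_seqs tutorial/globins4.sto
```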

Construct an HMM from the alignment:

bash

hmmbuild globins4.hmm tutorial/globins4.sto
Look at globins4.hmm. Is it what you expected? You now have a profile HMM; the next step is to search a sequence database with it.

Step 2: search the sequence database with hmmsearch

Run your example search against tutorial/globins45.fa.

bash

hmmsearch globins4.hmm tutorial/globins45.fa > globins4.out

Discuss the results!
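
For a quick overview of the hits, hmmsearch can also write a whitespace-delimited table with the --tblout option. The sketch below assumes the column layout described in the HMMER User's Guide (target name in column 1, full-sequence E-value in column 5, bit score in column 6). The helper name list_hits, the output file name globins4.tbl, and the E-value thresholds are arbitrary choices made for this example.

```shell
# Run the search again with a tabular summary; skipped if HMMER or the
# input files are not available in the current directory.
if command -v hmmsearch >/dev/null 2>&1 && [ -f globins4.hmm ] && [ -f tutorial/globins45.fa ]; then
    hmmsearch --tblout globins4.tbl globins4.hmm tutorial/globins45.fa > globins4.out
fi

# Print target name, full-sequence E-value and bit score for every hit whose
# E-value is at or below the given threshold (default 1e-5, an arbitrary choice).
list_hits() {
    awk -v thr="${2:-1e-5}" \
        '$1 !~ /^#/ && NF >= 6 && ($5 + 0) <= (thr + 0) { print $1, $5, $6 }' "$1"
}

if [ -f globins4.tbl ]; then
    list_hits globins4.tbl 1e-10
fi
```

Comparing this list with the full report in globins4.out is a good starting point for the discussion.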

Single sequence protein queries using phmmer

Like BLASTP or FASTA, phmmer searches a single query sequence against a sequence database.

bash

phmmer tutorial/HBB_HUMAN tutorial/globins45.fa > phmmer.out
The output is essentially the same as for hmmsearch.

Searching a profile HMM database with a query sequence

The HMM database can be Pfam, SMART, TIGRFAMs, or another database of your choice.

Step 1: create an HMM database flatfile

A flatfile is just a concatenation of individual HMM files. We therefore first build the individual HMM files with hmmbuild and then concatenate them with cat. Let's create a small database from the files in the tutorial directory.

bash

hmmbuild globins4.hmm tutorial/globins4.sto
hmmbuild fn3.hmm tutorial/fn3.sto
hmmbuild Pkinase.hmm tutorial/Pkinase.sto
cat globins4.hmm fn3.hmm Pkinase.hmm > minifam
In this case, minifam is our new HMM database. To speed up searches, compress and index the flatfile with hmmpress.

bash

hmmpress minifam
Four new binary files now appear in the directory: minifam.h3m, minifam.h3i, minifam.h3f, and minifam.h3p.
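
Before searching, it can be useful to verify the database. The sketch below counts the profiles in the flatfile (each HMMER3 profile carries one NAME line) and checks that the four index files written by hmmpress (.h3m, .h3i, .h3f, .h3p) exist next to it. The function names count_models and is_pressed are made up for this example.

```shell
# Count profiles in an HMM flatfile: each HMMER3 profile has one NAME line.
count_models() {
    grep -c '^NAME ' "$1"
}

# Check that the four index files written by hmmpress sit next to the flatfile.
is_pressed() {
    local ext
    for ext in h3m h3i h3f h3p; do
        [ -f "$1.$ext" ] || { echo "missing: $1.$ext"; return 1; }
    done
    echo "ok: $1 is pressed"
}

# Only runs when the minifam flatfile from the step above is present.
if [ -f minifam ]; then
    count_models minifam
    is_pressed minifam || echo "run: hmmpress minifam"
fi
```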

Step 2: search the HMM database with hmmscan

Now we can analyze sequences using our HMM database and hmmscan.

bash

hmmscan minifam tutorial/7LESS_DROME
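
To see where each domain lies on the query sequence, hmmscan can write a per-domain table with the --domtblout option. The sketch below assumes the column layout described in the HMMER User's Guide (model name in column 1, independent E-value in column 13, alignment start and end on the sequence in columns 18 and 19). The helper name list_domains and the output file names are arbitrary choices made for this example.

```shell
# Run the scan with a per-domain table; skipped if HMMER, the pressed
# database, or the query sequence is not available.
if command -v hmmscan >/dev/null 2>&1 && [ -f minifam.h3m ] && [ -f tutorial/7LESS_DROME ]; then
    hmmscan --domtblout 7LESS.domtbl minifam tutorial/7LESS_DROME > 7LESS.out
fi

# One tab-separated line per domain: model name, aligned start-end on the
# query sequence, independent E-value; sorted by start position.
list_domains() {
    awk '$1 !~ /^#/ && NF >= 22 { printf "%s\t%d-%d\t%s\n", $1, $18, $19, $13 }' "$1" \
        | sort -t "$(printf '\t')" -k2,2n
}

if [ -f 7LESS.domtbl ]; then
    list_domains 7LESS.domtbl
fi
```

Sorting by start position gives a rough picture of the domain architecture along the sequence.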

TASK 1: find records in Pfam database

In the previous example, we used three alignments in the Stockholm format: globins4.sto, fn3.sto, and Pkinase.sto. However, what can we do if we do not have any source alignments?

Which database to use?

Your task is as follows:

  1. Choose a profile HMM database.
  2. Find three protein families: globin, fn3, and Pkinase. Download their alignments as seed files (a seed contains representative members of the family that are judged to be well aligned) in the Stockholm format.
  3. Construct the HMM database (see the previous example).
  4. Analyse tutorial/7LESS_DROME using our HMM database.
courses/bin/tutorials/hmmer.txt · Last modified: 2019/04/11 12:32 by rysavpe1