
Sequence hammering

by A. Bretaudeau on October 15, 2010

Hi all,

As promised in my last post, I’m going to show you what the Hmmer tools are and how you can use them for your sequence analyses.

First, you have to know that Hmmer is a collection of tools dedicated to the manipulation of Hidden Markov Models (HMMs). This makes it particularly useful for studying complex motifs with subtle signals. It is designed to work with protein sequences.

Hmmer 3 is a suite of 12 tools (hmmalign, hmmbuild, hmmconvert, hmmemit, hmmfetch, hmmpress, hmmscan, hmmsearch, hmmsim, hmmstat, jackhmmer and phmmer). But don’t be afraid: with only 5 of them, you can already do great things! There is also good documentation for each tool on the official website.

The most common things you can do are summarized in the following figure:

[Figure: Hmmer usage]

So let’s have a look at the most important tools.

From sequences to HMM: hmmbuild

Let’s suppose you have a set of sequences and you think that they contain a common motif. With hmmer, you can build an HMM representing this motif. To do this, there are basically two steps:

  1. First, create a multiple alignment of your sequences using a program like clustalw (there are other programs capable of doing this task, but this is not really the topic of this post);
  2. Then give this multiple alignment to the hmmbuild tool included in hmmer.

Several alignment formats can be read by hmmbuild: aln (clustalw output format), or Stockholm for example.

The output of hmmbuild is a text file describing the resulting HMM.
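To make the second step concrete, here is a minimal sketch that only assembles the hmmbuild command line (output HMM file first, alignment file second) without running it. The file names motif.hmm and my_sequences.sto are made up for the example, and the helper function is just an illustration, not part of hmmer.

```python
def hmmbuild_command(hmm_out, alignment):
    """Return the argument list for: hmmbuild <hmmfile_out> <msafile>."""
    # hmmbuild expects the output HMM file first, then the alignment file.
    return ["hmmbuild", hmm_out, alignment]


# Hypothetical file names, used only for illustration.
cmd = hmmbuild_command("motif.hmm", "my_sequences.sto")
print(" ".join(cmd))
```

Once the hmmer environment is loaded, the list can be handed to subprocess.run(cmd, check=True), or the printed line can simply be pasted into a terminal.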

How to use it?

Hmmbuild is available on our platform:

Mobyle web interface

SOAP webservice on our Opal server

Alternatively, the whole hmmer suite is available from command line on genocluster2: just issue the command “source /local/env/envhmmer-3.0” (or “. /local/env/envhmmer-3.0.sh” if you use bash) and you’ll get access to the 12 hmmer tools.

Blastp-like search: phmmer and jackhmmer

[Figure: The first rudimentary version of Jackhmmer]

Hmmer comes with two other useful tools: phmmer and jackhmmer. They do the same kind of analysis as blastp and psi-blast respectively.

Phmmer takes a protein sequence as input and searches for similar sequences in a protein databank (NR for example).
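As a rough sketch of what such a search looks like on the command line, the snippet below assembles a phmmer call (query file first, databank second). The file names are invented for the example, the helper is hypothetical, and the two options shown (--tblout for a tabular summary of hits, -E for an E-value cutoff) are optional.

```python
def phmmer_command(query, database, tblout=None, evalue=None):
    """Return the argument list for: phmmer [options] <seqfile> <seqdb>."""
    cmd = ["phmmer"]
    if tblout is not None:
        # Save a tabular, one-line-per-hit summary to this file.
        cmd += ["--tblout", tblout]
    if evalue is not None:
        # Only report sequences with an E-value at or below this threshold.
        cmd += ["-E", str(evalue)]
    cmd += [query, database]
    return cmd


# Hypothetical file names, used only for illustration.
cmd = phmmer_command("query.fasta", "nr.fasta", tblout="hits.tbl", evalue=1e-5)
print(" ".join(cmd))
```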

Jackhmmer does the same work as phmmer, except that it repeats it iteratively. This means that, as psi-blast does, it runs a phmmer-like search a first time, looks at the results, selects the best matches to the query sequence, builds a new HMM from them and searches the databank again for new similar sequences. You can set the maximum number of iterations (and if you set it to 1, it behaves exactly like a normal phmmer search).
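The maximum number of iterations is set with the -N option of jackhmmer. Here is a small sketch that builds the corresponding command line; the file names are made up and the helper function is only an illustration.

```python
def jackhmmer_command(query, database, max_iterations=5):
    """Return the argument list for: jackhmmer -N <n> <seqfile> <seqdb>."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    # -N sets the maximum number of search iterations.
    return ["jackhmmer", "-N", str(max_iterations), query, database]


# Hypothetical file names, used only for illustration.
iterative = jackhmmer_command("query.fasta", "nr.fasta", max_iterations=3)
single_pass = jackhmmer_command("query.fasta", "nr.fasta", max_iterations=1)
print(" ".join(iterative))
print(" ".join(single_pass))
```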

You may wonder why you should use phmmer/jackhmmer instead of the traditional blastp/psi-blast. Performance is the answer: it seems that phmmer runs a bit faster than good old blastp (well, I’ve only done a quick test, so check it with your own data). But keep in mind that it only works for protein sequences, and you’re not guaranteed to get the same results as with blastp (scores are not identical). So try it and see if you’re happy with it!

How to use it?

Both tools are available on our platform:

Phmmer on Mobyle and Jackhmmer on Mobyle

Phmmer SOAP webservice and Jackhmmer SOAP webservice

Alternatively, the whole hmmer suite is available from command line on genocluster2: just issue the command “source /local/env/envhmmer-3.0” (or “. /local/env/envhmmer-3.0.sh” if you use bash) and you’ll get access to the 12 hmmer tools.

Searching for known HMMs in a new sequence: hmmscan

Do you remember how InterProScan works? Hmmscan is quite similar: it takes a fasta sequence as input and searches it for occurrences of the HMMs registered in a specific databank. The main HMM databank used by hmmscan is Pfam, a databank of HMMs representing protein families. In fact, there are two sections in Pfam: Pfam-A, which is a manually curated collection of protein families, and Pfam-B, which is of somewhat lower quality as it contains automatically generated families. So the usual process is to search first using Pfam-A and, if you don’t get results, to search using Pfam-B.

InterProScan does the same kind of thing, but it uses several tools to search within several databanks. In fact, when you use InterProScan, you already use hmmer: the list of programs used by InterProScan includes hmmpfam, which is the ancestor of hmmscan (in hmmer 2).
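On the command line, hmmscan needs the HMM databank to be prepared once with hmmpress before the first search. The sketch below lists both commands in order; note that hmmscan takes the HMM databank first and the sequence file second. The file names are assumptions made for the example, and the helper is hypothetical.

```python
def hmmscan_commands(hmm_db, query, tblout):
    """Return the two commands needed to scan a sequence against an HMM databank."""
    return [
        # Run once per databank: builds the binary index files hmmscan needs.
        ["hmmpress", hmm_db],
        # Usage: hmmscan [options] <hmmdb> <seqfile>
        ["hmmscan", "--tblout", tblout, hmm_db, query],
    ]


# Hypothetical file names, used only for illustration.
commands = hmmscan_commands("Pfam-A.hmm", "my_protein.fasta", "pfam_hits.tbl")
for cmd in commands:
    print(" ".join(cmd))
```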

How to use it?

Hmmscan is available on our platform:

Hmmscan on Mobyle

Hmmscan SOAP webservice

Alternatively, the whole hmmer suite is available from command line on genocluster2: just issue the command “source /local/env/envhmmer-3.0” (or “. /local/env/envhmmer-3.0.sh” if you use bash) and you’ll get access to the 12 hmmer tools.

Searching for sequences using an HMM: hmmsearch

Hmmsearch does the opposite of hmmscan: you start from an HMM and then search sequence databanks for sequences matching the HMM you’re interested in.

Using it is quite simple: just give an HMM and select a sequence databank (a protein one) to search in. Instead of a sequence databank, you can also give a fasta file containing some specific sequences.

The result is a list of matches with corresponding scores.
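Here is a minimal sketch of a hmmsearch call, together with a small parser for the tabular output written by the --tblout option (one line per matching sequence, comment lines starting with #). The file names and the sample line are made up for the example, and the parser only keeps a few of the columns.

```python
def hmmsearch_command(hmm_file, database, tblout):
    """Return the argument list for: hmmsearch --tblout <f> <hmmfile> <seqdb>."""
    return ["hmmsearch", "--tblout", tblout, hmm_file, database]


def parse_tblout(lines):
    """Extract (target, E-value, score, description) from --tblout lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 18 whitespace-separated columns, then a free-text description.
        fields = line.split(None, 18)
        description = fields[18] if len(fields) > 18 else ""
        hits.append((fields[0], float(fields[4]), float(fields[5]), description))
    return hits


# Hypothetical file names, used only for illustration.
cmd = hmmsearch_command("motif.hmm", "proteins.fasta", "matches.tbl")
print(" ".join(cmd))

# Made-up sample in the --tblout layout (not real hmmsearch output).
sample = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "seq1 - motif - 1.2e-30 100.5 0.1 2.0e-30 99.8 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein",
]
hits = parse_tblout(sample)
print(hits)
```

Parsing the tabular file rather than the default human-readable report is a design choice: the table is much easier to filter or sort by score afterwards.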

How to use it?

Hmmsearch is available on our platform:

Hmmsearch on Mobyle

Hmmsearch SOAP webservice

Alternatively, the whole hmmer suite is available from command line on genocluster2: just issue the command “source /local/env/envhmmer-3.0” (or “. /local/env/envhmmer-3.0.sh” if you use bash) and you’ll get access to the 12 hmmer tools.

HMM retrieving: hmmfetch

Hmmfetch is another tool, which lets you retrieve an HMM from a databank (Pfam for example). You just have to give a list of HMM identifiers and hmmfetch will give you the corresponding HMM file. An identifier can be, for example, PF00045 or Caudal_act.
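As a sketch, hmmfetch can be called either with a single identifier or, using the -f option, with a file containing one identifier per line. The databank file name Pfam-A.hmm, the key file name and the output file names are assumptions made for the example; the helper function is hypothetical.

```python
def hmmfetch_command(hmm_db, key=None, key_file=None, output=None):
    """Return the argument list for fetching one HMM or a list of HMMs."""
    if (key is None) == (key_file is None):
        raise ValueError("give exactly one of key or key_file")
    cmd = ["hmmfetch"]
    if output is not None:
        # -o writes the retrieved HMM(s) to a file instead of standard output.
        cmd += ["-o", output]
    if key_file is not None:
        # -f: the second argument is a file of identifiers, one per line.
        cmd += ["-f", hmm_db, key_file]
    else:
        cmd += [hmm_db, key]
    return cmd


# Hypothetical file names, used only for illustration.
one = hmmfetch_command("Pfam-A.hmm", key="Caudal_act", output="Caudal_act.hmm")
many = hmmfetch_command("Pfam-A.hmm", key_file="identifiers.txt", output="selected.hmm")
print(" ".join(one))
print(" ".join(many))
```

For a large databank, indexing it once beforehand with “hmmfetch --index” makes retrieval faster.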

How to use it?

Hmmfetch is available on our platform:

Hmmfetch on Mobyle

Hmmfetch SOAP webservice

Alternatively, the whole hmmer suite is available from command line on genocluster2: just issue the command “source /local/env/envhmmer-3.0” (or “. /local/env/envhmmer-3.0.sh” if you use bash) and you’ll get access to the 12 hmmer tools.

Other tools

Other hmmer tools are also available on the platform (web interfaces on Mobyle and SOAP webservices on Opal), though they are not very useful for most analyses.

That’s it for today!
