
Hidden Markov Models (HMM)

by A. Bretaudeau on July 28, 2010

The life of Andrey M.

Do you know Andrey Markov?

(Image source: Wikimedia)

As written in Wikipedia, “Andrey Andreevich Markov was born in 1856 in Ryazan as the son of the secretary of the public forest management of Ryazan, Andrey Grigorevich Markov, and his first wife Nadezhda Petrovna Markova”.

Apart from having a beautiful son, he also gave birth to what is now known as Markov chains.

OK, so why am I talking about him here?

HMM: Hidden Markov models

So far, I have described two main ways to represent a motif: patterns and profiles. Hidden Markov models (HMMs) are the third (and last) main method to represent a motif.

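HMMs are based on Markov chains, as described by Mr Markov during his brilliant career as a mathematician. Briefly, they are statistical models that represent a motif as a chain of states, roughly one per motif position: each state has a probability of producing each letter of the alphabet, and each transition from one state to another has its own probability. The states themselves are "hidden": we only observe the letters they produce. Let's look at an example to understand this. Suppose we have this set of aligned sequences containing a motif: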

AATACT
GA-AGT
ATTAGA
GCTAGT

We can represent the corresponding motif with the following HMM:

How do you read that? It is not that difficult: each node of the graph is a position in the motif. For each position, the probability of finding each letter of the alphabet is given. For example, the probability of finding a T at position 2 is 0.25, because one of the four sequences has a T in that column.

The graph also gives the probability of each transition from one position to another: for example, from position 2 you can go either to position 3 or directly to position 4, which happens when there is a gap (as in the second sequence, GA-AGT).
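To make the reading of the graph concrete, here is a minimal Python sketch that estimates both kinds of probabilities by simply counting over the four aligned sequences. It assumes the simplified model described here, where a gap just makes a sequence skip a position; real profile HMMs use separate match, insert and delete states and add pseudocounts, which this sketch leaves out. The function names are illustrative, not taken from any tool.

```python
from collections import Counter, defaultdict

# The aligned sequences from the example ('-' marks a gap).
ALIGNMENT = ["AATACT", "GA-AGT", "ATTAGA", "GCTAGT"]


def emission_probabilities(alignment):
    """Per-position letter frequencies, ignoring gaps (positions are 1-based)."""
    emissions = {}
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment if seq[pos] != "-"]
        counts = Counter(column)
        emissions[pos + 1] = {letter: n / len(column) for letter, n in counts.items()}
    return emissions


def transition_probabilities(alignment):
    """Probability of moving from one occupied position to the next one.

    A gap makes a sequence skip a position, e.g. GA-AGT jumps from 2 to 4.
    """
    counts = defaultdict(Counter)
    for seq in alignment:
        occupied = [i + 1 for i, letter in enumerate(seq) if letter != "-"]
        for src, dst in zip(occupied, occupied[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(targets.values()) for dst, n in targets.items()}
        for src, targets in counts.items()
    }


emissions = emission_probabilities(ALIGNMENT)
transitions = transition_probabilities(ALIGNMENT)
print(emissions[2])    # {'A': 0.5, 'T': 0.25, 'C': 0.25}
print(transitions[2])  # {3: 0.75, 4: 0.25}
```

Because it only counts what it sees, this sketch gives a probability of zero to letters never observed at a position (there is no G at position 2, for instance), which is one reason real tools add pseudocounts.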

The state diagram is only one way to view an HMM; you can also use more eye-catching visualizations such as sequence logos:

In logos, the taller a letter is, the more conserved it is at that position of the motif. When no letter is shown, it simply means that no particular letter is expected at the corresponding position. If you want to create your own logos, you can use the WebLogo service.
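Logo tools such as WebLogo typically make the total height of each stack equal to the information content of the position, in bits, and split it between letters in proportion to their frequencies. The Python sketch below computes these heights for the example alignment under simplifying assumptions: gaps are ignored, the background is uniform over A, C, G and T, and the small-sample correction that WebLogo applies is left out, so the numbers will not exactly match a WebLogo output.

```python
import math
from collections import Counter

ALIGNMENT = ["AATACT", "GA-AGT", "ATTAGA", "GCTAGT"]


def logo_heights(alignment, alphabet_size=4):
    """Return, for each position, the height (in bits) of each letter in the stack."""
    heights = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment if seq[pos] != "-"]  # ignore gaps
        freqs = {letter: n / len(column) for letter, n in Counter(column).items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        # Information content: maximum possible entropy minus observed entropy.
        information = math.log2(alphabet_size) - entropy
        heights.append({letter: p * information for letter, p in freqs.items()})
    return heights


for pos, stack in enumerate(logo_heights(ALIGNMENT), start=1):
    print(pos, {letter: round(h, 3) for letter, h in stack.items()})
```

Position 4 (always A) gets the maximum of 2 bits, while position 1 (A or G, half the time each) only reaches 1 bit. A column where all four letters were equally frequent would get 0 bits, which is the "no letter shown" case.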

As with profiles, this example is based only on the observation of an alignment of sequences containing the motif. But you can also add biological knowledge to this kind of motif, particularly for protein sequences: you just have to adjust the probabilities to be more tolerant of substitutions between related amino acids.
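One simple way to make a protein motif more tolerant is to take a little probability mass from each observed amino acid and share it among residues considered similar to it. The Python sketch below does this for a single alignment column. The similarity groups and the weight of 0.3 are hypothetical choices for illustration; real tools generally rely on priors derived from observed substitution data (such as substitution matrices or Dirichlet mixture priors) rather than hand-made groups.

```python
from collections import Counter

# Hypothetical groups of related amino acids, chosen for illustration only.
SIMILARITY_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: group for group in SIMILARITY_GROUPS for aa in group}


def tolerant_probabilities(column, weight=0.3):
    """Blend observed frequencies with mass shared among similar residues.

    `weight` (hypothetical) is the fraction of each residue's probability
    that is spread evenly over its similarity group, the residue included.
    """
    observed = {aa: n / len(column) for aa, n in Counter(column).items()}
    probs = Counter()
    for aa, p in observed.items():
        probs[aa] += (1 - weight) * p
        group = GROUP_OF[aa]
        for relative in group:
            probs[relative] += weight * p / len(group)
    return dict(probs)


# Hypothetical alignment column in which only L and I were observed.
column = "LLLI"
probs = tolerant_probabilities(column)
print({aa: round(p, 3) for aa, p in sorted(probs.items())})
```

Here V and M, never observed in the column, now get a small non-zero probability because they are grouped with L and I, while an unrelated residue such as K still gets none. The probabilities still sum to 1.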

Describing motifs with HMMs also lets you reuse the many algorithms written for HMMs in other fields, related to bioinformatics or not: HMMs are used, for example, in speech recognition, artificial intelligence and pattern analysis of non-biological data.
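A classic example of such a reusable algorithm is the Viterbi algorithm, long used in speech recognition, which finds the most probable sequence of hidden states behind an observed sequence. Below is a minimal, generic Python sketch working in log space. The two-state model it is run on ("AT-rich" versus "GC-rich" stretches of DNA) and all its probabilities are hypothetical toy values, unrelated to the motif above; the code also assumes every probability is non-zero, since log(0) is undefined.

```python
import math


def viterbi(seq, states, start, trans, emit):
    """Return the most probable hidden-state path for seq and its log-probability."""
    # best[s]: log-probability of the best path ending in state s so far.
    best = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    backpointers = []
    for letter in seq[1:]:
        pointers, new_best = {}, {}
        for s in states:
            # Score of reaching s from each possible previous state r.
            scores = {r: best[r] + math.log(trans[r][s]) for r in states}
            prev = max(scores, key=scores.get)
            pointers[s] = prev
            new_best[s] = scores[prev] + math.log(emit[s][letter])
        backpointers.append(pointers)
        best = new_best
    # Trace the best path backwards from the most probable final state.
    last = max(best, key=best.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], best[last]


# Hypothetical toy model: AT-rich vs GC-rich regions (illustrative values only).
STATES = ("AT", "GC")
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

path, log_prob = viterbi("ATATGCGCGC", STATES, START, TRANS, EMIT)
print(path)
```

On this toy input, the decoded path stays in the AT-rich state for the first four letters and switches to the GC-rich state for the last six: paying once for the unlikely transition is cheaper than emitting six letters from the wrong state.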

In bioinformatics, HMM motifs are mainly helpful for complex motifs: they make it possible to find motif occurrences even in divergent sequences.
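To see how such a model can pick out an occurrence in a longer sequence, here is a minimal Python sketch that slides the example motif along a DNA sequence and keeps the best score. It assumes the simplified model from the example: emissions estimated from the four aligned sequences with one pseudocount per letter, a uniform background of 0.25 per nucleotide, and only two possible paths, through position 3 or skipping it (with transition probabilities 0.75 and 0.25, the proportions of aligned sequences doing each). The score is the log-odds of the emissions against the background plus the log of the path's transition probability. Real HMM search tools handle insertions and deletions at every position, so treat this as an illustration only.

```python
import math
from collections import Counter

ALIGNMENT = ["AATACT", "GA-AGT", "ATTAGA", "GCTAGT"]
BACKGROUND = 0.25  # assumed uniform background probability for A, C, G and T


def emissions_with_pseudocounts(alignment, pseudocount=1.0):
    """Per-position probabilities, adding a pseudocount so no letter gets 0."""
    probs = []
    for pos in range(len(alignment[0])):
        counts = Counter(seq[pos] for seq in alignment if seq[pos] != "-")
        total = sum(counts.values()) + 4 * pseudocount
        probs.append({b: (counts[b] + pseudocount) / total for b in "ACGT"})
    return probs


EMIT = emissions_with_pseudocounts(ALIGNMENT)
# The two paths of the simplified model: 0-based positions visited, and the
# log-probability of the transition out of position 2 (3 of 4 sequences go
# through position 3, 1 of 4 skips it).
PATHS = {
    "full": ([0, 1, 2, 3, 4, 5], math.log(0.75)),
    "skip 3": ([0, 1, 3, 4, 5], math.log(0.25)),
}


def best_hit(sequence):
    """Best (score, start index, path name) over all windows and both paths.

    `sequence` must contain only uppercase A, C, G and T.
    """
    best = (-math.inf, None, None)
    for name, (positions, log_transition) in PATHS.items():
        width = len(positions)
        for start in range(len(sequence) - width + 1):
            window = sequence[start:start + width]
            score = log_transition + sum(
                math.log(EMIT[p][b] / BACKGROUND) for p, b in zip(positions, window)
            )
            if score > best[0]:
                best = (score, start, name)
    return best


print(best_hit("CCCCAATAGTCCCC"))
```

Thanks to the pseudocounts, a window containing a letter never seen at some position is penalized rather than ruled out, which is what lets this kind of model still score divergent occurrences.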

Conclusion on motif formats

Now that we’ve covered HMMs, you know the three main motif formats used in bioinformatics. The formats differ mainly in their expressiveness and their performance. Choose PROSITE patterns if you want to discover or match simple motifs with low divergence; if your motif is more complex, choose profiles or HMMs.
