
Protomata 2 now available

by A. Bretaudeau on June 15, 2012


Today, I’m going to write about a piece of software developed here at the Irisa/INRIA center (Rennes, France) by the Symbiose team: Protomata.


Protomata lets you perform pattern discovery and pattern matching on protein sequences. It consists of two main components: Protomata-learner for discovery and Protomatch for matching.

The patterns used by the Protomata tools take the form of automata, which are graphical models representing a set of sequences. Protomata is specifically designed for heterogeneous sequence families. A typical automaton produced by Protomata can look like this:

Figure: A simple protomata

In this example, the motif represents a family in which some sequences share a common block (K|H, C, C) at the end of the sequence (starting around positions 58/64). The arrows at the bottom mean that some of the sequences do not contain the K|H, C, C block.
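To make the idea of an automaton as "a set of sequences" more concrete, here is a minimal Python sketch of a nondeterministic automaton in the spirit of this example: edges are labelled with sets of residues, and an epsilon edge plays the role of the bypass arrows. The state numbering, the `route` function and the test sequences are hypothetical; this is not Protomata's automaton format, nor the algorithm Protomatch uses.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# The 20 standard amino acids; a "gap" edge accepts any of them.
ANY: FrozenSet[str] = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical toy automaton (not Protomata's format). Each edge is
# (allowed residues, target state, is_block_edge). State 0 loops on any residue,
# then the path either reads the block K|H, C, C (0 -> 1 -> 2 -> 3) or takes
# the epsilon "bypass" edge 0 -> 3. State 3 has no outgoing edges, so the block
# can only sit at the very end of the sequence.
EDGES: Dict[int, List[Tuple[FrozenSet[str], int, bool]]] = {
    0: [(ANY, 0, False), (frozenset("KH"), 1, True)],
    1: [(frozenset("C"), 2, True)],
    2: [(frozenset("C"), 3, True)],
    3: [],
}
EPSILON: Dict[int, List[int]] = {0: [3]}  # the bypass arrow
FINAL = 3

Config = Tuple[int, bool]  # (state, have block edges been used on this path?)


def closure(configs: Set[Config]) -> Set[Config]:
    """Add every configuration reachable through epsilon (bypass) edges."""
    result = set(configs)
    stack = list(configs)
    while stack:
        state, used = stack.pop()
        for target in EPSILON.get(state, []):
            if (target, used) not in result:
                result.add((target, used))
                stack.append((target, used))
    return result


def route(seq: str) -> str:
    """Simulate the NFA: 'block' if an accepting path reads K|H, C, C,
    'bypass' if only the skipping path accepts, 'rejected' otherwise."""
    configs = closure({(0, False)})
    for residue in seq:
        step: Set[Config] = set()
        for state, used in configs:
            for allowed, target, is_block in EDGES[state]:
                if residue in allowed:
                    step.add((target, used or is_block))
        configs = closure(step)
    accepted = {used for state, used in configs if state == FINAL}
    if True in accepted:
        return "block"
    return "bypass" if accepted else "rejected"


for s in ["MAGHCC", "MAGKCCA", "MAXG"]:
    print(s, route(s))
# MAGHCC block
# MAGKCCA bypass
# MAXG rejected
```

Tracking a `used` flag per configuration is what lets the simulation tell the two routes apart: because of the bypass, every sequence of standard residues is accepted, and the interesting question is which path it takes.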

The key thing to keep in mind when using Protomata is that, from a set of related proteins, you will find blocks that are conserved in all of the sequences or in a subset of them. Sometimes these blocks can be associated with a specific function of the proteins, such as a catalytic site.

Protomata-learner discovers this kind of motif in sets of related protein sequences using a special kind of alignment: Partial Local Multiple Alignments (PLMA). The PLMA that was used to generate the automaton displayed above looks like this:

Looking at it, you can see that the alignment was built from 5 sequences: 3 of them share a block near the end (in yellow), while the other 2 do not contain a similar region.
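The "partial" in PLMA can be illustrated with a small data structure: a block is a set of equal-length fragments, one per participating sequence, and sequences that lack the region are simply absent from it. The sketch below assumes ungapped blocks and uses five made-up sequences (not those from the figure); the `Block` class and its methods are hypothetical and do not reflect how Protomata-learner actually builds its alignments.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Five hypothetical sequences (NOT the ones shown in the figure).
SEQUENCES: Dict[str, str] = {
    "seq1": "MSTLGHCC",
    "seq2": "MAPKCC",
    "seq3": "MQEWRKCC",
    "seq4": "MSTLGW",
    "seq5": "MAPRQ",
}


@dataclass
class Block:
    """Hypothetical ungapped PLMA block: same length in every sequence it covers,
    but only a subset of the sequences needs to take part."""
    length: int
    starts: Dict[str, int]  # sequence id -> 0-based start of its fragment

    def segments(self, seqs: Dict[str, str]) -> Dict[str, str]:
        """The aligned fragment contributed by each participating sequence."""
        return {sid: seqs[sid][s:s + self.length] for sid, s in self.starts.items()}

    def column_sets(self, seqs: Dict[str, str]) -> List[Set[str]]:
        """Residues observed in each column; these could label automaton edges."""
        frags = list(self.segments(seqs).values())
        return [{frag[i] for frag in frags} for i in range(self.length)]


# Three of the five sequences share the block; the other two are left out.
block = Block(length=3, starts={"seq1": 5, "seq2": 3, "seq3": 5})
missing = sorted(set(SEQUENCES) - set(block.starts))

print(block.segments(SEQUENCES))  # {'seq1': 'HCC', 'seq2': 'KCC', 'seq3': 'KCC'}
print([sorted(c) for c in block.column_sets(SEQUENCES)])  # [['H', 'K'], ['C'], ['C']]
print(missing)  # ['seq4', 'seq5']
```

The per-column residue sets ({H, K}, {C}, {C}) show how a block in an alignment can turn into a chain of automaton edges such as K|H, C, C, while the sequences missing from the block correspond to a path that skips it.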

Using it

Protomata is available online through a dedicated web interface: Protomata

There is a help page where you will find more information about it.

A typical use case for this software suite is the following:

- you start with a set of related sequences, which you submit through the Protomata-learner form

- you test different Protomata-learner parameters, following the instructions on the help page

- you can then scan other sequences (or a public databank) for similar sequences using Protomatch
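Protomatch is used through the web interface, so the following is only a conceptual stand-in for "scanning a databank": it reads FASTA records and searches each one for the K|H, C, C block with a plain regular expression. The regex is a swapped-in simplification, not Protomatch's algorithm (which works from the learned automaton rather than from a single hand-written block); the FASTA records and the `read_fasta`/`scan` helpers are hypothetical.

```python
import re
from typing import Iterator, List, Optional, Tuple

# Hypothetical FASTA input standing in for "other sequences or a public databank".
FASTA = """>protA hypothetical
MSTLGHCC
>protB hypothetical
MAPRQ
>protC hypothetical
MQEKCCWR
KHCC
"""

# Simplified stand-in for a learned motif: the K|H, C, C block only.
BLOCK = re.compile(r"[KH]CC")


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs; multi-line sequences are joined."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def scan(text: str) -> List[Tuple[str, int]]:
    """Return (identifier, 1-based start) for every occurrence of the block."""
    hits: List[Tuple[str, int]] = []
    for name, seq in read_fasta(text):
        for m in BLOCK.finditer(seq):
            hits.append((name, m.start() + 1))
    return hits


print(scan(FASTA))  # [('protA', 6), ('protC', 4), ('protC', 10)]
```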

Have fun with it!

