Protomata 2 now available
Hello,
Today, I’m going to write about a piece of software developed here at the Irisa/INRIA center (Rennes, France), in the Symbiose team: Protomata.
Protomata
Protomata performs pattern matching and pattern discovery on protein sequences. It is made up of 2 main components: Protomata-learner for discovery and Protomatch for matching.
The patterns used by the Protomata tools take the form of automata, which are graphical models representing a set of sequences. Protomata is specifically designed for heterogeneous sequence families. A typical automaton produced by Protomata can look like this:
In this example, the motif represents a family where some sequences share a common block (K|H, C, C) at the end of the sequence (starting around positions 58/64). The arrows at the bottom mean that some of the sequences don’t contain the K|H,C,C block.
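To make the idea of an automaton with an optional block more concrete, here is a minimal Python sketch. It is not the automaton Protomata produces and not how Protomatch works internally: the state names, the transition table and the `classify` function are all invented for illustration. It only mimics the example above, where a sequence either goes through the K|H, C, C block or bypasses it.

```python
# Toy automaton, NOT Protomata output: states and transitions are
# hypothetical and only mimic the K|H, C, C example described above.
ANY = "*"  # wildcard label: matches any residue

TRANSITIONS = {
    ("start", ANY): {"start"},            # residues before the block
    ("start", "K"): {"saw_KH"},           # first block position: K or H
    ("start", "H"): {"saw_KH"},
    ("saw_KH", "C"): {"saw_C"},           # second block position: C
    ("saw_C", "C"): {"block_done"},       # third block position: C
    ("block_done", ANY): {"block_done"},  # residues after the block
}


def step(states, residue):
    """Return every state reachable from `states` by reading `residue`."""
    reachable = set()
    for state in states:
        reachable |= TRANSITIONS.get((state, residue), set())
        reachable |= TRANSITIONS.get((state, ANY), set())
    return reachable


def classify(sequence):
    """Say whether a sequence can go through the block or only bypass it."""
    states = {"start"}
    for residue in sequence:
        states = step(states, residue)
    # "start" stays reachable for any input: it plays the role of the
    # bypass arrows, so a sequence without the block is still accepted.
    return "block" if "block_done" in states else "bypass"
```

With this toy table, `classify("MKTAYKCCA")` reports that the block path is possible, while `classify("MKTAYLLA")` only takes the bypass. A real Protomata automaton carries far more information than this, such as the other conserved positions along the sequences.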
The thing to keep in mind when you use Protomata is that, from a set of related proteins, you will find some blocks that are conserved in all or in a subset of the sequences. Sometimes, these blocks can be associated with a specific function of the proteins, a catalytic site for example.
Protomata-learner discovers this kind of motif in sets of related protein sequences using a special kind of alignment: Partial Local Multiple Alignments (PLMA). The PLMA which was used to generate the automaton displayed above looks like this:
Looking at it, you can see that the alignment was made with 5 sequences: 3 of them share a block near the end (in yellow), while the other 2 do not contain a similar region.
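What makes such an alignment "partial" and "local" can be sketched with a tiny data structure. The snippet below is only an illustration under my own assumptions: the `Block` class, the `covered` helper, the sequence identifiers and the positions are all made up, and this is not the format Protomata-learner reads or writes.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical layout: sequence id -> (start position, aligned segment).
    # Only the sequences that take part in the block appear here.
    members: dict


def covered(block, all_ids):
    """Split sequence ids into those inside and those outside the block."""
    inside = [seq_id for seq_id in all_ids if seq_id in block.members]
    outside = [seq_id for seq_id in all_ids if seq_id not in block.members]
    return inside, outside


# Invented toy data: 5 sequences, 3 of them sharing a block near the end.
sequence_ids = ["s1", "s2", "s3", "s4", "s5"]
end_block = Block(members={
    "s1": (58, "KCC"),
    "s2": (64, "HCC"),
    "s3": (60, "KCC"),
})

inside, outside = covered(end_block, sequence_ids)
```

Here the block is local because it covers only a short region of each sequence, and partial because `outside` is not empty: two of the five sequences are simply left out of it.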
Using it
Protomata is available online through a dedicated web interface: Protomata
There is a help page where you will find more information about it.
The typical use case of this software suite is the following:
-you start with a set of related sequences that you put in the Protomata-learner form
-you test different Protomata-learner parameters following the instructions on the help page
-you can then scan other sequences (or a public databank) for similar sequences using Protomatch
Have fun with it!