Jun 15 12

Protomata 2 now available

by A. Bretaudeau

Hello,

Today, I’m going to write about a piece of software developed here at the Irisa/INRIA center (Rennes, France), in the Symbiose team: Protomata.

Protomata

Protomata performs pattern matching and discovery on protein sequences. It is made of 2 main components: Protomata-learner for discovery and Protomatch for matching.

The patterns used by the Protomata tools take the form of automata, which are graphical models representing a set of sequences. Protomata is specifically designed for heterogeneous sequence families. A typical automaton produced by Protomata can look like this:

A simple protomata

In this example, the motif represents a family where some sequences share a common block (K|H, C, C) at the end of the sequence (starting around positions 58/64). The arrows at the bottom mean that some of the sequences don’t contain the K|H, C, C block.
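To make the idea of an automaton a bit more concrete, here is a minimal Python sketch of a toy automaton loosely modelled on the figure: it reads residues one at a time, has a K|H, C, C block path, and uses looping states to stand in for the unconstrained regions (the bypass). The edge list, state numbers and function names are hypothetical; this is not Protomata’s internal representation or output format.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# (source state, target state, allowed residues); None means "any residue".
Edge = Tuple[int, int, Optional[FrozenSet[str]]]

# Hypothetical toy automaton loosely modelled on the figure, not Protomata output:
# state 0 loops over any residue (the bypass path), K|H -> C -> C leads to
# state 3, which then loops over any residue until the end of the sequence.
TOY_EDGES: List[Edge] = [
    (0, 0, None),
    (0, 1, frozenset("KH")),
    (1, 2, frozenset("C")),
    (2, 3, frozenset("C")),
    (3, 3, None),
]
BLOCK_DONE = 3  # being in this state means the K|H,C,C block was read


def run(seq: str, edges: List[Edge] = TOY_EDGES) -> Set[int]:
    """Simulate the nondeterministic automaton and return the active states."""
    active = {0}
    for residue in seq:
        active = {
            dst
            for src, dst, allowed in edges
            if src in active and (allowed is None or residue in allowed)
        }
    return active


def has_block(seq: str) -> bool:
    """True if some path through the automaton goes through the K|H,C,C block."""
    return BLOCK_DONE in run(seq)


print(has_block("MSTLKAEHCCG"))  # True: contains H,C,C
print(has_block("MSTLKAEGCAG"))  # False: only the bypass path matches
```

Because state 0 loops on every residue, every sequence is accepted through the bypass; what distinguishes family members is whether a path through the block also exists.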

The thing to keep in mind when you use Protomata is that, from a set of related proteins, you will find blocks that are conserved in all of the sequences or in a subset of them. Sometimes these blocks can be associated with a specific function of the proteins, a catalytic site for example.

Protomata-learner discovers this kind of motif in sets of related protein sequences using a special kind of alignment: Partial Local Multiple Alignments (PLMA). The PLMA that was used to generate the automaton displayed above looks like this:

Looking at it, you can see that the alignment was made with 5 sequences: 3 of them share a block near the end (in yellow), while the other 2 do not contain a similar region.

Using it

Protomata is available online using a dedicated web interface: Protomata

There is a help page where you will find more information about it.

The typical use case of this software suite is the following:

- you start with a set of related sequences, which you submit to the Protomata-learner form

- you test different Protomata-learner parameters, following the instructions on the help page

- you can then scan other sequences (or a public databank) for similar sequences using Protomatch

Have fun with it!

May 31 12

DELTA-BLAST

by A. Bretaudeau

Hi!

A new Blast version (2.2.26+) was released by the NCBI a few weeks ago. You can now use it through our blast web interface, or directly on our cluster.

I have also updated the Symfony bundle (GenouestBlastBundle) to work with this new version, so if you use it and want to switch to this new blast version, don’t forget to update the bundle code as well.

What’s new

As usual, the blast+ changelog is available online.

Apart from a few bug fixes and improvements, the main changes are that PSI-BLAST output is now available in all formats (txt, html, tabular, asn, …), and a new application: DELTA-BLAST.

DELTA-BLAST

DELTA-BLAST (Domain Enhanced Look-up Time Accelerated BLAST) is a new application to perform protein-protein queries with better sensitivity.

There is an article describing how it works.

Briefly, DELTA-BLAST builds a multiple sequence alignment of the query sequence with domains described in the CDD (the NCBI Conserved Domain Database), and then uses a PSSM derived from this alignment to search a sequence database.

The difference with PSI-BLAST is that PSI-BLAST uses the results of a first blastp iteration to construct a PSSM, and then uses it to search the sequence database. DELTA-BLAST uses a PSSM derived from the CDD, so the initial PSSM construction is much quicker than with PSI-BLAST.
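To make the PSSM idea a bit more concrete, here is a minimal Python sketch that derives a position-specific scoring matrix from a few aligned protein segments and uses it to score windows of another sequence. It is a generic log-odds PSSM with a uniform background and a flat pseudocount, both assumptions of this sketch; PSI-BLAST and DELTA-BLAST build their PSSMs with considerably more elaborate sequence weighting and pseudocounts, so treat it as an illustration of the concept only. The function names `build_pssm` and `best_hit` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Pssm = List[Dict[str, float]]  # one {residue: score} dict per alignment column


def build_pssm(aligned: List[str], pseudocount: float = 1.0) -> Pssm:
    """Log-odds PSSM (in bits) from equal-length, gap-free aligned segments.

    Assumes a uniform background and one flat pseudocount per residue type.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm: Pssm = []
    for column in zip(*aligned):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm: Pssm, seq: str) -> Tuple[float, int]:
    """Highest-scoring window as (score, 0-based start); unknown residues score 0."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(seq) - width + 1):
        score = sum(pssm[i].get(seq[start + i], 0.0) for i in range(width))
        best = max(best, (score, start))
    return best


pssm = build_pssm(["KCC", "HCC", "KCC"])
print(best_hit(pssm, "AAAKCCAA")[1])  # 3
```

The only conceptual difference between the two programs, as described above, is where the aligned segments come from: a first blastp search for PSI-BLAST, CDD domains for DELTA-BLAST.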

The results in the paper look promising. You can try it if you want to find sequences distantly related to your query sequence.

Feb 21 12

MEME update: 4.8.1

by A. Bretaudeau

Hi all,

I just updated our MEME server to the latest version (4.8.1).

This new version comes with a new tool available on the web interface: CentriMo. It is dedicated to the analysis of ChIP-Seq data.

Have fun with it!

Dec 6 11

MEME 4.7.0 update

by A. Bretaudeau

Hello,

I just updated our MEME server to the latest version: 4.7.0.

As usual, the web interface is available at the same address: http://tools.genouest.org/tools/meme/.

The main change in this version is the new DREME web service and web interface. This tool was already used by MEME-ChIP, which was introduced in the MEME suite 4.6. DREME discovers motifs in sets of short (~100 bp) sequences, such as ChIP-seq datasets. If you want to learn how to use this tool, read the DREME tutorial, which is now online.

Have a look at the release notes to see the whole changelog.

That’s all for today!

Sep 14 11

Blast+ web interface: source code available

by A. Bretaudeau

Hi there!

It’s been a while, but today I come with some good news!

As you may remember, I have developed a web interface for Blast+. Some of you asked me if I could publish the source code. So today I’m releasing it to the world!

Now, you can download an almost ready-to-use web interface and install it on your web server to make it available to your users. I tried to make it as flexible as possible, so you can adapt it to your needs.

Requirements

To install this web interface, you’ll need a web server (Apache for example) with PHP >= 5.3.2 and a SQL database.

For better performance, you also need a cluster running the SGE job scheduler. Computing nodes should run Linux.

The code is based on the new Symfony 2 framework, and it is available as “bundles” (a kind of Symfony plugin).

The installation requires some PHP skills, but it shouldn’t be too hard if you follow the instructions below.

Installation

Getting the code

The first step is to get the Symfony 2 code. Go to the official download page and get the latest “Symfony Standard (.tgz)”. Extract it somewhere on your server and open a terminal in the symfony directory.

Now you need to install some bundles. All our code is available on our github account.

First install the GenouestBioinfoBundle following the instructions in the installation section of the corresponding documentation.

You also need to install and configure GenouestSchedulerBundle (doc) and GenouestBlastBundle (doc). Follow the installation and configuration sections in the corresponding documentations.

Additionally, if you have a Biomaj server and want to use it within the blast interface, install GenouestBiomajBundle (doc). If you have no idea what a Biomaj server is, just skip this optional step!

Preparing the database

Before testing your installation, you need to prepare the database that will store information about each blast job launched through the web interface.

First, create an empty database on your SQL server. Then configure the connection of your blast interface. Briefly, open the ‘app/config/parameters.ini’ file and fill the different connection parameters (database driver, hostname, user, password, database name).
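Before running the schema command, it can help to check that the connection settings are all filled in. Here is a minimal Python sketch of such a check. The key names are an assumption (they follow the Symfony 2.0 Standard Edition layout of `app/config/parameters.ini` with a `[parameters]` section); adjust them to whatever your own file contains. The function name `missing_db_parameters` is hypothetical.

```python
import configparser
from typing import List

# Assumed key names (Symfony 2.0 Standard Edition style parameters.ini);
# adjust them to match your own file.
REQUIRED_KEYS = [
    "database_driver",
    "database_host",
    "database_name",
    "database_user",
    "database_password",
]


def missing_db_parameters(ini_text: str) -> List[str]:
    """Return the database keys missing from the [parameters] section.

    An empty value (e.g. no password) still counts as present.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section("parameters"):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS
            if not parser.has_option("parameters", key)]
```

For example, from the symfony root directory: `missing_db_parameters(open("app/config/parameters.ini").read())` returns an empty list when every expected key is present.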

Now we need to create the SQL tables in the database. To do so, just launch the following command from the symfony root directory:

php app/console doctrine:schema:update --force

And that’s it! The database is ready to be used.

Getting the web interface online

To test your application, just make sure that the ‘web’ directory is accessible from the internet. You can have a look at the Symfony documentation for more help. Personally, I prefer to install Symfony in a separate directory, and then create a symbolic link in the Apache www directory pointing to the Symfony “web/” dir.

Suppose your Symfony is installed in /opt/myblastapp/ and the apache root directory is /apache/www/. You can create a symbolic link like this:

ln -s /opt/myblastapp/web /apache/www/blast

And now access your application using http://example.org/blast/

Getting help & contributing

As you can see, installing the web interface on your server takes a few steps, but it shouldn’t be too hard if you have some PHP skills. As it is based on the Symfony 2 framework, it is really customizable. If you run into problems, the Symfony documentation can help you: it is well written and covers most of the things you can do with this framework.

The code is released under the French CeCILL license, which is a GPL-like license. Don’t hesitate to submit bug reports or patches to our github repositories. Any comments are welcome!

Apr 5 11

Blast+ and MEME updates

by A. Bretaudeau

Hi all,

Just a quick post to tell you that I have updated our servers to the latest Blast+ and MEME versions.

Blast+ 2.2.25+

The NCBI released the Blast+ 2.2.25+ version a few days ago. I was particularly eager to get it, as there was a bug in version 2.2.24+ that caused some results to be incomplete (you may have seen the warning message about it on our form).

So this new version fixes this specific bug (and others), and brings some improvements which you can see in the changelog.

Feel free to test our blast form and tell us if you have any problem with it.

Speaking of this form, to answer a comment on one of my previous posts: we’re not really planning to release the code right now. It is based on the symfony framework, using one of our plugins to submit jobs to our cluster (sfobManagerPlugin). We will probably port this code to the new Symfony2 architecture soon, so maybe one day we will release a BlastBundle for it?

MEME 4.6.1

MEME has also been updated to the brand new 4.6.1 version. As usual, the release notes will tell you what’s new.

The changes mostly concern MEME-ChIP, which is now available from the command line. This may be useful if you want to test it on our cluster: see this post if you don’t know how to use it from the command line.

Test it, and tell us if you have any problem with it!

Bye

Feb 3 11

MEME 4.6: MEME-ChIP and Spamo

by A. Bretaudeau

As promised in one of my last posts, I’ve just finished updating our MEME server to the fresh version 4.6.0.

The main new feature of this release is the addition of two new applications to the suite: MEME-ChIP and Spamo.

MEME-ChIP

As you may guess, MEME-ChIP is dedicated to… ChIP-Seq experiments!

This tool is in fact a meta-tool that launches several analyses on a set of sequences. The good news is that the official website has a great tutorial explaining how it works.

Briefly, the input data is a fasta file containing many sequences generated by ChIP-seq (or another technology producing the same kind of sequences). The first step is to find motifs in these sequences, with two tools launched in parallel: MEME and DREME. MEME is better suited to wider motifs, while DREME is designed for shorter ones.

Once it has found a lot of motifs (hopefully), the next step is to compare them to public databanks of motifs, like Jaspar for example. This is done using TOMTOM.

MEME-ChIP then launches a MAST search to find each motif site in the sequences you submitted. Finally AMA and AME are used to estimate the binding affinity of input sequences to each motif, and to find subtly enriched known binding motifs in your input sequences.

So this is a new tool specialized in ChIP-seq data analysis.

Spamo

Spamo is also a tool particularly useful for ChIP-seq (though it can work with other data).

You give it a set of sequences (typically ChIP-seq sequences) and a motif that is represented in some of these sequences. The third thing to specify is a databank of motifs like Jaspar or Uniprobe.

Spamo searches for all motifs of the given databank near the motif sites in your sequences.

So this tool can help you determine whether a known motif occurs at a specific position near another one, which is useful when studying transcription factor binding sites.
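The core idea, looking at the spacing between a primary motif site and nearby secondary motif sites, can be sketched in a few lines of Python. This is only a tabulation of spacings under assumed inputs (one primary site position per sequence, a list of secondary site positions per sequence, and an arbitrary margin of 150 positions); Spamo itself goes further and tests whether particular spacings are statistically enriched. The function name `spacing_counts` is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def spacing_counts(primary: Dict[str, int], secondary: Dict[str, List[int]],
                   margin: int = 150) -> Counter:
    """Count spacings (secondary site start - primary site start) over sequences.

    Only secondary sites within +/- margin of the primary site are counted;
    a spacing of 0 (same start position) is skipped.
    """
    counts: Counter = Counter()
    for seq_id, p_pos in primary.items():
        for s_pos in secondary.get(seq_id, []):
            gap = s_pos - p_pos
            if 0 < abs(gap) <= margin:
                counts[gap] += 1
    return counts


counts = spacing_counts({"s1": 100, "s2": 50, "s3": 200},
                        {"s1": [112, 400], "s2": [62], "s3": [150]})
print(counts.most_common(1))  # [(12, 2)]
```

A spacing that recurs across many sequences (here +12) is the kind of signal that suggests two factors binding at a fixed distance.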

It seems Spamo is still in beta, but I didn’t run into any bugs with it.

That’s it with the 4.6 MEME release!

Feb 2 11

New BLAST+ web form

by A. Bretaudeau

Hi all,

Today, let’s talk about the famous BLAST and its successor BLAST+! BLAST+ is available on the platform.

At the beginning…

…there was BLAST. It was published in 1990 and is one of the most used bioinformatics tools. Many web forms have been created around the world, the main ones being at the NCBI and at the EBI. In short, if you don’t already know what BLAST is: it compares sequences and allows you to find sequences that are similar to a given one.

The main BLAST implementation comes from the NCBI, although other implementations were also released (WU-BLAST which was later renamed AB-BLAST and is not free, FSA-BLAST, …).

BLAST comes in many flavours (blastn, blastp, blastx, tblastn, tblastx, psiblast, phiblast, megablast, …) which mainly differ in the type of sequences that are compared.

BLAST+

At the end of 2009, the NCBI published a complete rewrite of their BLAST, now called BLAST+. Their aim was to provide an implementation that is faster and easier to use, while giving comparable results.

If you’re not using the command line, the main change you can see is the NCBI web interface that was completely revamped a few months ago.

At the command line level, there were many changes. In fact they renamed all the binaries and options. The following picture perfectly illustrates this:

At the top, the binaries of BLAST. At the bottom, their equivalent with BLAST+.

I think they decided that it was time to change all the names once and for all. And I think they’re right: it’s much more usable now. For compatibility, there is a perl script that you can use to translate an old command line into the BLAST+ format.
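As a text companion to the picture, here is a partial lookup table of the main program renamings, written as a small Python sketch. It covers program names only and is not exhaustive; options were renamed too (for example `-i` became `-query` and `-d` became `-db`), which is what the conversion script handles. Note one trap: BLAST+ `blastn` runs the megablast task by default, so the equivalent of the old `blastall -p blastn` is `blastn -task blastn`. The function name `plus_equivalent` is hypothetical.

```python
from typing import Dict

# Legacy BLAST program -> BLAST+ equivalent (program names only).
LEGACY_TO_PLUS: Dict[str, str] = {
    "blastall -p blastn": "blastn -task blastn",  # BLAST+ blastn defaults to megablast
    "blastall -p blastp": "blastp",
    "blastall -p blastx": "blastx",
    "blastall -p tblastn": "tblastn",
    "blastall -p tblastx": "tblastx",
    "megablast": "blastn -task megablast",
    "blastpgp": "psiblast",
    "rpsblast": "rpsblast",
    "formatdb": "makeblastdb",
    "fastacmd": "blastdbcmd",
}


def plus_equivalent(legacy: str) -> str:
    """Normalise whitespace and look up the BLAST+ equivalent of a legacy program."""
    key = " ".join(legacy.split())
    try:
        return LEGACY_TO_PLUS[key]
    except KeyError:
        raise ValueError(f"no mapping recorded for {legacy!r}") from None


print(plus_equivalent("formatdb"))  # makeblastdb
```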

If you look at the publication, you’ll see the performance improvements.

BLAST+ at GenOuest

On the platform, we have installed BLAST+ (the legacy BLAST is still available, of course). To use it from the command line, just source its environment like this: “source /local/env/envblast+”, and then you can play with blastn, blastp and all their friends.

If you prefer a web interface, we have created a new one using BLAST+. It is largely inspired by the NCBI form, the main differences being the available databanks and the dedicated resources.

We hope you will like it. It is already much better than our previous form. We should improve it with a surprise in a few weeks.

Tell us if you have any problem with this new form!

Jan 28 11

MEME overview

by A. Bretaudeau

It’s been too long since my last post, but finally, I’m back! I will present the work I have done under the hood soon. But for now, let’s talk about Meme!

Meme is a suite of tools for pattern matching, pattern discovery, and other pattern manipulations. It works with PSSMs, and it is particularly designed for nucleic sequences, although some of the tools work with protein sequences too.

Meme is installed on our platform, and it is available with a great web interface. It is also possible to use it from the command line (source /local/env/envmeme), and through web services (as usual with our Opal server).

I am going to present the most useful tools of this suite. You should also visit the MEME documentation, which is quite helpful.

MEME and MAST

MEME

The MEME tool (which gave its name to the whole suite) is dedicated to pattern discovery. It simply takes a set of sequences (protein or nucleic acid) and searches for patterns present in some or all of the sequences.

You can specify the number of motifs to find, the minimum and maximum length of the motif(s), and if the motif(s) is present in all or some of your sequences.

To use it, just go to this page!
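If you prefer the command line (after `source /local/env/envmeme`), the same choices map onto MEME options. Here is a minimal Python sketch that builds such a command without running it. It assumes the MEME 4.x option names `-dna`/`-protein`, `-nmotifs`, `-minw`, `-maxw`, `-mod` (`oops`, `zoops`, `anr`) and `-oc`; check the documentation of the installed version. The file name, output directory and function name are hypothetical.

```python
import shlex
from typing import List

# How the "present in all or some of your sequences" choice maps onto -mod.
SITE_DISTRIBUTION = {
    "one_per_sequence": "oops",          # exactly one site in every sequence
    "zero_or_one_per_sequence": "zoops",
    "any_number": "anr",                 # any number of repetitions
}


def meme_command(fasta: str, nmotifs: int, minw: int, maxw: int,
                 distribution: str = "zero_or_one_per_sequence",
                 dna: bool = True, outdir: str = "meme_out") -> List[str]:
    """Build (but do not run) a MEME command line as an argument list."""
    if minw > maxw:
        raise ValueError("minw must not exceed maxw")
    return ["meme", fasta, "-dna" if dna else "-protein",
            "-nmotifs", str(nmotifs), "-minw", str(minw), "-maxw", str(maxw),
            "-mod", SITE_DISTRIBUTION[distribution], "-oc", outdir]


print(shlex.join(meme_command("my_seqs.fa", 3, 6, 20)))
# meme my_seqs.fa -dna -nmotifs 3 -minw 6 -maxw 20 -mod zoops -oc meme_out
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the search; building an argument list rather than a single string avoids shell-quoting problems with file names.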

The results are in HTML format. They contain a list of the motifs found by the program. Each one is shown as a sequence logo and given an E-value (the lower the E-value, the more significant the motif):

MEME results: logo

An example motif found by MEME: on the left, the forward strand; on the right, the same motif on the reverse strand.

Just after these logos, there are 4 buttons which allow you to launch further analysis using the MEME results: MAST, FIMO, GOMO and BLOCKS are available.

After this, you can see the motif sites found in the sequences you gave to MEME:

MEME: motifs sites in given sequences
A list of sites matching the motif, found in each sequence

Of course, you can download and view your motif in different formats: PSSM or PROSITE-like pattern, the latter being less expressive than the former.
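To see why a pattern is less expressive than a PSSM, here is a minimal Python sketch that collapses each column of a nucleotide probability matrix into an IUPAC letter. The rule (keep every base whose probability reaches 0.25) is an assumption of this sketch, not necessarily how MEME builds its own patterns; the function name `to_iupac` is hypothetical.

```python
from typing import Dict, List

# IUPAC nucleotide codes keyed by the set of bases they stand for.
IUPAC: Dict[frozenset, str] = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def to_iupac(pssm: List[Dict[str, float]], threshold: float = 0.25) -> str:
    """Collapse each probability column to the IUPAC code of its frequent bases."""
    letters = []
    for column in pssm:
        kept = frozenset(b for b in "ACGT" if column.get(b, 0.0) >= threshold)
        letters.append(IUPAC.get(kept, "N"))  # nothing above threshold -> N
    return "".join(letters)


skewed = {"A": 0.6, "G": 0.4, "C": 0.0, "T": 0.0}
even = {"A": 0.5, "G": 0.5, "C": 0.0, "T": 0.0}
print(to_iupac([skewed]), to_iupac([even]))  # R R
```

Two different columns end up as the same letter R: the preference for A in the first one is lost, whereas a PSSM keeps it.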

MAST

When you have found a motif in a set of sequences, the next thing you might want to do is to search for other sequences containing this motif, i.e. pattern matching. The MEME suite comes with a tool dedicated to this: MAST.

You have the choice to directly launch a MAST search from a MEME result page, or to save a MEME motif and then upload it on the MAST form:

Here, we launched MAST directly from MEME result.

On the MAST form, you only have to choose a motif and a sequence database to search (a bacterial genome, for example). There is also an option very similar to the blast e-value threshold parameter: MAST gives a score to each hit it finds in the database, and only shows you hits with an e-value lower than a given threshold (10 by default).

After launching the search, you get a representation of the matching sequence(s), with each hit highlighted. If you move your mouse over a hit, MAST shows you its e-value and its position in the sequence.

A sequence with a lot of hits.

GLAM2 and GLAM2SCAN

GLAM2 does the same job as MEME (pattern discovery) except that it can discover motifs containing gaps. And GLAM2SCAN is the equivalent of MAST, with gap support.

There is not much more to say about them: they work very similarly to MEME and MAST. GLAM2 has a few more options, in particular for the tuning of insertion and deletion costs.

TOMTOM

The last main tool offered in the MEME suite is TOMTOM. It compares a given motif to databanks of publicly available motifs. Suppose you have found a new motif with MEME or GLAM2: you might want to know if this motif has already been described by someone else. TOMTOM will give you the answer.

On the web form, you enter the motif you have found, and then select a databank in which to look for similar motifs. There is also an e-value threshold, working the same way as in MAST, GLAM2SCAN or blast.

Several motif formats are allowed: PROSITE-like (IUPAC), PSSM, or MEME txt file.

You can search in various databanks, especially JASPAR or TRANSFAC which are big databanks of transcription factor binding sites. There are other, usually smaller, databanks specific to some organisms (Drosophila for example) or domains. As I said at the beginning, the MEME suite is mainly aimed at nucleic sequences, so the databanks available in TOMTOM are nucleic ones, and more specifically banks of transcription factor binding sites.
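The basic operation behind a motif comparison can be sketched in Python: slide one motif along another and measure how well the overlapping columns agree. This sketch uses the mean Euclidean distance between probability columns, which is one of several possible column comparison measures and an assumption here; it ignores reverse complements and does not compute the p-values and E-values that TOMTOM reports. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Column = Dict[str, float]  # base -> probability


def column_distance(a: Column, b: Column) -> float:
    """Euclidean distance between two probability columns."""
    return math.sqrt(sum((a.get(x, 0.0) - b.get(x, 0.0)) ** 2 for x in "ACGT"))


def best_offset(query: List[Column], target: List[Column],
                min_overlap: int = 2) -> Tuple[float, int]:
    """(mean column distance, offset) of the best ungapped placement of query on target."""
    best = (math.inf, 0)
    for offset in range(min_overlap - len(query), len(target) - min_overlap + 1):
        pairs = [(query[i], target[i + offset])
                 for i in range(len(query)) if 0 <= i + offset < len(target)]
        if len(pairs) < min_overlap:
            continue
        mean = sum(column_distance(a, b) for a, b in pairs) / len(pairs)
        best = min(best, (mean, offset))
    return best


def one_hot(site: str) -> List[Column]:
    """Turn a single site into a motif with probability 1 for each observed base."""
    return [{base: 1.0} for base in site]


print(best_offset(one_hot("TTA"), one_hot("GATTACA")))  # (0.0, 2)
```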

Other tools

So we have seen MEME, MAST, GLAM2, GLAM2SCAN and TOMTOM in action. But MEME comes with some other tools:

MCAST searches for clusters of motif sites in a given sequence. This can be helpful when you’re studying regulatory modules.
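The intuition behind a cluster search can be illustrated by grouping motif-site positions that lie close to each other. The Python sketch below only does that grouping, with a hypothetical maximum gap and minimum cluster size; MCAST itself scores candidate clusters statistically rather than just counting sites. The input positions are assumed to come from an earlier scan, and the function name `cluster_sites` is hypothetical.

```python
from typing import List, Tuple


def cluster_sites(positions: List[int], max_gap: int,
                  min_sites: int = 2) -> List[Tuple[int, int, int]]:
    """Group motif-site positions into (first, last, count) clusters.

    Consecutive sites at most max_gap apart belong to the same cluster;
    clusters with fewer than min_sites sites are dropped.
    """
    ordered = sorted(positions)
    if not ordered:
        return []
    clusters = []
    first = prev = ordered[0]
    count = 1
    for pos in ordered[1:]:
        if pos - prev <= max_gap:
            count += 1
        else:
            if count >= min_sites:
                clusters.append((first, prev, count))
            first, count = pos, 1
        prev = pos
    if count >= min_sites:
        clusters.append((first, prev, count))
    return clusters


print(cluster_sites([10, 30, 45, 300, 900, 920], max_gap=50))
# [(10, 45, 3), (900, 920, 2)]
```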

GOMO is a pattern matching program which can help you assign a function to a motif: you give it a motif, and it searches for genes located close to occurrences of this motif. It then automatically retrieves the GO terms associated with these genes, so you get a list of GO annotations related to the motif you entered (if you don’t know GO terms, have a look at this page!).

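FIMO is very similar to MAST, but the two handle several motifs differently: when you give 3 motifs to MAST, it combines the matches of all 3 into a single score per sequence, so sequences matching several of them rank highest. FIMO scans for each motif independently and reports every individual occurrence it finds.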
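As a rough illustration of these two views (a combined per-sequence score versus independent per-site reports), here is a Python sketch working on hypothetical scan results for one sequence: for each motif, a list of (position, p-value) pairs. The 1e-4 threshold is just an example value; the product-of-best-p-values score captures the spirit of MAST’s combined score, while real MAST then converts that product into a combined p-value and an E-value. Function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical scan of one sequence: motif name -> [(position, p-value), ...].
# Each motif is assumed to have at least one scored position.
Scan = Dict[str, List[Tuple[int, float]]]


def fimo_like(scan: Scan, threshold: float = 1e-4) -> List[Tuple[str, int, float]]:
    """Per-site view: report every occurrence of every motif below the threshold."""
    return sorted((motif, pos, p)
                  for motif, hits in scan.items()
                  for pos, p in hits if p < threshold)


def mast_like_score(scan: Scan) -> float:
    """Per-sequence view: product of each motif's best (smallest) site p-value."""
    return math.prod(min(p for _, p in hits) for hits in scan.values())


scan = {"m1": [(5, 1e-6), (40, 0.2)], "m2": [(12, 0.03)]}
print(fimo_like(scan))  # [('m1', 5, 1e-06)]
```

Here the per-site view reports only the strong m1 site, while the combined score still lets the weak m2 match contribute to the sequence’s ranking.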

There are other smaller utilities only available with command line. Take a look at the documentation to see if they can help you.

MEME 4.6

While writing this article, I found out MEME 4.6 has been released. It comes with two new tools (MEME-ChIP and Spamo) which I haven’t tested yet. I’m going to update our MEME server in the next few days and I will probably write a new blog post about these new tools.

Stay tuned!

Dec 15 10

New website!

by A. Bretaudeau

Hi all!

Great news

Today we’re launching a new website! It’s available at www.drmotifs.org and it’s the new face of Dr Motifs.

The new Dr Motifs website

The aim of this new website is to have a new entry point to Dr Motifs, with more stable information and (we hope!) a friendlier interface.

The homepage is focused on the two main analyses you want to do with motifs: discovery and matching. Each one has a dedicated page (which is still incomplete, but will get better in the near future).

One page I’d like to point you to is the tools page. It contains a full list of all the motif analysis tools installed on our platform, with a synthetic view of their specificities (motif type, nucleic or protein, etc). And of course a link to each web interface, and to related blog posts.

I hope you will like this new website. Don’t hesitate to tell me if you have problems with it!

The blog

Of course, this blog will continue to live! In fact the website and the blog are complementary: stable, synthetic information on the website, and fresh, dynamic content for the blog.