KeyRegulatorFinder is a program that searches for key regulators of lists of molecules (such as metabolites, enzymes or genes) by taking advantage of knowledge databases on cell metabolism and signaling. To use this tool, you must have a valid Transpath® license. For now, the program is distributed as an all-in-one virtual machine, but we are working on a standalone, Transpath-free version. The virtual machine version works only on 64-bit computers with full virtualization support.

Install the required software

Download the required files

Install

Launch

Troubleshooting

Prepare your lists of molecules (genes or metabolites)

Format

A list of molecules is a text file that contains one molecule description per line. Each description must follow one of these formats, with fields separated by a tab character.
HGNC_id
HGNC_id	Sign
Database_id	Molecule_id
Database_id	Molecule_id	Sign

Example

PPARA                         #The molecule PPARA in HGNC
PPARG	+                       #The molecule PPARG in HGNC increases
Transpath	MO000000327   #The molecule MO000000327 (here ATP) in Transpath
Transpath	MO000000328	- #The molecule MO000000328 (here ADP) in Transpath decreases
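The four line formats can be told apart by counting fields and checking whether the last one is a sign. The following Python sketch illustrates this; it is not part of KeyRegulatorFinder. It assumes that the only sign symbols are + and -, and that anything after a # is an annotation (as in the example above; whether the program itself accepts such comments is not stated here). The names Molecule and parse_molecule_line are hypothetical.

```python
from typing import NamedTuple, Optional

SIGNS = {"+", "-"}  # assumption: only these two sign symbols occur


class Molecule(NamedTuple):
    database: str          # "HGNC" when the line has no database field
    molecule_id: str
    sign: Optional[str]    # "+", "-", or None when no variation is given


def parse_molecule_line(line: str) -> Optional[Molecule]:
    """Parse one line of a molecule list; return None for a blank line."""
    # Assumption: text after '#' is an annotation and is dropped.
    content = line.split("#", 1)[0].strip()
    if not content:
        return None
    fields = [field.strip() for field in content.split("\t")]
    if len(fields) == 1:                       # HGNC_id
        return Molecule("HGNC", fields[0], None)
    if len(fields) == 2:
        if fields[1] in SIGNS:                 # HGNC_id <tab> Sign
            return Molecule("HGNC", fields[0], fields[1])
        return Molecule(fields[0], fields[1], None)  # Database_id <tab> Molecule_id
    if len(fields) == 3 and fields[2] in SIGNS:      # Database_id <tab> Molecule_id <tab> Sign
        return Molecule(fields[0], fields[1], fields[2])
    raise ValueError(f"unrecognized molecule line: {line!r}")
```

Running every line of a list through such a function before launching a job is a cheap way to catch fields separated by spaces instead of tabs.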

Embedded lists of molecules

For convenience, some lists of molecules have already been prepared. They are all in the in/data folder.

Prepare your jobs

A job is a text file containing one or more job{...} sections. Each section describes where to write the results [1], how to select relevant regulated reactions from Transpath [2,3,4,5,6], how to convert them into an influence graph, and how to analyze this graph [7,8,9]. To get a working job file, copy and paste the following template (or download it). You only have to change the lines whose comments carry a number.

This template searches for key regulators of the glycolysis enzymes listed in in/data/glycolysis.txt (download). After defining where to put the result files [1], the first step completely ignores the very common molecules listed in in/data/blacklist-remove.txt (download) [2]. Then, a subgraph of Transpath is computed by taking the first two levels of neighborhood of the molecules involved in glycolysis [3,4], without using the top xxx hubs (in this template, xxx = 2000) [5]. This graph is then converted to an influence graph, which is used to find key regulators of the enzymes participating in glycolysis [8]. In this context, no additional observations on molecule variations are available [9].

job{
name = out/glycolysis-results #[1] default output files prefix
filter_spaimr{ #Totally ignore some molecules
type= no_effect
blacklist = in/data/blacklist-remove.txt #[2] Remove the molecules from this list
}
filter_spaimr{ #Work on the neighborhood of a list of molecules
type = neighbor_nohub
num = 1 #[3] neighborhood level (WARNING see details)
max_hub = 0
roles = spaim
startlist = in/data/glycolysis.txt #[4] take the neighborhood of molecules from this list
blacklist = in/data/hubs/blacklist-2000.txt #[5] compute the neighborhood without using these molecules
output = out/glycolysis-neighbor.txt #[6] log neighborhood results here
}
compute_filter_spaimr{}
compute_influences{ #build an influence graph from the selected regulated reactions
max_balance = 0
compute_balance = no
compute_prod_by_unknown = yes
}
stats_influence{}
cneighbor{ #search for key regulators (you can write this block multiple times, with different output file prefixes)
id=out/glycolysis-key #[7] output file prefix for key regulators results
targets =in/data/glycolysis.txt #[8] a list of molecules describing the regulated molecules
observed=in/data/empty.txt #[9] a list of molecules (that may be empty) containing additional information on molecule variations
}
write_full_graph{}
} #end job
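When the same analysis has to be run on several lists of molecules, the numbered lines of the template can be filled in by a small script instead of by hand. The Python sketch below is one possible way to do this; it is not shipped with the program. The function name render_job and its parameter names are hypothetical, and the fixed lines simply reproduce the template above.

```python
# The doubled braces are literal braces for str.format; single-brace names are placeholders.
JOB_TEMPLATE = """\
job{{
name = {results_prefix} #[1]
filter_spaimr{{
type= no_effect
blacklist = {remove_list} #[2]
}}
filter_spaimr{{
type = neighbor_nohub
num = {level} #[3]
max_hub = 0
roles = spaim
startlist = {startlist} #[4]
blacklist = {hub_list} #[5]
output = {neighbor_log} #[6]
}}
compute_filter_spaimr{{}}
compute_influences{{
max_balance = 0
compute_balance = no
compute_prod_by_unknown = yes
}}
stats_influence{{}}
cneighbor{{
id={key_prefix} #[7]
targets ={targets} #[8]
observed={observed} #[9]
}}
write_full_graph{{}}
}} #end job
"""


def render_job(*, results_prefix, startlist, targets, key_prefix, neighbor_log,
               level=1,
               remove_list="in/data/blacklist-remove.txt",
               hub_list="in/data/hubs/blacklist-2000.txt",
               observed="in/data/empty.txt"):
    """Return the text of one job{...} section with the nine numbered fields filled in."""
    return JOB_TEMPLATE.format(
        results_prefix=results_prefix, remove_list=remove_list, level=level,
        startlist=startlist, hub_list=hub_list, neighbor_log=neighbor_log,
        key_prefix=key_prefix, targets=targets, observed=observed)


if __name__ == "__main__":
    # Reproduces the glycolysis template values used above.
    print(render_job(results_prefix="out/glycolysis-results",
                     startlist="in/data/glycolysis.txt",
                     targets="in/data/glycolysis.txt",
                     key_prefix="out/glycolysis-key",
                     neighbor_log="out/glycolysis-neighbor.txt"))
```

Since a job file may contain several job{...} sections, the strings returned by successive calls can be concatenated into a single file.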

Parameters details

Run your jobs

Read your results

Logs

Mapping key regulators IDs

Misspelled IDs or missing mapping information between HGNC and Transpath may cause the IDs in your input lists to be mapped incorrectly to the Transpath database. To let you check that everything worked properly, the mapping of your IDs (provided in the targets and observed lists of the cneighbor{...} block) is summarized in a file called xxx.mapping, where xxx is the file path defined in the id field of the cneighbor{...} block. Unmapped molecules are ignored during the analysis.

Generic statistics

The statistics are in the xxx-stats-yyy.txt files, where xxx is the job name.

xxx-stats-inf-basic.txt

xxx-stats-inf-edgematrix.txt

This file contains a tab-separated matrix. The value in column s, line t is the number of influences that start from a node of type s and end at a node of type t.
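To make the column/line convention concrete, here is a short Python sketch that loads such a matrix into a dictionary keyed by (source type, target type). The exact layout of the file is not described here, so the sketch assumes that the first line holds one node-type label per column and that each following line starts with its own node-type label; adapt it if the real file differs. The function name read_edge_matrix is hypothetical.

```python
import csv
import io


def read_edge_matrix(text):
    """Return {(source_type, target_type): count} from a labelled tab-separated matrix."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    source_types = rows[0][1:]        # assumed header line: one label per column s
    counts = {}
    for row in rows[1:]:
        target_type = row[0]          # assumed line label: node type t
        for source_type, cell in zip(source_types, row[1:]):
            # column s, line t: influences going from type s to type t
            counts[(source_type, target_type)] = int(cell)
    return counts
```

With the file content read into a string, counts[(s, t)] then gives the number of influences from nodes of type s to nodes of type t.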

xxx-stats-inf-nodetypes.txt

The number of nodes of each type.

xxx-stats-inf-epn.txt

This file summarizes the number of edges per node.

Key regulators scores types

Key regulators scores values

The key regulator scores are collected in a file named xxx.scores, where xxx is the file path defined in the id field of the cneighbor{...} block. This file is a tab-separated table composed of the following columns.

Graphs

The output folder contains files named xxx-score-score_zzz.graphml.gz, where xxx is the output prefix set in the job description with name=xxx and zzz is the score type. These files can be uncompressed with gunzip yyy.gz or 7zip and opened with any graph program compatible with the GraphML format, such as Cytoscape with the graphml plugin. Each file contains a graph in which molecules are annotated with the corresponding score. Depending on your analysis, the program can produce very large graphs that may exceed the capacity of your viewing software or your RAM, so check the generic statistics before loading these graphs.
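As a complement to the generic statistics, the size of a compressed graph can also be checked directly, without uncompressing it to disk or opening it in a viewer. The Python sketch below uses only the standard library and streams through the file, counting GraphML node and edge elements. It is an illustration, not a tool provided with the program, and the function name count_graph_elements is hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET


def count_graph_elements(path):
    """Return (number of nodes, number of edges) of a gzip-compressed GraphML file."""
    nodes = edges = 0
    with gzip.open(path, "rb") as handle:
        # iterparse reads the document incrementally instead of building it all first.
        for _event, element in ET.iterparse(handle, events=("end",)):
            # Drop the XML namespace, if any, and keep the local tag name.
            local_name = element.tag.rsplit("}", 1)[-1]
            if local_name == "node":
                nodes += 1
            elif local_name == "edge":
                edges += 1
            element.clear()  # release the content of elements already counted
    return nodes, edges
```

Calling count_graph_elements on one of the xxx-score-score_zzz.graphml.gz files returns the two counts, which can help decide whether the graph is small enough for your viewing software.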