This page describes a prototypical environment for simulating the biochemistry of a cell at the molecular level. This system offers a web-based environment for defining simulations, and examining and archiving simulation results, as well as an extremely simple (and biochemically unrealistic) simulation "engine" that actually conducts the simulations.
Biochemical activity is determined by a list of reactions operating on a list of reactants to produce a list of products. At each step reactions occur with a user-defined probability attached to each reaction, and the number of molecules of each species is recorded for later display. (Note that this approach focuses on individual molecules, rather than molecule concentrations. As a result, this approach lies somewhere between a purely logical representation of biochemical networks and a classical chemical representation based on sets of differential equations.)
The web interface allows users to supply a set of reactions and an initial set of molecules upon which those reactions operate, and then start the simulation and examine simulation results in a variety of display formats. In fact, this work is focused more on the development of an effective environment for defining simulation data sets, and managing and displaying results than on the simulation process itself.
The following topics are presented in this document:
Reaction number |
Reaction | Probability/ rate |
---|---|---|
0 | Join TFIID with the Transcription Binding Protein. | .7 |
1 | Bind to the TATA box of Gene.X, Segment 1. | .7 |
2 | Build the transcription initiation complex. | .7 |
3 | Build the transcription initiation complex under enhancement. | .99 |
4 | Phosphorylate PolymeraseII to activate it. | .7 |
5 | Transcribe the first mRNA segment for Gene X | .7 |
6 | Transcribe the second mRNA segment for Gene X | .7 |
7 | Transcribe the final mRNA segment for Gene X, add the termination complex, and release the transcript. | .7 |
8 | Break up the transcription complex. | .7 |
9 | Degrade mRNA to force an equilibrium. | .7 |
The complete reaction list includes specific reactants, catalysts, and products.
Note that Gene X is assumed to be composed of 3 separate segments (X-1, X-2, X-3) and produces mRNA.X in a segmented fashion. This is done to allow multiple Polymerase II molecules to operate on the same Gene in the usual transcription "pipeline". There are other ways to simulate such parallel activity, but this one is relatively simple.
Please ignore the inaccuracies in this reaction set. The intent of this document is simply to convey the basic operating principles of the simulator. However, such errors do serve illustrate the ways in which the simulation process may or may not represent reality, and, indeed, one of the main thrusts of this work is to understand what aspects of real biochemistry can be and/or must be accurately represented to build useful simulation tools.
Now these reaction can be applied to any collections of molecules, and for this example, they were applied to the following set of cellular components:
Compound | Number of Molecules |
---|---|
ATP | 100 |
ADP | 10 |
PO4 | 1 |
D | 1 |
TBP | 1 |
TATA:Gene.X-1 | 1 |
Gene.X-2 | 1 |
Gene.X-3 | 1 |
ABF | 1 |
PolyII | 1 |
EH | 1 |
PO4 | 100 |
mRNAPrecursor | 100 |
TermII | 1 |
Gene.XActivator | 0 |
Running a simulation will produce multiple intermediate and final products. The reactions and molecules listed above will produce the following products within 500 simulation steps:
D/TBP/TATA:Gene.X-1 |
D/TBP/TATA:Gene.X-1/ABF/PolyII/EH |
D/TBP/TATA:Gene.X-1/ABF/PolyII/EH/PO4 |
EH |
Gene.X-2 |
Gene.X-3 |
Gene.XActivator |
PO4 |
PolyII/EH/PO4/Gene.X-2 |
PolyII/EH/PO4/Gene.X-3 |
PolyII/EH/PO4/Gene.X-3/TermII |
TATA:Gene.X-1 |
mRNAPrecursor |
mRNAX |
mRNAX-1 |
mRNAX-1.2 |
One of the most useful formats is a table of simple 2-D graphs, each showing the molecule counts as a function of step number for every molecule species included in the simulation. For the example above, one such graph showing the counts of mRNA molecules coding for protein X, ATP, ADP, and the mRNA precursor would look like:
ADP ATP mRNAPrecursor mRNAX |
Step number |
The CSE can create tables of graphs of molecule counts with respect to timestep of every molecule included in the simulation, or allow the user to choose a subset of molecule species to graph.
The CSE can also produce text tables (without graphs) showing the number of molecules of each type available at the end of each step. This format seems most useful for debugging, or studying activity in close detail.
The package also supports an Applet, called viewReactionHistories2, that shows a list of reactions and a list of molecules both of which change over time as the simulation proceeds from step to step. From step to step the font size of the name of each molecule grows or shrinks as a function of the number of molecules of that type in the molecule pool. The font size of each reaction name also changes as a function of the number of reactions of each type occurring during a step. This kind of animation attempts to give the user a sense of the overall activity induced by a reaction set. An example is shown later.
It is also possible to get phase space graphs of molecule counts as an alternative to the graphs of molecule counts with respect to time step. These graphs should help users distinguish between significant behavior differences and stochastic noise generated by multiple runs of the same reaction and molecule sets.
Correlograms of individual molecule histories can be selected to help evaluate the dynamic behavior of individual models. A "correlogram" plots autocorrelation values against observation lags. An autocorrelation is a simple correlation of a sequence of observations against a partial copy of itself. For example, for an autocorrelation with a lag of one, the sequence X(1) through X(n-1) will be correlated against the sequence X(2) through X(n), where "n" is the number of elements in the sequence. Each correlogram shows the autocorrelation value resulting from successive lag values of 0 (which produces an autocorrelation value of 1.0) through n/4.
In addition, individual molecule histories can be "differenced" to remove periodicity and the resulting data sequence can be displayed as a phase plot of differenced molecule counts. This allows closer inspection of system dynamics.
Several other display types are contemplated, as are modifications to help users deal with larger reaction sets. For example, the viewReactionHistories Applet could be modified to produce Kiviat graphs and modified bar graphs in place of the text-based displays.
More work is necessary to develop methods for depicting behavior variation. One approach is to conduct multiple runs with the same starting data, and graph a mean and confidence interval for each molecular species. In some cases this may obscure multimodal stochastic behavior as individual simulations select alternate developmental or behavior pathways. To avoid obscuring trends, it may be preferable to create a single graph for each molecule containing a separate line for the behavior of that molecule during each run. Aggregation with respect to time would probably stand out upon inspection.
Reaction number |
Reaction | Probability/rate |
---|---|---|
10 | Make a tRNA molecule. | .7 |
11 | Make a 5S rRNA. | .7 |
13 | Make the large transcript for rRNA5.8-18-28S. | .7 |
14 | Cut the large transcript into 5.8, 18 and 28S rRNAs | .7 |
15 | Make the large ribosomal unit | .7 |
16 | Make translation initiation factor 2. | .7 |
17 | Begin the translation initiation complex | .7 |
18 | Assemble eIF-4. | .7 |
19 | More translation initiation complex. | .7 |
20 | Add the Large Ribosome subunit to the initiation complex. | .7 |
21 | Begin adding amino acids. | .7 |
22 | Release the protein and break up the translation complex. | .07 |
23 | Fold Protein X into proper configuration. | .7 |
24 | Begin process to degrade Protein X molecule | .2 |
25 | Degrade a Protein X molecule | .3 |
The following compounds were included to begin the simulation:
Compound | Number of Molecules |
Compound | Number of Molecules | |
---|---|---|---|---|
ATP | 50 | ADP | 10 | |
PO4 | 1 | D | 1 | |
TBP | 1 | TATA:Gene.X-1 | 1 | |
Gene.X-2 | 1 | Gene.X-3 | 1 | |
ABF | 1 | PolyII | 1 | |
EH | 1 | PO4 | 100 | |
mRNAPrecursor | 20 | TermII | 1 | |
Gene.XActivator | 0 | Gene.tRNA | 1 | |
PolyIII | 1 | tRNAPrecursor | 50 | |
rRNAPrecursor | 50 | Gene.rRNA5S | 1 | |
Gene.rRNA5.8-18-28S | 1 | PolyI | 1 | |
ribonucleaseMRP | 1 | eIF-2-Protein1 | 1 | |
eIF-2-Protein2 | 1 | eIF-2-Protein3 | 1 | |
eIF-4A | 1 | eIF-4E | 1 | |
eIF-4G | 1 | ProteinPrecursor | 50 | |
tRNAmet | 1 | GTP | 100 | |
PADP | 1 | CRF | 1 | |
Chaperone | 1 | Ubiquitin | 1 | |
Proteasome | 1 |
Here is a graph showing the amount of some of the compounds produced by the simulation with the molecule complement shown above:
Protein.X ProteinPrecursor mRNA.X mRNAPrecursor |
Step number |
Note that the number of molecules of protein X rises fairly consistently through the course of the simulation, although the amount of mRNA.X ebbs and flows as it is created, incorporated into complexes, and degraded over time.
Here is a chart from a run where two proteins are produced: proteins X and Y. The base-line transcription probabilities for the mRNAs of both proteins is .01 and the probabilities for production in the presence of an activator are both .7. In this run only the activator for Y is present:
Protein.X Protein.Y ProteinPrecursor mRNA.X mRNA.Y mRNAPrecursor |
Here the number of molecules of mRNA climbs to a plateau where it oscillates, as detailed in this phase space chart for mRNA.Y (taken from a later run with the same parameters):
Molecule | Phase plot (Molecule count at step N, molecule count at step N+1) |
mRNA.Y |
The correlogram for mRNA Y is not very interesting, so to show an example correlogram, here is the correlogram for the second (virtual) component of Gene Y from a similar run:
Molecule | Correlogram for steps 0 through 501 (X axis: Autocorrelation lags from 0 to the smaller of N/4 or 2000) |
Gene.Y-2 |
|
Here is a screen-capture of an example run creating Proteins X and Y simultaneously from another simulation run:
Note that both list windows are independently scollable, and the replay can proceed in either a continuous or stepwise fashion.
Note also that this "view" of a reaction history may be joined by other views in later versions of the script. For example, molecule counts can be presented as bar charts, pie charts, Kiviat graphs, etc. and lines can be added connecting individual reactions with reactants (in one color) and products (in another color), while retaining the independent window scrolling.
You may choose these sets from a collection of test sets via pull-down lists, or upload your own sets from your desktop.
If you choose to upload your own reaction set you must create a text file (NOT a formatted file such as MS Word .doc files) containing several lines for each reaction you wish to define. Each line must begin with one of the following parameter types:
title: Combine A with B to make C. reactants: A,B catalysts: E products: C duration: 3 # i'm not sure about this duration. probabilities: .7 comment: This reaction takes three time steps to complete.
All information after any pound sign (#) on an input line will be considered as commentary and ignored by the program. If a line ends with a comma (,), the next line will be treated as an extension of the first line. This allows users to list each reactant, catalyst, product, etc. on a separate line, if desired. The example reactant line above COULD be written as:
reactants: A, B
If you choose to upload your own molecule set, you must create a text file containing a list of molecule species along with the number of molecules of each species to start with and, optionally, the time step after which those molecules will be injected into the model. For the sample reaction set above, you might use:
A=10 B=10 E=1 # start with just one molecule of E. E=2,100 # add two more molecule after 100 time steps.
When you access the Cell simulation web page you will be given an opportunity to type in the names of your local files or "Browse" through your desktop file system to choose them by double-clicking on their icons. When you specify your other simulation parameters, and press the "Do it" button, your files will be uploaded to the host system, where they will be put in temporary files to be used by the Cell simulation engine. (The temporary files will be deleted after they are used.)
The initial molecule complement is copied into a list of current molecules available for reacting (or into the list of molecules already reacting, if a molecule is scheduled for deferred injection into the system).
There are three execution phases at each time step:
Users can also direct the simulation engine to use duration parameters as means of exponential distributions rather than as fixed values. When this is done, reaction durations will vary from reaction to reaction with an exponential distribution.
Since reactants are taken out of the list of molecules available for reacting while they are actually engaged in a reaction, there may not be enough reactant molecules available at during a time step to perform all the reactions predicted during the previous phase.
With the current version of this Cell simulation system, backward and forward versions of each reaction must be entered explicitly. That is, two reactions must be defined, one proceeding forward and the other backwards. For example, the backwards version of the reaction defined earlier will look like:
title: Make A and B from C under catalysis by E. reactants: C catalysts: E products: A, B duration: 1 probabilities: .2 comment: This is the backwards version of the other reaction.
Since the specified probabilities are "probabilities of transition," they do not reach a steady state unless there are two or more acting in opposition. For the previous example, if the numbers of A and B molecules are identical, roughly 80% of them will be converted to C at steady state. (It would be possible to increase reaction rates as a function of the numbers of reactants available, but this has not yet been implemented.)
With the current software it appears that pairs of reverse reactions both specified with large probabilities, say .9 forward and .9 backwards, yield weird results because they keep large proportions of a molecule population reacting constantly. Pairs close to (.1, .1) seem marginal, and pairs with larger probabilities, but a ratio of 10 to 1, seem safe, and large reaction durations exacerbate this problematic behavior.
Caveat: This current simulation method will certainly not provide "realistic" results in the sense of accurately representing real biochemistry. It MAY, however, express users' intuitive notions of reaction processes, and capture enough of the dynamics of networks of reactions to help users understand the emergent, or overall, behavior of such networks.
Several researchers have observed that network structure appears to be more important than exact biochemical fidelity in this kind of modelling, but the degree of fidelity required for useful results remains to be seen. At any rate, enhancements to the current approach are already being considered. In particular, plans have been made to apply an approach inspired by S-systems (Voit, 2003) to computing reaction probabilities dynamically. This will inject non-linearity into the system, which should allow it to model events triggered by bursts of activity within regulatory networks.
The main program is a 3000-line Perl program (cse.pl) that runs in both Unix and Windows CGI environments. This program collects form data from the user via a Web browser, conducts the simulation, archives results on disk for later display, and/or displays results immediately.
A separate 1100-line Perl program (display.pl) displays archived results as requested, and a 300-line Perl program (make-graph.pl) helps manage graph selection and display for both cse.pl and display.pl.
Detailed graphs are produced by two 400-line Perl programs (ping_graph.pl and phase-grapher.pl) running on either Unix or Windows and compressed (thumbnail) graphs are produced by a 500-line Java servlet (Grapher.java) running on a Windows-based Tomcat servlet processor. The graphing routine for auto correlations (auto-corr-grapher.pl) is a descendent of ping_graph.pl, and the routine for difference plots is a descendent of difference-grapher.pl.
There are several other utility routines that will not be discussed here (lister.pl, setup-setup-difference-grapher.pl, setup-difference-grapher.pl, make-difference-x-values.pl, etc.).
Simulation results can also be displayed by an Applet (viewReactionHistories2.java) composed of classes totaling about 1600 lines of Java.
These programs use both XML and Perl dump files to exchange data. XML has shown itself to be quite useful and appropriate for these applications, but data can be more easily moved between two Perl programs by using the Perl dump format.
All programs except grapher.pl were written by the author. grapher.pl was written by Boone Bradley on commission.
The approach described in this page is somewhat unique among simulations operating at the molecule-level in it's focus upon reactions transforming molecules, rather than upon individual molecules undergoing reactions.
For a (non-exhaustive) list of some other simulation projects, including two pointers to reviews of the field, see http://condor.cc.ku.edu/~grobe/cell-sim/projects.html.
For example, consider the 3 protein feed forward (FF) chain in which:
The Cell Simulation system can be used to illustrate this behavior (though its reaction kinetics are mostly bogus). To do that, a reaction set generating the three proteins was constucted using individual sequences similar to the examples above, modified to include 30 time-step delays while moving newly translated mRNA to the cytosol and newly translated activator protein back to the nucleus.
The following graph shows that only X and Y are produced when the activator is present for only 25 time steps.
Protein X Protein Y X activator |
Step number |
and the next table shows what happens when the activator is present for 200 time steps.
Protein X Protein Y Protein Z X activator |
Step number |
Protein Z was produced during only 1 of 9 simulations with the brief stimulus, but was produced during all 10 simulations with the 200 time step stimulus.
These results seem quite intuitive, so that simulation merely reaffirms intuition. Consider, however, the same chain modified so that protein Z is activated by protein Y alone, but in binding-site competition with protein Z, a so-called "incoherent" feed forward loop. In this case, intuition seems less helpful, but Mangan, et al. have shown that the production of protein Z will pulse in ways that are probably not illustratable by the CSE, because they require more realistic (and non-linear) reaction kinetics which the CSE simulation engine does not presently represent.
As another example, consider the loop formed by activating the transcription of Protein X with Protein Z. Here is a graph of a single run.
To look at the dynamics of this simulation, the frequency graph of the count of Protein Z molecules was plotted on a log-log scale. In addition, the same data was subjected to Detrended Fluctuation Analysis.
DFA was first
proposed in Peng C-K, Buldyrev SV, Havlin S, Simons M, Stanley HE, Goldberger
AL, Mosaic
organization of DNA nucleotides. Phys Rev E 1994;49:1685-1689, and was
obtained from Physionet.
According to Peng, et al. in "
Quantification of scaling exponents and crossover phenomena in nonstationary
heartbeat time series (CHAOS, Vol. 5, No. 1, 1995)", a straight line
"indicates the presence of scaling," and the slope of the line indicates the
nature of that scaling, and slopes of, or close to, 1.0 indicate a 1/f noise
relationship, a series possessing dependencies at many, or all, time scales
examined,
Scaling this approach to larger and more complex reaction sets will present a
significant challenge, but most likely can make a significant contribution to
understanding the systemic functioning of the cell.
However, this work is just beginning and much effort will be required
to assess the accuracy and usefulness of this system and systems like it. One of
the first steps will be to provide more realistic kinetics. This will probably
require most reactions to be specified with forward and backwards reaction
constants as well as an analog to the nonlinear behavior induced by reactant and
product concentrations. The current plan is to treat reactions as controlled by
reaction constants that essentially represent the probability of a reaction
given a collision among reactant molecules, and consider the probability of
collision as a function of the product of the number of molecules of each
reactant involved in a reaction. (One alternative would be to use a single
parameter for each reaction that is taken as the ratio of products to reactants
at equilibrium, possibly along with a "velocity factor" specifying how many time
steps to use to reach that equilibrium ratio.)
Note that this system can be used to simulate processes from other domains.
For example, it has been used to simulate computer
behavior
Michael GrobeSummary
This page has shown some of the display
capability and intended uses of a web-enabled simulation environment. The focus
has been on developing a usable, intuitive interface, rather than on faithful
recreation of cell biochemistry.
December 2002
October 13, 2003
November 3,
2003
December 2003
January 2006