Protein folding simulator
Proteins fold into biologically ‘active’ states which allow them to perform important tasks. Whether or not a protein can perform its task depends on its folded structure, and if it is not in the correct ‘active’ state, it is considered ‘dead’. A build-up of ‘dead’ proteins in the body is not just inefficient; it can also be harmful to the body's function. Understanding how these structures fold is therefore important to academic biologists, as well as to companies designing products which may interact with proteins.
This model simulates the folding of a randomly generated protein on an infinite lattice, given certain input parameters (length, temperature of the solution, interaction energies, etc.).
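Temperature is given in arbitrary units with k = 1, and time is measured in Monte Carlo time. A common way to use temperature in a lattice Monte Carlo simulation like this is the Metropolis acceptance rule. The sketch below assumes that rule (the README does not name it), and the function names are hypothetical.

```python
import math
import random


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion with Boltzmann constant k = 1.

    Moves that lower (or keep) the energy are always accepted; moves that
    raise it are accepted with probability exp(-delta_e / temperature).
    """
    if delta_e <= 0:
        return True
    if temperature <= 0:
        return False  # zero temperature: never accept an uphill move
    return rng.random() < math.exp(-delta_e / temperature)


def acceptance_rate(delta_e: float, temperature: float, trials: int, seed: int) -> float:
    """Estimate how often a move of size delta_e is accepted."""
    rng = random.Random(seed)
    accepted = sum(metropolis_accept(delta_e, temperature, rng) for _ in range(trials))
    return accepted / trials


if __name__ == "__main__":
    # For delta_e equal to the temperature the estimate should be close to exp(-1).
    print(acceptance_rate(1.0, 1.0, trials=100_000, seed=42))
```

With delta_e equal to the temperature, roughly 37% of uphill moves are accepted. Lowering the temperature makes uphill moves rarer, so the chain spends more time in low-energy states.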
For more information on the background, references and an application of this programme, see the paper A Lattice Simulation Approach to Protein Folding.pdf.
Clone the repository:
git clone https://github.com/JosephPB/Protein
The files are divided into directories depending on your requirements. The Real directory contains scripts which allow the user to input the exact amino acid make-up and order by hand. The Simulation directory contains scripts which do not require user input of amino acid types, and instead randomly generate a protein. Inside each of these two directories are 2D and 3D directories, containing versions of the programme which run on an infinite 2D and an infinite 3D lattice, respectively. If you have access to a batch system, the batch directory is set up to run such a programme with modifications; change the batch.sh shell file and the destination of the write file /mt/batch/... to your appropriate values.
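Because the lattice is infinite, one natural representation (assumed here, not taken from the programme's source) stores each monomer as a tuple of integer coordinates and tracks occupied sites in a set, so the same code serves both the 2D and 3D versions. The sketch also shows the two starting configurations offered by the "Initialise unfolded protein" option: a straight chain and a non-crossing random walk from the origin. All names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Site = Tuple[int, ...]  # (x, y) in 2D or (x, y, z) in 3D


def neighbours(site: Site) -> List[Site]:
    """Nearest neighbours on a square (2D) or cubic (3D) lattice."""
    result = []
    for axis in range(len(site)):
        for step in (-1, 1):
            moved = list(site)
            moved[axis] += step
            result.append(tuple(moved))
    return result


def straight_chain(n: int, dim: int) -> List[Site]:
    """Fully unfolded chain along the first axis, starting at the origin."""
    return [tuple([i] + [0] * (dim - 1)) for i in range(n)]


def random_walk_chain(n: int, dim: int, rng: random.Random,
                      max_tries: int = 10_000) -> Optional[List[Site]]:
    """Non-crossing random walk from the origin; restarts if it traps itself."""
    origin = tuple([0] * dim)
    for _ in range(max_tries):
        chain = [origin]
        occupied = {origin}  # a set of coordinates, so no lattice bounds are needed
        while len(chain) < n:
            free = [s for s in neighbours(chain[-1]) if s not in occupied]
            if not free:
                break  # dead end: start a new walk
            nxt = rng.choice(free)
            chain.append(nxt)
            occupied.add(nxt)
        if len(chain) == n:
            return chain
    return None  # gave up after max_tries attempts
```

Restarting whenever the walk traps itself is the simplest approach; it does not sample non-crossing walks uniformly, which is usually acceptable for a starting configuration.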
Once you have selected the appropriate directory for the simulation, build and run the programme:
make
./protein
Then enter the parameters (predefined inputs can be used; see the inputs directory below), and plot the results with the scripts in the plotting directory.
The parameters are as follows:
1. Number of amino acids: the desired monomer chain length.
2. Initialise unfolded protein: y generates a totally unfolded (straight) protein; n gives a protein produced by a non-crossing random walk beginning at the origin.
3. Temperature: the temperature of the solution in arbitrary units (Boltzmann constant k = 1).
4. Energy matrix: i fills the 20x20 symmetric interaction matrix with integer values from a uniform distribution between limits to be defined; r fills it with double values from a uniform distribution; j places a 1 or a -1 at random at each element; and m fills the matrix with the values found in Miyazawa and Jernigan (1985), Table VI.
5. Range (only if i or r was chosen at step 4): the range of the distribution.
6. Number of time steps: the number of times the programme goes through its folding computation (measured in Monte Carlo time).
7. Seed: the seed value for all random number generators used in initialising the protein; 0 uses the current time as the seed.

Preset input files, which avoid entering the above parameters manually, can be stored in the inputs directory. Examples are given which may be edited accordingly, or new ones added. (See jobs.sh or batch.sh in the 3D and batch directories respectively for examples of using shell files with inputs.)
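The energy matrix options and the seed rule can be sketched as follows. The sketch assumes, as in lattice models using the Miyazawa and Jernigan contact energies, that the total energy is the sum of matrix entries over pairs of monomers that are lattice neighbours but not adjacent along the chain. The function names and the default range are hypothetical, and mode m is not implemented because it needs the published Table VI values.

```python
import random
import time
from typing import Dict, List, Tuple

N_TYPES = 20  # 20 amino acid types -> 20x20 interaction matrix

Site = Tuple[int, ...]


def make_rng(seed: int) -> random.Random:
    """Seed 0 means 'use the current time'; any other value is used as-is."""
    return random.Random(int(time.time()) if seed == 0 else seed)


def energy_matrix(mode: str, rng: random.Random,
                  low: int = -2, high: int = 2) -> List[List[float]]:
    """Symmetric 20x20 matrix for modes 'i', 'r' and 'j'."""
    matrix = [[0.0] * N_TYPES for _ in range(N_TYPES)]
    for a in range(N_TYPES):
        for b in range(a, N_TYPES):
            if mode == "i":
                value = float(rng.randint(low, high))  # uniform integer in [low, high]
            elif mode == "r":
                value = rng.uniform(low, high)  # uniform double
            elif mode == "j":
                value = rng.choice((1.0, -1.0))  # +1 or -1 at random
            else:
                raise ValueError("mode 'm' needs the Miyazawa-Jernigan table, not included here")
            matrix[a][b] = matrix[b][a] = value  # mirror to keep it symmetric
    return matrix


def contact_energy(chain: List[Site], types: List[int],
                   matrix: List[List[float]]) -> float:
    """Sum matrix entries over lattice-neighbour pairs not bonded along the chain."""
    index: Dict[Site, int] = {site: i for i, site in enumerate(chain)}
    total = 0.0
    for i, site in enumerate(chain):
        for axis in range(len(site)):
            for step in (-1, 1):
                moved = list(site)
                moved[axis] += step
                j = index.get(tuple(moved))
                # j > i + 1 skips bonded neighbours and counts each contact once
                if j is not None and j > i + 1:
                    total += matrix[types[i]][types[j]]
    return total
```

Filling only the upper triangle and mirroring it guarantees symmetry. With every entry set to -1, a four-monomer chain folded into a square has exactly one non-bonded contact and therefore energy -1.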
The plotting directory contains plotting code. plotsingle.py takes input from one .csv file at the command line:
plotsingle.py <filename.csv>
and plots energy vs. Monte Carlo time and length vs. Monte Carlo time.
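Before plotting, a script like plotsingle.py has to read the columns of the .csv file. The column names time, energy and length used below are assumptions, not the programme's documented output format, and the plotting call itself is left out so the sketch needs only the standard library.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_run(handle: TextIO) -> Dict[str, List[float]]:
    """Read one run into column lists; assumes a header row and numeric values."""
    reader = csv.DictReader(handle)
    columns: Dict[str, List[float]] = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name in columns:
            columns[name].append(float(row[name]))
    return columns


if __name__ == "__main__":
    # Hypothetical file contents standing in for the simulator's output.
    sample = io.StringIO("time,energy,length\n0,0,9\n1,-1,7\n2,-2,5\n")
    run = read_run(sample)
    print(run["time"], run["energy"], run["length"])
```

The resulting lists can be passed straight to a plotting library, with time on the x-axis and energy or length on the y-axis.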
Similarly, plotavg.py takes multiple files at the command line, all with the same monomer length and different temperatures (see jobs.sh in 3D for an example), and plots average energy vs. temperature and average length vs. temperature.
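The averaging behind an average-vs-temperature plot can be sketched as below. Each run is passed in as a hypothetical (temperature, energies, lengths) tuple rather than parsed from the real .csv files, and every recorded step is pooled; whether plotavg.py discards early, non-equilibrated steps is not stated here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Run = Tuple[float, List[float], List[float]]  # (temperature, energies, lengths)


def averages_by_temperature(runs: Iterable[Run]) -> Dict[float, Tuple[float, float]]:
    """Map temperature -> (mean energy, mean length), pooling all runs at that T."""
    energies: Dict[float, List[float]] = defaultdict(list)
    lengths: Dict[float, List[float]] = defaultdict(list)
    for temperature, energy_series, length_series in runs:
        energies[temperature].extend(energy_series)
        lengths[temperature].extend(length_series)
    # Sorted by temperature so the result can be plotted as a line directly.
    return {t: (mean(energies[t]), mean(lengths[t])) for t in sorted(energies)}


if __name__ == "__main__":
    runs = [(0.5, [-4.0, -6.0], [3.0, 3.0]), (0.5, [-5.0], [3.0]), (2.0, [-1.0, -1.0], [8.0, 6.0])]
    print(averages_by_temperature(runs))
```

For these illustrative values the result is {0.5: (-5.0, 3.0), 2.0: (-1.0, 7.0)}.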
Additionally, plotmult.py takes in multiple runs at three different temperatures and creates a three-tiered subplot for comparison.
We use SemVer for versioning.
We hope to add a visual element to the process, allowing for snapshots of the protein folding process to be taken and visually realised, potentially by linking up with other established protein visualisation software. Additionally, research into domain folding may lead to increased computation speed (e.g. adapting from Abkevich, Gutin and Shakhnovich (1995) to a more generalised model).
In a later version, we hope to be able to reproduce and improve upon the work of others in this field such as Sali, Shakhnovich and Karplus (1994).