GenePrediction

Algorithms for computational biology.

This is basic code for doing biological sequence analysis.

  • SmithWaterman implements local sequence alignment using the
    Smith-Waterman dynamic programming algorithm.
  • Viterbi implements the Viterbi algorithm, which finds the most
    likely sequence of hidden states in a Hidden Markov Model.
  • ProteinCoding scans genomes for ORFs (open reading frames) using
    a probability model.
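
As a rough illustration of the local alignment idea behind the first item, here is a minimal sketch of the Smith-Waterman recurrence. It is not the code from this repository: the `Scoring` struct, its values, and the name `smith_waterman_score` are hypothetical, and it assumes a simple linear gap penalty and computes only the best score, not the alignment itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scoring parameters, chosen only for illustration.
struct Scoring {
    int match = 2;      // reward for identical characters
    int mismatch = -1;  // penalty for differing characters
    int gap = -1;       // linear penalty per gap position
};

// Returns the best local alignment score between a and b.
// Only two rows of the dynamic programming table are kept in memory.
int smith_waterman_score(const std::string& a, const std::string& b,
                         const Scoring& s = Scoring()) {
    std::vector<int> prev(b.size() + 1, 0);
    std::vector<int> curr(b.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = 0;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int diag = prev[j - 1] +
                (a[i - 1] == b[j - 1] ? s.match : s.mismatch);
            const int up = prev[j] + s.gap;        // gap in b
            const int left = curr[j - 1] + s.gap;  // gap in a
            // Clamping at zero is what makes the alignment local:
            // a poor prefix is dropped instead of dragging the score down.
            curr[j] = std::max({0, diag, up, left});
            best = std::max(best, curr[j]);
        }
        std::swap(prev, curr);
    }
    return best;
}
```

With these assumed scores, two identical four-letter sequences score 8, and two sequences with no characters in common score 0.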

The code is based largely on ideas from the book
Biological Sequence Analysis, and uses genome data
that can be obtained from the
NCBI Genome bank and elsewhere.

The code was written in 2006, but I have since modernized it to use some
nicer C++ constructs. It now requires a C++11 compiler because it uses
rvalue references and lambdas. The code was tested with Visual C++ 2015,
but should work with GCC or any other conforming compiler, regardless of
platform, without too much trouble.

A word of caution: the code is numerically intensive, and for longer
gene sequences it can take several seconds to several minutes (as of 2013)
to compute the results, even with the dynamic programming optimizations.

Some of these algorithms can be useful outside the computational biology
space. For example, Smith-Waterman can be used for text correction,
and Viterbi can be used for signal analysis in speech-to-text.
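
To make the Viterbi idea concrete independent of any one domain, here is a minimal sketch that decodes the most likely hidden state sequence for a list of observed symbols. It is not the code from this repository: the `Hmm` struct and the name `viterbi_path` are hypothetical, states and symbols are assumed to be small integer indices, and probabilities are summed in log space to avoid underflow on long sequences.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical model layout, for illustration only.
struct Hmm {
    std::vector<double> start;               // start[s]: P(first state is s)
    std::vector<std::vector<double>> trans;  // trans[from][to]
    std::vector<std::vector<double>> emit;   // emit[s][symbol]
};

// Returns the most likely state index for each observation.
std::vector<int> viterbi_path(const Hmm& hmm, const std::vector<int>& obs) {
    if (obs.empty()) return {};
    const std::size_t n = hmm.start.size();
    const std::size_t T = obs.size();
    auto lg = [](double p) { return std::log(p); };  // log(0) yields -infinity

    std::vector<std::vector<double>> score(T, std::vector<double>(n));
    std::vector<std::vector<int>> back(T, std::vector<int>(n, 0));

    // Initialization: start probability times first emission.
    for (std::size_t s = 0; s < n; ++s)
        score[0][s] = lg(hmm.start[s]) + lg(hmm.emit[s][obs[0]]);

    // Recursion: best predecessor for every state at every step.
    for (std::size_t t = 1; t < T; ++t) {
        for (std::size_t to = 0; to < n; ++to) {
            double best = -std::numeric_limits<double>::infinity();
            int arg = 0;
            for (std::size_t from = 0; from < n; ++from) {
                const double cand = score[t - 1][from] + lg(hmm.trans[from][to]);
                if (cand > best) { best = cand; arg = static_cast<int>(from); }
            }
            score[t][to] = best + lg(hmm.emit[to][obs[t]]);
            back[t][to] = arg;
        }
    }

    // Termination: pick the best final state, then follow the back pointers.
    std::size_t last = 0;
    for (std::size_t s = 1; s < n; ++s)
        if (score[T - 1][s] > score[T - 1][last]) last = s;

    std::vector<int> path(T);
    path[T - 1] = static_cast<int>(last);
    for (std::size_t t = T - 1; t > 0; --t)
        path[t - 1] = back[t][path[t]];
    return path;
}
```

The table has one row per observation and one column per state, so the running time grows as the sequence length times the square of the number of states.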

Toby Jones (www.turbohex.com, ace.roqs.net)