BioPerf: A Benchmark Suite to Evaluate High-Performance Computer Architecture on Bioinformatics Applications
The BioPerf suite contains codes from 10 highly popular bioinformatics packages
and covers the major fields of study in computational biology such as sequence
comparison, phylogenetic reconstruction, protein structure prediction, and
sequence homology & gene finding. We demonstrate the use of BioPerf by
providing simulation points for pre-compiled Alpha binaries and by a
performance study on IBM Power using IBM Mambo simulations cross-compared with
Apple G5 executions.
The BioPerf suite (available from www.bioperf.org) includes benchmark source
code, input datasets of various sizes, and information for compiling and using
the benchmarks. Our benchmark suite includes parallel codes where available.
Installation of BioPerf
install-BioPerf.sh is the script that installs BioPerf on the user's machine. The default installation directory
of BioPerf is $HOME/BioPerf; otherwise the directory supplied by the user is used.
is the complete BioPerf package, consisting of source codes, pre-compiled executables, scripts for installing, compiling and
running, and input datasets of varying sizes for each of the codes.
the environment variable $BIOPERF. Every script that runs BioPerf executables sources this file ($BIOPERF/.bioperf), goes to
the appropriate subdirectory of $BIOPERF, and then runs the executables. If this file or the installation directory is moved or
renamed, the scripts will fail to run and will ask the user to edit the .bioperf file.
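
The sourcing step described above can be sketched roughly as follows. This is a minimal illustration, not the code shipped with BioPerf: the function name load_bioperf_env is hypothetical, and the sketch assumes the settings file contains a plain shell assignment such as BIOPERF="/path/to/BioPerf".

```shell
#!/bin/sh
# Sketch only: load_bioperf_env and the assumed contents of the settings
# file (a shell assignment to BIOPERF) are not taken from the BioPerf scripts.

# load_bioperf_env SETTINGS_FILE
# Sources the settings file and checks that $BIOPERF names a directory.
load_bioperf_env() {
    settings_file=${1:-}
    if [ ! -f "$settings_file" ]; then
        echo "error: $settings_file not found; please edit the .bioperf file" >&2
        return 1
    fi
    # "." is the POSIX way to source a file into the current shell.
    . "$settings_file"
    if [ -z "${BIOPERF:-}" ] || [ ! -d "$BIOPERF" ]; then
        echo "error: BIOPERF is unset or not a directory; please edit $settings_file" >&2
        return 1
    fi
    export BIOPERF
}
```

A run script could call the function with the path to its settings file and stop if it returns a non-zero status, which mirrors the failure described above when the file or the installation directory has been moved.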
BioPerf directory structure
separate sub-directories, Alpha-binaries, x86-binaries (Linux) and PowerPC-binaries (Mac OS). The x86 and PowerPC binaries
are included for all the executables, while Alpha binaries are not included for all the codes. The
subdirectory for each platform in turn has a directory for each of the packages.
the inputs for each executable are further categorized into class-A, class-B and class-C based on the sizes of the inputs. Some
of the input directories have only one class of input, in which case there are no further subdirectories. The larger databases
Swissprot (71 MB), NR (1.46 GB) and Pfam (633 MB) are not included in the Inputs directory and have to be downloaded separately
from the www.bioperf.org website. When a script is run for an executable that uses any of these databases, it looks for the
databases on the host machine in the directory named by the environment variable $DATABASES. After downloading the databases,
the user therefore needs to set $DATABASES to the directory that contains them; if the databases are not found, the script
will fail.
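
As an illustration of the $DATABASES lookup, a run script might perform a check along the following lines before starting an executable. This is a sketch under assumptions: the function name check_databases is hypothetical, and the database file names are passed as arguments because the actual file names are not given here.

```shell
#!/bin/sh
# Sketch only: check_databases is a hypothetical helper; the database file
# names passed to it are placeholders, not the names used by BioPerf.

# check_databases NAME...
# Succeeds only if $DATABASES is set and contains every named database.
check_databases() {
    if [ -z "${DATABASES:-}" ]; then
        echo "error: the DATABASES environment variable is not set" >&2
        return 1
    fi
    for db in "$@"; do
        if [ ! -e "$DATABASES/$db" ]; then
            echo "error: database '$db' not found in $DATABASES" >&2
            return 1
        fi
    done
    return 0
}
```

A call such as check_databases swissprot (with a placeholder file name) returns a non-zero status when the variable is unset or the file is missing, so the caller can stop before launching the executable.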
sub-directories by the name of each of the packages.
wrapper for all these scripts. It has further subdirectories that include the scripts for running each of the codes for small,
medium and large datasets, and scripts for compiling each of the codes, running the BioPerf suite and installing BioPerf on your
architecture in case your architecture's binaries are not part of the BioPerf package. There is a separate subdirectory for
each of the tasks, and the naming is fairly intuitive.
execution phases.
sub-directories for every package, and further sub-directories per executable for the packages that have more than one executable.
The following scripts are included in BioPerf in the root installation directory:
scripts are explained below. This script performs all the supported tasks of BioPerf except installation.
Mac OS or Alpha, you can use this script to compile the codes. This script picks up the makefiles of the source codes from the
Source-codes directory, runs make for each of the codes and installs the compiled codes into a subdirectory called
$HOSTNAME-Binaries. It also creates a $HOSTNAME-Scripts subdirectory inside the Scripts directory, which can then be used to
run the newly compiled executables. If the codes cannot be compiled on your architecture, the script outputs an error
message telling the user which code failed to compile.
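
The compile step can be pictured with a loop like the one below. It is only a sketch: the function name compile_all is hypothetical, it assumes one sub-directory per package with a file named Makefile at its top level, and it leaves out the installation of the resulting binaries.

```shell
#!/bin/sh
# Sketch only: compile_all is a hypothetical helper. It assumes each package
# directory under the given root has a top-level Makefile, which may not match
# the real layout of the Source-codes directory.

# compile_all SOURCE_ROOT
# Runs make in every package directory and reports the ones that fail.
compile_all() {
    src_root=${1:-}
    failed=""
    for dir in "$src_root"/*/; do
        [ -f "${dir}Makefile" ] || continue
        name=$(basename "$dir")
        # Run make in a subshell so the caller's working directory is kept.
        if ! (cd "$dir" && make); then
            echo "error: $name failed to compile" >&2
            failed="$failed $name"
        fi
    done
    # Succeed only if no package failed.
    [ -z "$failed" ]
}
```

Collecting the failures instead of stopping at the first one lets a single pass report every code that does not build on the new architecture.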
How to use BioPerf
supported tasks in BioPerf.
The following choices are available:
$BIOPERF/Scripts/Run-scripts/run-bioperf.sh. The script first prompts the user to choose the platform. If the platform is x86,
ppc or alpha, it then offers two modes of running BioPerf: the user can either run all the codes one by one, or choose which
codes to run. In the second mode the user is prompted for every package and can add it to the run. If a package is selected,
all the executables inside the package are run. After the packages have been selected (either all of them, if the user chooses
the first option, or those added one by one), the user is prompted for the size of the input datasets. All the codes are run
with the same input size; for example, if the user selects class-C, all the selected packages are run with the class-C input
dataset. As explained above, some of the larger databases are not included in the BioPerf package, to keep the package as small
as possible. If the user selects packages with input sizes (class-C) that require these databases, and the databases are not
installed or the $DATABASES environment variable is not set to the appropriate directory, the user is notified and is given the
choice to run BioPerf without that package or to abort the whole run. When BioPerf is subsequently run, it outputs the running
time of each executable and the total execution time of the suite with the selected packages.
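
The per-executable timing mentioned above could be obtained with a small wrapper such as this one. It is a sketch with a hypothetical name (run_timed) and whole-second resolution from date; the BioPerf scripts may measure time differently.

```shell
#!/bin/sh
# Sketch only: run_timed is a hypothetical helper and does not reproduce how
# the BioPerf run scripts measure time. Resolution is one second.

# run_timed LABEL COMMAND [ARG...]
# Runs the command, prints "LABEL: N s", and returns the command's status.
run_timed() {
    label=$1
    shift
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "$label: $((end - start)) s"
    return "$status"
}
```

Recording date +%s once before the first executable and once after the last one gives the total execution time of the selected packages in the same way.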
through the script $BIOPERF/Scripts/Run-scripts/install-bioperf.sh. The script attempts to compile all the codes and fails
if any of them fails to compile. If a previous installation is detected, the script first deletes it before proceeding with
the new installation. This is done because the script's first step is to create two directories, $HOSTNAME-Binaries and
$HOSTNAME-scripts, inside the Binaries and the Scripts directories of the BioPerf package respectively. When all the packages
compile successfully, the same sub-directory structure is maintained as in the x86 and the PowerPC sub-directories. This allows
the main run-bioperf.sh to use these executables just as it uses the x86 and the PowerPC executables.
BIOPERF/Scripts/Run-Scripts/CleanOutputs.sh. All the scripts in BioPerf store the outputs generated by running the executables
in the Outputs directory, which also has sub-directories for each of the packages. The Outputs directory therefore grows with
every run, and this script deletes all the previously generated outputs.
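
A clean-up of this kind can be sketched as follows. The function name clean_outputs is hypothetical, and keeping the per-package sub-directories while deleting only the files inside them is an assumption; the actual CleanOutputs.sh may behave differently.

```shell
#!/bin/sh
# Sketch only: clean_outputs is a hypothetical helper. Keeping the directory
# tree and deleting only regular files is an assumption about the clean-up.

# clean_outputs OUTPUTS_DIR
# Deletes every regular file below OUTPUTS_DIR, leaving directories in place.
clean_outputs() {
    outputs_dir=${1:-}
    # Refuse an empty or missing path so nothing outside it is touched.
    if [ -z "$outputs_dir" ] || [ ! -d "$outputs_dir" ]; then
        echo "error: '$outputs_dir' is not a directory" >&2
        return 1
    fi
    find "$outputs_dir" -type f -exec rm -f {} +
}
```

The explicit directory check guards against an unset variable expanding to an empty path before anything is deleted.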
the script $BIOPERF/Scripts/Run-scripts/display-versions.sh