Running on Chameleon Cloud, this application probes an OpenStack Swift container for DNA sequencing data and processes it into a human-friendly format. The processing is done primarily with Linux command-line tools run in a specific order, commonly called a "pipeline". For privacy reasons, the current processing pipelines have not yet been released on GitHub. For clarity's sake, a generic pipeline is included in the JSON folder.
Current Linux image being used: Ubuntu 14.04
Graduate Student Researcher at the Open Cloud Institute at the University of Texas at San Antonio. This project is done in collaboration with the University of Texas Health Science Center at San Antonio.
Contains the Python files that run the application. See the README.md file inside the folder for more information.
Contains JSON files that help set up the reference genome and HG19 database for data processing. See the README.md file inside the folder for more information.
Contains Bash scripts that are relevant to the application. See the README.md file inside the folder for more information.
Contains Common Workflow Language scripts to be used alongside the JSON files. See the README.md file inside the folder for more information.
Install the Python dependencies listed in requirements.txt by using:
$ sudo pip install -r requirements.txt
Install all of the necessary processing tools by following the instructions in INSTALLATION.md.
Start the application by navigating to the python folder and then using:
$ python orchestration.py sample.json
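To illustrate the general idea of a pipeline as an ordered list of command-line steps, here is a minimal sketch. It is not the code in orchestration.py, and the JSON layout it reads (a `steps` list whose entries have `name` and `command` keys) is a hypothetical format chosen for this example; the actual structure of sample.json may differ. The helper names `load_steps` and `run_pipeline` are likewise made up for illustration, and the sketch assumes Python 3.5 or newer for `subprocess.run`.

```python
import json
import shlex
import subprocess

# Hypothetical pipeline description; the commands are placeholders,
# not real alignment or sorting invocations.
EXAMPLE_PIPELINE = """
{
  "steps": [
    {"name": "align", "command": "echo aligning reads"},
    {"name": "sort",  "command": "echo sorting alignments"}
  ]
}
"""


def load_steps(pipeline_json):
    """Parse a JSON string and return its commands in the order listed."""
    config = json.loads(pipeline_json)
    return [step["command"] for step in config["steps"]]


def run_pipeline(commands):
    """Run each command in order and collect its standard output.

    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit code, so later steps never run on a failed input.
    """
    outputs = []
    for command in commands:
        result = subprocess.run(
            shlex.split(command),      # split into argv, no shell involved
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,   # decode output as text
            check=True,
        )
        outputs.append(result.stdout)
    return outputs
```

Calling `run_pipeline(load_steps(EXAMPLE_PIPELINE))` would run the two placeholder steps one after the other. Stopping at the first failure is a design choice made for this sketch: each step in a processing pipeline typically consumes the output of the previous one, so continuing after an error would only propagate bad data.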
https://github.com/AKBoles/GenomicDoBox
Burrows-Wheeler Aligner: http://bio-bwa.sourceforge.net/
SAMtools: http://samtools.sourceforge.net/
Chameleon Cloud: https://www.chameleoncloud.org/
OpenStack Python SDK: http://docs.openstack.org/user-guide/sdk.html
1000 Genomes Project: http://www.internationalgenome.org/
FASTQ File Type Description: https://en.wikipedia.org/wiki/FASTQ_format