Rails REST web service and dashboard UI for launching MPI clusters on Amazon EC2 and running user-submitted jobs
h1. ec2cluster
ec2cluster is a Rails web console, including a REST API, that launches temporary Beowulf clusters on Amazon EC2 for parallel processing. You upload input data and code to Amazon S3, then submit a job request that specifies how many nodes you want in your cluster. ec2cluster will spin up and configure a private Beowulf cluster, process the data in parallel across the nodes, upload the output results to an Amazon S3 bucket, and terminate the cluster when the job completes (termination is optional). ec2cluster is like Amazon Elastic MapReduce, except that it uses MPI and REST instead of Hadoop and SOAP. The source code is free for use in both personal and commercial projects and is released under the BSD license.
h3. Features
h2. Running MPI jobs on EC2 with ec2cluster
overview goes here, workflow, s3 inputs, commands.
h3. Using the Web dashboard
instructions go here for kmeans example
h4. Standard C kmeans clustering example
h3. Using the REST API
h4. API methods
TODO: API docs and instructions go here
h4. Sample Ruby REST API client example
TODO: flesh this out more…
The full code is at “http://github.com/datawrangling/ec2cluster-client-demos”:http://github.com/datawrangling/ec2cluster-client-demos/tree/master
Fill in your AWS info and server details in config.yml:
aws_access_key_id: AAAAA8BHBBBBBBBBBMM9
aws_secret_access_key: ABCiDu/BIBSCubisbdiBISBDUSBDUSBFSUIbubbu
rest_url: https://ec2-XXX-XXX-XX-XXX.compute-1.amazonaws.com/
admin_user: REPLACE_ME
admin_password: REPLACE_ME
keypair: your-keypair
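A quick way to catch a malformed or half-edited config.yml is to parse it and look for missing keys or leftover REPLACE_ME values before starting the client. This is a minimal sketch using only Ruby's standard library; the `parse_config` and `incomplete_keys` helpers are hypothetical, and `outputbucket` is included only because the job submission code reads `CONFIG["outputbucket"]` (the format of its value is an assumption).

```ruby
require 'yaml'

# Settings read by the Ruby client. 'outputbucket' is an assumption: it is not
# in the sample config.yml, but the job submission code reads it.
REQUIRED_KEYS = %w[aws_access_key_id aws_secret_access_key rest_url
                   admin_user admin_password keypair outputbucket].freeze

# Parse YAML text and return the settings hash.
def parse_config(yaml_text)
  config = YAML.safe_load(yaml_text)
  raise ArgumentError, 'config.yml must be a YAML mapping' unless config.is_a?(Hash)
  config
end

# List required keys that are missing, empty, or still set to REPLACE_ME.
def incomplete_keys(config)
  REQUIRED_KEYS.select do |key|
    value = config[key].to_s.strip
    value.empty? || value == 'REPLACE_ME'
  end
end
```

For example, `incomplete_keys(parse_config(File.read('config.yml')))` returns an empty array once every setting has been filled in.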
Use ActiveResource to communicate with the ec2cluster REST API from Ruby:
require 'yaml'
require 'active_resource'
CONFIG = YAML.load_file('config.yml')
class Job < ActiveResource::Base
  self.site = CONFIG['rest_url']
  self.user = CONFIG['admin_user']
  self.password = CONFIG['admin_password']
  self.timeout = 5 # seconds
end
Submit a new job request to the API using just the required parameters:
# s3infiles, expected_outputs and out_path are defined in the full demo script
job = Job.new(:name => "Kmeans demo API test 2",
              :description => "Simple Kmeans C MPI example, run by pete using Ruby client",
              :input_files => s3infiles.join(" "),
              :commands => "bash run_kmeans.sh",
              :output_files => expected_outputs.join(" "),
              :output_path => CONFIG["outputbucket"] + "/" + out_path,
              :number_of_instances => "3",
              :instance_type => "m1.small")
job.save # sends the POST request; the server assigns the job id
job_id = job.id
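ActiveResource builds and sends the HTTP request for you. For comparison, or if ActiveResource is unavailable, roughly the same call can be made with Ruby's standard Net::HTTP library. This sketch assumes the API accepts the same `{"job": {...}}` JSON body at `/jobs` that the curl example in this README posts, and that it accepts HTTP basic authentication with the admin credentials (which is how ActiveResource sends `user` and `password`). `build_job_request` and `send_request` are hypothetical helper names.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) a POST request that creates a job.
# rest_url should end with '/' as in config.yml, so 'jobs' is appended to it.
def build_job_request(rest_url, user, password, job_params)
  request = Net::HTTP::Post.new(URI.join(rest_url, 'jobs'))
  request.basic_auth(user, password)
  request['Content-Type'] = 'application/json'
  request['Accept'] = 'application/json'
  request.body = JSON.generate('job' => job_params)
  request
end

# Send the request and return the parsed JSON response body.
def send_request(rest_url, request)
  uri = URI(rest_url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                  read_timeout: 5) do |http|
    response = http.request(request)
    response.value # raises unless the response is 2xx
    JSON.parse(response.body)
  end
end
```

For example, build the request with `build_job_request(CONFIG['rest_url'], CONFIG['admin_user'], CONFIG['admin_password'], 'name' => 'Kmeans demo', 'number_of_instances' => '3')` and pass it to `send_request(CONFIG['rest_url'], request)`.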
Periodically poll the server for the job status until the job is complete, then fetch the output files from S3:
until job.state == 'complete' do
  begin
    job = Job.find(job_id)
    puts "[State]: #{job.state} [Progress]: #{job.progress}" unless job.progress.nil?
  rescue ActiveResource::TimeoutError
    puts "TimeoutError calling REST server..."
  end
  sleep 5
end
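The polling loop never exits if the job does not reach 'complete' (for example, if it is cancelled or fails). One way to bound it is a small helper that takes the fetch as a block. This is a sketch: `wait_for_job` is a hypothetical name, the limits are arbitrary defaults, and because job states other than 'complete' are not documented here, the caller passes whichever terminal states it wants to stop on.

```ruby
require 'timeout'

# Call the block every +interval+ seconds until it returns one of
# +done_states+, then return that state. Exceptions listed in +retry_on+
# count as a missed poll. Raises if no terminal state is seen within
# +max_attempts+ polls (about an hour with the defaults).
def wait_for_job(done_states: ['complete'], interval: 5, max_attempts: 720,
                 retry_on: [Timeout::Error])
  1.upto(max_attempts) do |attempt|
    begin
      state = yield
      return state if done_states.include?(state)
    rescue *retry_on => e
      warn "poll #{attempt} failed: #{e.class}"
    end
    sleep interval if attempt < max_attempts
  end
  raise "job did not finish after #{max_attempts} polls"
end
```

With the ActiveResource client this might look like `wait_for_job(retry_on: [ActiveResource::TimeoutError]) { Job.find(job_id).state }`.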
Some examples of other optional parameters for Job.new():
:master_ami => "ami-bf5eb9d6",
:worker_ami => "ami-bf5eb9d6",
:user_packages => "python-setuptools python-docutils",
:availability_zone => "us-east-1a",
:keypair => CONFIG["keypair"],
:mpi_version => "openmpi",
:shutdown_after_complete => false
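Because Job.new accepts arbitrary attributes, a misspelled optional parameter could go unnoticed (assuming the server ignores unrecognized attributes, which this README does not state). A small sketch that merges required and optional attributes and rejects unknown keys; the key lists are copied from the examples in this README and may not be exhaustive, and `job_attributes` is a hypothetical helper.

```ruby
# Attribute names taken from the examples in this README (may be incomplete).
REQUIRED_JOB_KEYS = [:name, :description, :input_files, :commands, :output_files,
                     :output_path, :number_of_instances, :instance_type].freeze
OPTIONAL_JOB_KEYS = [:master_ami, :worker_ami, :user_packages, :availability_zone,
                     :keypair, :mpi_version, :shutdown_after_complete].freeze

# Merge required and optional job attributes, failing fast on missing
# required keys or unrecognized (possibly misspelled) optional keys.
def job_attributes(required, optional = {})
  missing = REQUIRED_JOB_KEYS - required.keys
  raise ArgumentError, "missing required: #{missing.join(', ')}" unless missing.empty?
  unknown = optional.keys - OPTIONAL_JOB_KEYS
  raise ArgumentError, "unknown optional: #{unknown.join(', ')}" unless unknown.empty?
  required.merge(optional)
end
```

Usage might look like `Job.new(job_attributes(required, :mpi_version => "openmpi", :shutdown_after_complete => false))`.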
h4. Curl examples
Submit a new job request
To submit a new job, POST a JSON request to the REST API:
$ curl -H "Content-Type:application/json" -H "Accept:application/json" \
  -d '{"job": {"name": "My Json MPI job", "description": "test run of X!Tandem",
  "user_id": 2, "number_of_instances": 12, "instance_type": "c1.medium",
  "input_files": "s3://mybucket/input/genome.txt s3://myfastabucket/somedata.fasta",
  "commands": "bash runtandem.sh", "output_files": "myoutput.txt",
  "output_path": "s3://myoutputbucket/myrunsfolder"}}' \
  http://localhost:3000/jobs
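Hand-escaping JSON inside a shell string is error-prone. An alternative sketch: generate the request body with Ruby's JSON library, write it to a file, and let curl read it with `--data-binary @job.json`. The `write_job_json` helper and the `job.json` filename are hypothetical.

```ruby
require 'json'

# Write {"job": {...}} to +path+ for use with:
#   curl -H "Content-Type:application/json" -H "Accept:application/json" \
#        --data-binary @job.json http://localhost:3000/jobs
# JSON.pretty_generate does all quoting and escaping, so values such as
# "X!Tandem" need no shell escaping.
def write_job_json(path, job)
  File.write(path, JSON.pretty_generate('job' => job))
  path
end
```

For example, `write_job_json('job.json', 'name' => 'My Json MPI job', 'description' => 'test run of X!Tandem', 'number_of_instances' => 12)` writes a file that curl can post unchanged.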
Check job status via GET
$ curl http://localhost:3000/jobs/3.json
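To read the status from a script, parse the JSON response. Whether the attributes are wrapped in a `job` root key depends on the Rails JSON settings, which this README does not specify, so this sketch accepts both shapes. `job_status` is a hypothetical helper; the `state` and `progress` field names come from the Ruby client example.

```ruby
require 'json'

# Return [state, progress] from the body of GET /jobs/:id.json.
# Accepts either {"job": {...}} or a bare {...} object (an assumption).
def job_status(json_text)
  data = JSON.parse(json_text)
  job = data.is_a?(Hash) && data['job'].is_a?(Hash) ? data['job'] : data
  raise ArgumentError, 'expected a JSON object' unless job.is_a?(Hash)
  [job['state'], job['progress']]
end
```

For example, a script reading the curl output from standard input could call `job_status(STDIN.read)`.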
Cancel a job request and shut down the EC2 cluster
$ curl -H "Content-Type:application/json" -H "Accept: application/json" \
  -X PUT http://localhost:3000/jobs/3/cancel
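The same cancel request can be built with Net::HTTP from Ruby's standard library. Sketch only: `build_cancel_request` is a hypothetical helper, and the optional basic-auth credentials assume the deployed app checks the admin login the way the ActiveResource client supplies it (the local curl examples send none).

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) the PUT request that cancels job +job_id+,
# mirroring the curl example. Credentials are optional.
def build_cancel_request(rest_url, job_id, user: nil, password: nil)
  uri = URI.join(rest_url, "jobs/#{Integer(job_id)}/cancel")
  request = Net::HTTP::Put.new(uri)
  request['Content-Type'] = 'application/json'
  request['Accept'] = 'application/json'
  request.body = '' # no payload, like the curl example
  request.basic_auth(user, password) if user
  request
end
```

Over plain HTTP it can be sent with `Net::HTTP.start(req.uri.host, req.uri.port) { |http| http.request(req) }`.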
h3. More code examples & applications
h4. Rmpi example
h4. psvm support vector machine example
h4. mpi4py example: Monte Carlo options pricing
h4. Excel example: credit risk
h4. Bioinformatics example: mpiBLAST
h2. Why use MPI? Why not Hadoop?
If you can solve your problem with Hadoop, go for it. If you are short on time and MPI code already exists that solves your problem, then you might want to try ec2cluster. MPI has been around for a long time, and there are many existing MPI libraries across a range of domains. That said, debugging MPI jobs and dealing with node failures can be a hassle. Reuse or reimplement: your choice.
h2. Dependencies
dependencies will go here (rails, ec2onrails, capistrano, Amazon EC2/S3 setup)
h2. Installation
fetching the source code instructions will go here
h2. Deploying to EC2
Install the ec2onrails gem as described at http://ec2onrails.rubyforge.org/:
$ sudo gem install ec2onrails
Find the AMI ID of the latest 32-bit ec2onrails image (in our case this was ami-5394733a):
$ cap ec2onrails:ami_ids
Launch an instance of the latest ec2onrails AMI and note the instance address returned by ec2-describe-instances; it will look something like ec2-12-xx-xx-xx.z-1.compute-1.amazonaws.com:
$ ec2-run-instances ami-5394733a -k gsg-keypair
$ ec2-describe-instances
Create the needed configuration files from the provided examples and edit them, filling in your instance address, keypairs, and other settings as indicated in the comments of each file. See the ec2onrails documentation or source code for more details on each setting. If you want to make changes to the ec2cluster code, be sure to replace the base GitHub repository in deploy.rb and config.yml with your own GitHub location:
$ cp config/deploy.rb.example config/deploy.rb
$ cp config/s3.yml.example config/s3.yml
$ cp config/config.yml.example config/config.yml
$ cp config/database.yml.example config/database.yml
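Note that `cp` silently overwrites a configuration file you have already edited. If you rerun the setup, a sketch like the following copies each example only when the target does not exist yet; `copy_example_configs` is a hypothetical helper meant to be run from the application root.

```ruby
require 'fileutils'

# The four configuration files created from *.example templates above.
CONFIG_FILES = %w[deploy.rb s3.yml config.yml database.yml].freeze

# Copy config/<name>.example to config/<name> unless the target already
# exists, so edited settings are never overwritten. FileUtils.cp raises
# Errno::ENOENT if an example file is missing. Returns the files created.
def copy_example_configs(config_dir = 'config')
  CONFIG_FILES.map do |name|
    target = File.join(config_dir, name)
    next if File.exist?(target)
    FileUtils.cp("#{target}.example", target)
    target
  end.compact
end
```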
Be sure to substitute your own AWS access key ID and secret access key in both config.yml and s3.yml:
aws_access_key_id: YYVUYVIUBIBI
aws_secret_access_key: BBKBBOUjbkj/BBOUBOBJKBjbjbboubuBUB
Also replace the admin user name and password in config.yml:
admin_user: REPLACE_ME
admin_password: REPLACE_ME
This application should be run under SSL, and access should be restricted to trusted IP addresses using EC2 security groups. Ideally, your client application also runs on EC2 in the same security group.
Deploy the app to your launched EC2 instance with Capistrano (this will take several minutes):
$ cap ec2onrails:setup
$ cap deploy:cold
Use the admin login information you set in config.yml to access the dashboard from a web browser, or the web service, at the URL of the instance you provided in deploy.rb: https://ec2-12-xx-xx-xx.z-1.compute-1.amazonaws.com. You can also ssh into your running EC2 instance as usual with your keypair to debug any issues. See the ec2onrails forums for more help debugging deployment issues.
To redeploy the app after making changes to the base ec2cluster code (this will also restart the delayed_job services which launch and terminate EC2 clusters):
$ cap deploy
To manually restart the Apache service or the Mongrels:
$ cap ec2onrails:server:restart_services
$ cap deploy:restart
If the job processor is stuck waiting for instances, you might need to ensure the delayed_job worker is running. You can manually stop and start the delayed_job workers as follows:
$ cap delayed_job:stop
$ cap delayed_job:start
To clear the job queue, ssh into the EC2 instance and run the following command from within /mnt/app/current (there is no Capistrano task for this yet):
$ RAILS_ENV=production rake jobs:clear
h2. Running a development version locally
Unless your local machine’s IP address and Rails port are publicly reachable, local runs will not work properly, because the cluster nodes need to communicate with the Rails app.
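One rough way to check reachability is to try a TCP connection to your public IP address and Rails port (3000 by default for script/server) from a machine outside your network, such as an EC2 instance. This sketch uses Ruby's standard socket library; `port_reachable?` is a hypothetical helper, and a successful connection only shows that the port is open, not that the cluster nodes will be able to use the app.

```ruby
require 'socket'

# Return true if a TCP connection to host:port succeeds within +timeout+
# seconds. The socket is closed as soon as the block returns.
def port_reachable?(host, port, timeout: 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Refused, unreachable, timed out (Errno::* or IO::TimeoutError), or bad host.
  false
end
```

For example, run `port_reachable?('203.0.113.10', 3000)` from an EC2 instance, substituting your own public address.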
Do the normal Rails gem install dance for any missing dependencies:
$ rake gems:install
Create the database
$ rake db:create
$ rake db:migrate
Launch the rails app itself
$ script/server
=> Booting Mongrel
=> Rails 2.3.2 application starting on
=> Call with -d to detach
=> Ctrl-C to shutdown server
Launch a background delayed_job worker in a separate terminal window
$ rake jobs:work
(in /Users/pskomoroch/rails_projects/ec2cluster)
*** Starting job worker host:72-63-103-214.pools.spcsdns.net pid:12221
background cluster launch initiated...
1 jobs processed at 0.0498 j/s, 0 failed ...
background cluster shutdown initiated...
1 jobs processed at 0.0499 j/s, 0 failed ...
background cluster launch initiated...
Navigate to http://localhost:3000/ to use the web UI or submit REST requests