A Dockerized Distributed Computing prototype for the Docker Global Hackathon and DoraHacks
GOAL
To set up a basic prototype for distributed computing in Docker. If time permits, add a more complex
computing task.
INTRODUCTION
For Distributed (Scientific) Computing, scientists across the world have mostly
relied on pre-configured VM images to let volunteer clients contribute to
micro-processing tasks that process raw data received in chunks over the network.
But since the introduction of Docker, life has changed and so have the performance
benchmarks. We propose a system that uses the benefits of Docker to, hopefully, perform
far better than the milestones currently achieved through VMs. VMs have a huge
start-up overhead compared to Docker containers. Moreover, we don’t even need
to explain the difference between running more than one VM on a host OS and
running multiple Docker containers on that same machine! See the point? 😃
YouTube video explaining this project:
Click here for a screencast of running just one script on the client side.
INSTALLATION
Server side (src/server/):
Make sure that your ‘src/server/’ is up and running, either locally (for test purposes)
or deployed elsewhere, in which case its hostname/IP and port must be provided in the
environment variables $DC_HOST and $DC_PORT.
(Refer to the next major point on ‘Client side’ for this script.)
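For example, when testing against a locally running server (the values below are placeholders; use whatever host/port your server actually listens on):

$ export DC_HOST=127.0.0.1   # or the hostname/IP of the deployed server
$ export DC_PORT=5000        # placeholder port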
Note that running the server requires mongoDB. Refer to the following:
install_mongo guide
and then set the env variables U_DB, U_USER and U_PASS to the db name, its user and the
password you chose while setting up mongoDB.
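A rough sketch of that step (the db name, user and password below are made up; match them to your own mongoDB setup):

$ mongo mydb --eval 'db.createUser({user: "dc_user", pwd: "s3cret", roles: ["readWrite"]})'
$ export U_DB=mydb
$ export U_USER=dc_user
$ export U_PASS='s3cret'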
Ensure that you’ve installed the dependencies from dockerComp/src/server/requirements.txt.
To run the server, open up a terminal, go to dockerComp/src/server/ and run $ ./start
This starts the server locally on your machine.
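A possible terminal session for the two steps above (assuming the dependencies are installed with pip):

$ cd dockerComp/src/server/
$ pip install -r requirements.txt
$ ./start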
Client side (src/client/):
Note: the server (i.e., the component responsible for distributing data to clients) has to be
deployed somewhere, and its IP has to be provided in your configuration file.
You may then distribute the script installer.sh along with that configuration to the clients.
Download this script and run
$ ./installer.sh
(configure your server location for this script via $DC_HOST and $DC_PORT)
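For instance, a client bootstrap might look like this (the host/port values are placeholders for wherever your server is deployed):

$ export DC_HOST=203.0.113.10 DC_PORT=5000
$ chmod +x installer.sh && ./installer.sh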
The relevant client-side files are $HOME/dockerComp/src/client/scripts/nohup.out
and $HOME/dockerComp/src/client/scripts/slave_manager.
To kill the slave_manager process:
$ kill -9 $(ps -e | grep slave_manager | awk -F' ' '{print $1}')
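Equivalently, assuming a standard Linux userland, you can watch the log and stop the process with:

$ tail -f $HOME/dockerComp/src/client/scripts/nohup.out   # watch client-side output
$ pkill -f slave_manager                                  # simpler alternative to the kill one-liner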
Should you need to remove all traces of dockerComp from your machine, just run the script ‘cleanup’
included in the root of this project’s source code.
Cheers! 😃
NOTES
Demo link to be updated soon.
In case you’re curious how to go about running this from the client side, there is a
Docker image (which will be kept updated): $ docker pull arcolife/docker_comp
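A minimal sketch of using that image (the environment variables and run flags are assumptions; the image’s actual entrypoint and configuration may differ):

$ docker pull arcolife/docker_comp
$ docker run -d -e DC_HOST=203.0.113.10 -e DC_PORT=5000 arcolife/docker_comp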
FAQ
Refer to the Wiki (click here).
References:
- http://www.rightscale.com/blog/sites/default/files/docker-containers-vms.png
- http://en.wikipedia.org/wiki/Docker_%28software%29#cite_ref-3
So, just to give you some context for this whole project, take a look at a project called
CernVM. It is a really awesome project, developed to
help collect CERN’s LHC data and perform data analysis on a volunteer’s computer or even on
commercial clouds. Just imagine if that whole VM-based process were dockerized!
FEATURES
FUTURE GOALS
Make this a pluggable dockerized distributed computing tool, where you just have to plug in
a computation task (say, map-reduce) and have it send data to clients; the app should be able
to handle the rest (see the sketch after this list).
Benchmark the results and compare them with existing methodologies.
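Purely as an illustration of the “pluggable task” idea (none of this exists in the code yet), a task could be a small script that reads a data chunk on stdin and writes partial results on stdout, e.g. a trivial word-count “map” step:

#!/bin/bash
# hypothetical pluggable task: count the words in the chunk piped in on stdin
tr -s '[:space:]' '\n' | sort | uniq -c | awk '{print $2, $1}'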
TESTS
From the client side (see the example at the end of this section):
$ src/client/scripts/test_server_conn
(make sure the env vars DC_HOST and DC_PORT are set)
From the server side:
Workloads:
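As an example of the client-side connectivity check (the host/port values are placeholders):

$ export DC_HOST=203.0.113.10 DC_PORT=5000
$ src/client/scripts/test_server_conn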
WORKFLOW
Server
Dashboard to Manage:
Master app that manages the data sent to each client and checks it for integrity (a checksum sketch follows below).
Client
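The integrity check could, for example, be as simple as shipping a checksum alongside each chunk (the file names and checksum scheme below are hypothetical, not what the master currently does):

server$ sha256sum chunk_0042.dat > chunk_0042.dat.sha256   # master publishes a checksum per chunk
client$ sha256sum -c chunk_0042.dat.sha256                 # client verifies the chunk before processing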