       _  _               ___ _             _  ___                   _     
      ( `   )_           / __| |___ _  _ __| |/ __|_ _ _____ __ ____| |    
     (    )    `)       | (__| / _ \ || / _` | (__| '_/ _ \ V  V / _` |    
   (_   (_ .  _) _)      \___|_\___/\_,_\__,_|\___|_| \___/\_/\_/\__,_|    
                                                                           
                                                 _                         
                                                (  )                       
              _, _ .                         ( `  ) . )                    
             ( (  _ )_                      (_, _(  ,_)_)                  
           (_(_  _(_ ,)                                                    

~ CloudCrowd ~

* Parallel processing for the rest of us
* Write your scripts in Ruby
* Works with Amazon EC2 and S3
* split -> process -> merge
* As easy as `gem install cloud-crowd`
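The split -> process -> merge model can be illustrated in plain Ruby, without CloudCrowd at all. This is a minimal single-machine sketch under stated assumptions. Threads stand in for worker nodes, and the names `split_work`, `process_chunk`, `merge_results` and `run_job` are hypothetical and not part of the gem.

```ruby
# A single-machine illustration of split -> process -> merge.
# Threads stand in for CloudCrowd worker nodes; every name here is hypothetical.

# Split: break the work into roughly equal chunks, one per worker.
def split_work(items, workers)
  slice_size = [(items.size / workers.to_f).ceil, 1].max
  items.each_slice(slice_size).to_a
end

# Process: the independent per-chunk job (here, a sum of squares).
def process_chunk(chunk)
  chunk.sum { |n| n * n }
end

# Merge: combine the partial results into a single answer.
def merge_results(partials)
  partials.sum
end

def run_job(items, workers: 4)
  threads = split_work(items, workers).map do |chunk|
    Thread.new { process_chunk(chunk) }
  end
  merge_results(threads.map(&:value))
end

puts run_job((1..10).to_a, workers: 3)  # prints 385
```

In MRI, Ruby threads share one interpreter lock, so this shows the shape of the pipeline rather than a real speedup. CloudCrowd gets its parallelism by spreading the work units across workers and nodes.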

Well-suited for:

* Generating or resizing images.
* Encoding video.
* Running text extraction or OCR on PDFs.
* Migrating a large file set or database.
* Web scraping.

~ Documentation ~

Wiki: https://github.com/documentcloud/cloud-crowd/wiki
RDoc: http://www.rubydoc.info/github/documentcloud/cloud-crowd

~ Getting started ~

# Install the gem.

  >> sudo gem install cloud-crowd

# Install the CloudCrowd configuration files to a location of your choosing.

  >> crowd install ~/config/cloud-crowd

# Now you can use the full complement of `crowd` commands from inside
# this configuration directory. To see the available commands:

  >> crowd --help

# Edit the configuration files to your satisfaction, add AWS credentials, 
# and then load the CloudCrowd schema into your configured database.

  >> cd ~/config/cloud-crowd
  >> mate config.yml
  >> mate database.yml
  >> [create the database you just configured...]
  >> crowd load_schema

# Write your actions and install them into the 'actions' subdirectory.
# CloudCrowd comes with a few default actions as examples.
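
# A sketch of what an action file might contain. It assumes the gem's
# convention of subclassing CloudCrowd::Action and defining `process`, with
# optional `split` and `merge` methods; check the wiki for the real base class.
# So that the snippet runs without the gem, FakeAction is a hypothetical
# stand-in that only provides an `input` reader, and each input is assumed to
# be a plain string of text rather than a downloaded file.

```ruby
# Hypothetical stand-in for CloudCrowd::Action so this file runs on its own.
# It only stores and exposes `input`; the real base class provides more.
class FakeAction
  attr_reader :input

  def initialize(input)
    @input = input
  end
end

# Example action. In a real actions/ file, subclass CloudCrowd::Action instead.
class WordCount < FakeAction
  # Per work unit: `input` is assumed to be a string of text.
  def process
    input.split.size
  end

  # Final step: `input` is assumed to be the array of per-unit results.
  def merge
    input.sum
  end
end

counts = ["one two three", "four five"].map { |text| WordCount.new(text).process }
puts WordCount.new(counts).merge  # prints 5
```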

# To launch the central server (make sure that you include its location
# in config.yml):

  >> crowd server

# The configuration folder also includes 'config.ru', which can be used by
# any Rack-compliant webserver to run your central server.

# Then, to launch a node of workers:

  >> crowd node

# To spin up remote nodes, install the 'cloud-crowd' gem and copy over
# your configuration directory. Run `crowd node`, and the remote machines
# will register with the central server, becoming available for processing.

# At this point you can visit your Operations Center at localhost:9173 to 
# view all of your nodes, ready for action.
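
# To check from a script that the central server is answering, a small helper
# like the one below can help. It is a sketch that assumes the server listens
# on localhost:9173 as above; the method name `operations_center_up?` is
# hypothetical, and any HTTP response at all counts as "up".

```ruby
require "net/http"
require "uri"

# Returns true if anything answers HTTP at the given URL, false otherwise.
# Assumption: the central server (and its Operations Center) runs on port 9173.
def operations_center_up?(url = "http://localhost:9173/")
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, open_timeout: 2, read_timeout: 2) do |http|
    http.get(uri.path.empty? ? "/" : uri.path)
  end
  true
rescue StandardError
  # Connection refused, timeouts, DNS failures, etc.
  false
end

puts operations_center_up? ? "Operations Center is up" : "No response on port 9173"
```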