cloud scheduler

Automatically boot VMs for your HTC jobs

THIS REPO IS NO LONGER MAINTAINED! THE NEW CLOUDSCHEDULER REPO IS AT https://github.com/hep-gc/cloudscheduler !

Cloud Scheduler 1.13.2 README

Introduction

Cloud Scheduler: Automatically boot VMs for your HTC jobs

Cloud Scheduler manages virtual machines on clouds configured with OpenStack,
Google Compute Engine, or Amazon EC2 to create an environment for HTC batch job execution.
Users submit their jobs to a Condor job queue, and Cloud Scheduler boots VMs to
suit those jobs.

For more documentation on Cloud Scheduler, please refer to:

Prerequisites

Optional Prerequisites

  • Guppy – Used for memory usage info.

Basic Steps to get Jobs Running via Cloud Scheduler

  1. Install Prerequisite libraries
  2. Install Cloud Scheduler & Condor
  3. Configure Condor and Cloud Scheduler
  4. Set up a VM Image with Condor installed & CS Condor Scripts
  5. Add the Required CS Attributes to a job submission file
  6. Start CS and Submit job(s)

Quick Start for People Who Think They Know What They’re Doing

# pip install cloud-scheduler
  • This will install the latest master release; the latest dev release is available through GitHub.

Special help for RHEL 5

Since Cloud Scheduler requires Python 2.6+, and we recognize that RHEL 5 comes
with and requires Python 2.4, here’s a quick guide to getting Python
installed on those systems:

Python 2.6 may be in the repos, depending on your version (5.5+):

# yum install python26 python26-distribute

For Python 2.7:

Install the tools we need to build Python and its modules:

# yum install gcc gdbm-devel readline-devel ncurses-devel zlib-devel \
  bzip2-devel sqlite-devel db4-devel openssl-devel tk-devel \
  bluez-libs-devel libxslt libxslt-devel libxml2-devel libxml2

Download and compile Python 2.7.1:

$ VERSION=2.7.1
$ mkdir /tmp/src 
$ cd /tmp/src/
$ wget http://python.org/ftp/python/$VERSION/Python-$VERSION.tar.bz2
$ tar xjf Python-$VERSION.tar.bz2
$ rm Python-$VERSION.tar.bz2
$ cd Python-$VERSION 
$ ./configure
$ make
$ sudo make altinstall

Now we need to install Python setuptools:

$ cd /tmp/src
$ wget http://pypi.python.org/packages/2.7/s/setuptools/setuptools-0.6c11-py2.7.egg
$ sudo sh setuptools-0.6c11-py2.7.egg

Now install pip to install the rest of our dependencies:

$ sudo easy_install-2.7 pip

And the rest of our dependencies:

$ sudo pip-2.7 install cloud-scheduler

Now clean everything up:

$ sudo rm -Rf /tmp/src/

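Finally, once you’ve set up the rest of Cloud Scheduler, you’ll want to set
your Python version in the Cloud Scheduler init script, or use virtualenv.
To set it in the init script, change the PYTHON variable to the path of the
interpreter you built. With the default ./configure prefix and the
make altinstall step shown above, that is /usr/local/bin/python2.7.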
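
Before pointing the init script at an interpreter, it can help to confirm that the interpreter meets the 2.6+ requirement stated above. The following is a minimal sketch; the function name meets_minimum is hypothetical and is not part of Cloud Scheduler. It runs under both Python 2 and Python 3.

```python
import sys

# Minimum version stated in this README (Python 2.6+).
MINIMUM = (2, 6)


def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info  # the interpreter running this script
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old for Cloud Scheduler"
    print("%s: %s" % (sys.executable, status))
```

Run it with the interpreter you plan to use, for example the python2.7 binary you just built, rather than the system default.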

Other distros:

You can install the Python libraries listed above with pip. Note that lxml
requires libxml2 and libxslt, along with their development libraries, to be
installed.

Install pip:

# easy_install pip

And Cloud Scheduler and its dependencies:

# pip install cloud-scheduler

Install without pip

To install without using pip:

Download the zip from GitHub:

# wget https://github.com/hep-gc/cloud-scheduler/archive/master.zip
# unzip master.zip
# cd cloud-scheduler-master
# python setup.py install

Condor Install

Cloud Scheduler works with Condor, which needs
to be installed and able to manage resources. You must install it on the same
machine that runs Cloud Scheduler.

We recommend the following settings, especially if you’re planning on
using Condor CCB:

UPDATE_COLLECTOR_WITH_TCP=True
COLLECTOR_SOCKET_CACHE_SIZE=10000
COLLECTOR.MAX_FILE_DESCRIPTORS = 10000

We have also placed an example Condor config in scripts/condor/manager.
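
To check whether a Condor configuration already contains the three recommended settings, something like the sketch below could be used. It is a deliberately simple parser that only understands plain NAME = VALUE lines; it ignores includes, conditionals, and macro expansion, and it compares names case-insensitively. The function names are hypothetical and are not part of Condor or Cloud Scheduler.

```python
# Recommended values copied from the settings listed above.
RECOMMENDED = {
    "UPDATE_COLLECTOR_WITH_TCP": "True",
    "COLLECTOR_SOCKET_CACHE_SIZE": "10000",
    "COLLECTOR.MAX_FILE_DESCRIPTORS": "10000",
}


def parse_condor_config(text):
    """Parse plain NAME = VALUE lines; later assignments override earlier ones."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip().upper()] = value.strip()
    return settings


def missing_recommended(text):
    """Return the recommended settings that are absent or set differently."""
    settings = parse_condor_config(text)
    return {
        name: value
        for name, value in RECOMMENDED.items()
        if settings.get(name, "").lower() != value.lower()
    }


if __name__ == "__main__":
    sample = "UPDATE_COLLECTOR_WITH_TCP = True\n"
    for name, value in missing_recommended(sample).items():
        print("consider adding: %s = %s" % (name, value))
```

For an authoritative answer on a live system, ask Condor itself (for example with condor_config_val), since it resolves the full configuration rather than a single file.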

Make sure you can run condor_status and condor_q, and make sure your
[HOST]ALLOW_WRITE will permit the VMs you will start to add themselves to your Condor
Pool.

Depending on your clouds and networking, you may need to alter
TCP_FORWARDING_HOST on the VMs you boot so that Condor can connect. There
are scripts that attempt to do this automatically, but they’re imperfect.

Preparing VM Images

The VM images you would like to run jobs with need to be prepared to join your
Condor pool. Cloud Scheduler will do most of the heavy lifting for you, but at
the very least, you need to install Condor, and configure it as a worker that
will join your Condor pool. The easiest way to do this is to use the example
configuration (at least as inspiration) from scripts/condor/worker/ . You’ll
want to put these in your /etc/condor directory. You will probably also want to
use our custom Condor init script. This does things like set up an appropriate
environment for when Condor is started with private networking only, when
started on EC2, and also will automatically point your node to your Condor
Pool. When using the custom init script and doing offline testing of the VM
image, ensure you place the central_manager file from scripts/condor/worker into
/etc/condor, as the init script will read the value of CONDOR_HOST from
this file.

Configuration

cloud_scheduler.conf

The Cloud Scheduler configuration file allows you to configure most of its
functionality, and you’ll need to open it up to get a usable installation.
All of its options are described inline in the example configuration file
cloud_scheduler.conf, which is included with Cloud Scheduler.

By default, the Cloud Scheduler setup script installs its configuration files
to /usr/local/share/cloud-scheduler. We suggest copying these to
/etc/cloudscheduler. You can manually select a different configuration
by running cloud_scheduler with the -f option. If you’re running as a non-root
user, Cloud Scheduler will also check for config files in ~/.cloudscheduler/.

Cloud Scheduler checks for config files in the following order, and will use the first one it finds:

[config specified with the -f option]
~/.cloudscheduler/cloud_scheduler.conf
/etc/cloudscheduler/cloud_scheduler.conf
/usr/local/share/cloud-scheduler/cloud_scheduler.conf
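
The lookup order above can be sketched as a small function, which is handy for checking which file a given account would pick up. This is an illustration of the documented order, not Cloud Scheduler’s own code. The name find_config is hypothetical, and falling through to the next candidate when the -f path does not exist is an assumption of this sketch.

```python
import os


def find_config(explicit_path=None, home=None):
    """Return the first existing config file in the documented order, or None."""
    if home is None:
        home = os.path.expanduser("~")
    candidates = []
    if explicit_path:
        # Path given with the -f option is tried first.
        candidates.append(explicit_path)
    candidates += [
        os.path.join(home, ".cloudscheduler", "cloud_scheduler.conf"),
        "/etc/cloudscheduler/cloud_scheduler.conf",
        "/usr/local/share/cloud-scheduler/cloud_scheduler.conf",
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_config() or "no cloud_scheduler.conf found")
```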

There are a few settings that you should modify for your system to get up and running:

  • condor_context_file
  • condor_host_on_vm
  • default_yaml

Descriptions of these values are in cloud_scheduler.conf.
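
As a quick sanity check before starting Cloud Scheduler, the sketch below reports which of the three settings above are still missing or empty. It assumes the file is INI-style and readable by Python 3’s configparser, and it searches every section rather than assuming a section name. The [global] section and the hostname in the sample are hypothetical, as is the function name unset_settings.

```python
import configparser

# The three settings this README says to review.
REQUIRED = ("condor_context_file", "condor_host_on_vm", "default_yaml")


def unset_settings(conf_text):
    """Return names from REQUIRED that are absent or empty in every section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    found = set()
    for section in parser.sections():
        for name in REQUIRED:
            if parser.get(section, name, fallback="").strip():
                found.add(name)
    return [name for name in REQUIRED if name not in found]


if __name__ == "__main__":
    # Hypothetical section name and values, for illustration only.
    sample = """
[global]
condor_host_on_vm = condor.example.org
condor_context_file =
"""
    print(unset_settings(sample))
```

To check a real file, read it with open(path).read() and pass the text to unset_settings.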

cloud init files

Cloud Scheduler includes a default cloud config file with the installation. If you’ve installed
from pip, it should be located in /usr/local/share/cloud-scheduler/default.yaml. You can set the
location in the cloud_scheduler.conf file. Users can customize further by setting an AMIConfig list
of cloud init files along with their jobs.

cloud_resources.conf

The cloud resource configuration file, cloud_resources.conf, is where you
define which clouds Cloud Scheduler should use for starting VMs. You’ll specify
how many VMs you want to boot on each cloud, and what its capabilities are.
The best way to get familiar with this file is to open up the sample
cloud_resources.conf file, where all of its configuration options, and a sample
configuration are included.

Like cloud_scheduler.conf, the Cloud Scheduler setup script installs this file
in /etc/cloudscheduler/, but you can manually select a different configuration
by running cloud_scheduler with the -c option. You can also specify the
location of this file with the cloud_resource_config option in the
cloud_scheduler.conf file.

Init Script

There is a Cloud Scheduler init script at scripts/cloud_scheduler. To install
it on systems with System V style init scripts:

# cp scripts/cloud_scheduler /etc/init.d/

Or, if you’ve installed from pip:

# cp /usr/local/share/cloud-scheduler/cloud_scheduler.init.d /etc/init.d/cloud_scheduler
# cp /usr/local/share/cloud-scheduler/cloud_scheduler.sysconf /etc/sysconfig/cloud_scheduler

Start it with:

# /etc/init.d/cloud_scheduler start

On Red Hat-like systems you can enable it to run at boot with:

# chkconfig cloud_scheduler on

NOTE: If you’ve used a non-default Python, you may need to set the PYTHON variable
in the init script. If you’ve installed in a non-default location, you may need to
set your EXECUTABLEPATH variable.

To stop Cloud Scheduler without shutting down its VMs (current VMs are saved
to the persistence file specified in cloud_scheduler.conf and reloaded when
Cloud Scheduler is started; note that loading the VMs from persistence may
take a while):

# /etc/init.d/cloud_scheduler forcekill

To reload cloud_resources.conf and cloud_scheduler.conf without killing VMs:

# /etc/init.d/cloud_scheduler quickrestart

Job Submission

Submitting a job for use with Cloud Scheduler is very similar to submitting a
job for use with a regular Condor Scheduler. It would be helpful to read
through Chapter 2 of the Condor Manual for help on submitting jobs to Condor.

Jobs meant to be run by VMs started by Cloud Scheduler need a few extra
parameters to work properly. These are: (Required parameters are highlighted)

  • Requirements = VMType =?= "your.vm.type" : The type of VM that the job must run on. This is a custom attribute of the VM advertised to the Condor central manager. It should be specified in the VM’s condor_config or condor_config.local file.
  • VMAMI : The AMI (for EC2-like clusters) or image name of the image required for the job to run
  • VMCPUCores : The number of CPU cores for the VM. Defaults to 1.
  • VMStorage : The amount of scratch storage space the job requires. (Currently ignored on EC2-like Clusters)
  • VMMem : The amount of RAM that the VM requires.
  • VMNetwork : The network group used for your VM. Only used with OpenStackNative if default network not available.
  • VMInstanceType : The EC2 instance type of the VM requested. Only used with EC2 clouds like Amazon.
  • VMMaximumPrice : The maximum price in cents per hour for a VM (EC2 Only)
  • VMKeepAlive : Number of minutes a VM should stay up after the job finishes
  • VMHighPriority : 1 (Optional flag) Indicates a high priority job to Cloud Scheduler – high priority job support can be enabled in the cloud_scheduler.conf
  • TargetClouds : A comma-separated list of names of clouds that you would like your job to use
  • CSMyProxyServer : The hostname of the myproxy server you’d like to use for credential renewal
  • CSMyProxyCredsName : The name of your myproxy credentials
  • VMJobPerCore : bool – Assigns multiple slots to a multi-core VM

A Sample Job

# Regular Condor Attributes
Universe   = vanilla
Executable = script.sh
Arguments  = one two three
Log        = script.log
Output     = script.out
Error      = script.error
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
# 
# Cloud Scheduler Attributes
Requirements = VMType =?= "vm.for.script"
+VMLoc         = "http://repository.tld/your.vm.img.gz"
+VMAMI         = "ami-dfasfds"
+VMCPUCores    = "1"
+VMNetwork     = "private"
+VMMem         = "512"
+VMStorage     = "20"
Queue
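
If you submit many similar jobs, a submit file like the sample above can be generated from a script. The following is a minimal sketch: the function name render_submit is hypothetical, every Cloud Scheduler attribute is written as a quoted string with a leading + (mirroring the sample), and values containing double quotes are rejected rather than escaped.

```python
def render_submit(condor_attrs, requirements, cs_attrs):
    """Build the text of a Condor submit file shaped like the sample job.

    condor_attrs: regular Condor attributes, written as NAME = VALUE.
    requirements: the expression for the Requirements line.
    cs_attrs:     Cloud Scheduler attributes, written as +NAME = "VALUE".
    """
    lines = ["# Regular Condor Attributes"]
    for name, value in condor_attrs.items():
        lines.append("%s = %s" % (name, value))
    lines.append("#")
    lines.append("# Cloud Scheduler Attributes")
    lines.append("Requirements = %s" % requirements)
    for name, value in cs_attrs.items():
        if '"' in str(value):
            raise ValueError("double quotes are not supported in %s" % name)
        lines.append('+%s = "%s"' % (name, value))
    lines.append("Queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_submit(
        {"Universe": "vanilla", "Executable": "script.sh", "Log": "script.log"},
        'VMType =?= "vm.for.script"',
        {"VMAMI": "ami-dfasfds", "VMCPUCores": 1, "VMMem": 512},
    ))
```

Write the returned text to a file and pass it to condor_submit as you would a hand-written submit file.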

Using Proxy Certificates

For a more secure, but more complicated setup allowing your users to use their
own proxy certificates, there is a guide on the heprc wiki:

https://wiki.heprc.uvic.ca/twiki/bin/view/Main/CsGsiSupport

License

This program is free software; you can redistribute it and/or modify
it under the terms of either:

a) the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any
later version, or

b) the Apache v2 License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See either
the GNU General Public License or the Apache v2 License for more details.

You should have received a copy of the Apache v2 License with this
software, in the file named “LICENSE”.

You should also have received a copy of the GNU General Public License
along with this program in the file named “COPYING”. If not, write to the
Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
Boston, MA 02110-1301, USA or visit their web page on the internet at
http://www.gnu.org/copyleft/gpl.html.