Cloud Scheduler: Automatically boot VMs for your HTC jobs
Cloud Scheduler manages virtual machines on clouds configured with OpenStack,
Google Compute Engine, or Amazon EC2 to create an environment for HTC batch job execution.
Users submit their jobs to a Condor job queue, and Cloud Scheduler boots VMs to
suit those jobs.
For more documentation on Cloud Scheduler, please refer to the project web page and wiki.
Quick install with pip:
# pip install cloud-scheduler
Cloud Scheduler requires Python 2.6+, but RHEL 5 ships with (and depends on)
Python 2.4, so here's a quick guide to getting a newer Python installed on
those systems:
Python 2.6 may be in the repos, depending on your version (5.5+):
$ yum install python26 python26-distribute
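You can then check that the new interpreter is available (the package installs the binary as python26):
$ python26 -V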
For Python 2.7:
Install the tools we need to build Python and its modules:
# yum install gcc gdbm-devel readline-devel ncurses-devel zlib-devel \
bzip2-devel sqlite-devel db4-devel openssl-devel tk-devel \
bluez-libs-devel libxslt libxslt-devel libxml2-devel libxml2
Download and compile Python 2.7.1:
$ VERSION=2.7.1
$ mkdir /tmp/src
$ cd /tmp/src/
$ wget http://python.org/ftp/python/$VERSION/Python-$VERSION.tar.bz2
$ tar xjf Python-$VERSION.tar.bz2
$ rm Python-$VERSION.tar.bz2
$ cd Python-$VERSION
$ ./configure
$ make
$ sudo make altinstall
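With the plain ./configure above, make altinstall places the new interpreter in /usr/local/bin without replacing the system python. You can verify it with:
$ /usr/local/bin/python2.7 -V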
Now we need to install Python setuptools:
$ cd /tmp/src
$ wget http://pypi.python.org/packages/2.7/s/setuptools/setuptools-0.6c11-py2.7.egg
$ sudo sh setuptools-0.6c11-py2.7.egg
Now install pip to install the rest of our dependencies:
$ sudo easy_install-2.7 pip
And the rest of our dependencies:
$ sudo pip-2.7 install cloud-scheduler
Now clean everything up:
$ sudo rm -Rf /tmp/src/
Finally, once you've set up the rest of Cloud Scheduler, you'll want to set
your Python version in the Cloud Scheduler init script, or use virtualenv.
Do this by changing the PYTHON variable to point at your new interpreter,
e.g. /usr/local/bin/python2.7 for the build above, or /opt/bin/python if you
configured Python with an /opt prefix.
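For example, near the top of the init script (a sketch, assuming the build above):
PYTHON=/usr/local/bin/python2.7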
You can install the required Python libraries with pip. Note that lxml
requires libxml2 and libxslt, and their development packages, to be installed
first.
Install pip:
# easy_install pip
And Cloud Scheduler and its dependencies:
# pip install cloud-scheduler
To install without using pip:
Download the zip from github
# wget https://github.com/hep-gc/cloud-scheduler/archive/master.zip
# unzip master.zip
# cd cloud-scheduler-master
# python setup.py install
Cloud Scheduler works with Condor, which needs
to be installed and able to manage resources. You must install it on the same
machine that runs Cloud Scheduler.
We recommend the following settings, especially if you’re planning on
using Condor CCB:
UPDATE_COLLECTOR_WITH_TCP = True
COLLECTOR_SOCKET_CACHE_SIZE = 10000
COLLECTOR.MAX_FILE_DESCRIPTORS = 10000
We have also placed an example Condor config in scripts/condor/manager.
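These settings typically go in your central manager's local config file (often /etc/condor/condor_config.local, though the path varies by installation). After editing, you can push the change to the running daemons with:
# condor_reconfig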
Make sure you can run condor_status and condor_q, and make sure your
ALLOW_WRITE (HOSTALLOW_WRITE on older Condor versions) will permit the VMs you
start to add themselves to your Condor pool.
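For example, appending to the existing value (a sketch; the domain is a placeholder for wherever your VMs will boot):
ALLOW_WRITE = $(ALLOW_WRITE), *.cloud.example.org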
Depending on your clouds and networking, you may need to set the
TCP_FORWARDING_HOST on any VMs booted so that Condor can connect to them.
There are scripts that attempt to do this automatically, but they're imperfect.
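On a worker VM this is a single line in its Condor config (a sketch; the address is a placeholder for the VM's externally reachable IP):
TCP_FORWARDING_HOST = 203.0.113.10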
The VM images you would like to run jobs with need to be prepared to join your
Condor pool. Cloud Scheduler will do most of the heavy lifting for you, but at
the very least, you need to install Condor, and configure it as a worker that
will join your Condor pool. The easiest way to do this is to use the example
configuration (at least as inspiration) from scripts/condor/worker/. You'll
want to put these files in your /etc/condor directory. You will probably also
want to use our custom Condor init script, which sets up an appropriate
environment when Condor is started with private networking only or on EC2,
and automatically points your node at your Condor pool. When using the custom
init script and doing offline testing of the VM image, make sure you place the
central_manager file from scripts/condor/worker into /etc/condor, as the init
script reads the value of CONDOR_HOST from this file.
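A minimal sketch of preparing the image, assuming the example worker files and a placeholder manager hostname:
# cp scripts/condor/worker/* /etc/condor/
# echo "condor.manager.example.org" > /etc/condor/central_manager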
The Cloud Scheduler configuration file allows you to configure most of its
functionality, and you’ll need to open it up to get a usable installation.
All of its options are described inline in the example configuration file
cloud_scheduler.conf, which is included with Cloud Scheduler.
By default, the Cloud Scheduler setup script installs its configuration files
to /usr/local/share/cloud-scheduler. It is suggested to copy these to
/etc/cloudscheduler; you can manually select a different configuration
by running cloud_scheduler with the -f option. If you're running as a non-root
user, Cloud Scheduler will also check for config files in ~/.cloudscheduler/
Cloud Scheduler checks for config files in the following order, and will use the first one it finds:
[config specified with the -f option]
~/.cloudscheduler/cloud_scheduler.conf
/etc/cloudscheduler/cloud_scheduler.conf
/usr/local/share/cloud-scheduler/cloud_scheduler.conf
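For example, to put the sample config in the system-wide location (paths as listed above):
# mkdir -p /etc/cloudscheduler
# cp /usr/local/share/cloud-scheduler/cloud_scheduler.conf /etc/cloudscheduler/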
There are a few settings that should be modified depending on your system to get up and running:
condor_context_file
condor_host_on_vm
default_yaml
Descriptions of these values are in cloud_scheduler.conf.
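A sketch of what these might look like (cloud_scheduler.conf is INI-style; the values here are placeholders, and the [global] section name is an assumption to check against your sample file):
[global]
condor_context_file: /etc/condor/central_manager
condor_host_on_vm: condor.manager.example.org
default_yaml: /etc/cloudscheduler/default.yaml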
Cloud Scheduler has a default cloud config file included with the
installation; if you've installed from pip, it should be located in
/usr/local/share/cloud-scheduler/default.yaml. The location can be set
in the cloud_scheduler.conf file. Users can customize further by attaching an
AMIConfig list of cloud-init files to their jobs.
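As a sketch, such a cloud-init file might do little more than start Condor at boot (standard cloud-config syntax; the contents are illustrative, not the shipped default.yaml):
#cloud-config
runcmd:
 - [ /etc/init.d/condor, start ]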
The cloud resource configuration file, cloud_resources.conf, is where you
define which clouds Cloud Scheduler should use for starting VMs. You’ll specify
how many VMs you want to boot on each cloud, and what its capabilities are.
The best way to get familiar with this file is to open up the sample
cloud_resources.conf file, where all of its configuration options, and a sample
configuration are included.
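A sketch of a single cloud stanza (the key names and values here are assumptions for illustration; treat the sample cloud_resources.conf as authoritative):
[example-cloud]
host: cloud.example.org
cloud_type: OpenStack
vm_slots: 10
cpu_cores: 2
memory: 2048
storage: 100
networks: private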
As with cloud_scheduler.conf, the Cloud Scheduler setup script installs this file
in /etc/cloudscheduler/, but you can manually select a different configuration
by running cloud_scheduler with the -c option. You can also specify the
location of this file with the cloud_resource_config option in the
cloud_scheduler.conf file.
There is a cloud scheduler init script at scripts/cloud_scheduler. To install
it on systems with System V style init scripts, you can do so with:
# cp scripts/cloud_scheduler /etc/init.d/
Or, if you've installed from pip:
# cp /usr/local/share/cloud-scheduler/cloud_scheduler.init.d /etc/init.d/cloud_scheduler
# cp /usr/local/share/cloud-scheduler/cloud_scheduler.sysconf /etc/sysconfig/cloud_scheduler
Start it with:
# /etc/init.d/cloud_scheduler start
On Red Hat-like systems you can enable it to run at boot with:
# chkconfig cloud_scheduler on
NOTE: If you’ve used a non-default Python, you may need to set the PYTHON variable
in the init script. If you’ve installed in a non-default location, you may need to
set your EXECUTABLEPATH variable.
To stop Cloud Scheduler without shutting down its VMs (current VMs will be
saved to the persistence file specified in cloud_scheduler.conf and reloaded
when Cloud Scheduler is started; note that loading the VMs from persistence
may take a while):
# /etc/init.d/cloud_scheduler forcekill
To reload cloud_resources.conf and cloud_scheduler.conf without killing VMs:
# /etc/init.d/cloud_scheduler quickrestart
Submitting a job for use with Cloud Scheduler is very similar to submitting a
job to a regular Condor scheduler. It is helpful to read through Chapter 2 of
the Condor Manual for guidance on submitting jobs to Condor. Jobs meant to run
on VMs started by Cloud Scheduler need a few extra parameters (some of them
required) to work properly:
# Regular Condor Attributes
Universe = vanilla
Executable = script.sh
Arguments = one two three
Log = script.log
Output = script.out
Error = script.error
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
#
# Cloud Scheduler Attributes
Requirements = VMType =?= "vm.for.script"
+VMLoc = "http://repository.tld/your.vm.img.gz"
+VMAMI = "ami-dfasfds"
+VMCPUCores = "1"
+VMNetwork = "private"
+VMMem = "512"
+VMStorage = "20"
Queue
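You can then submit and monitor the job as usual (assuming the submit description above is saved as script.sub):
$ condor_submit script.sub
$ condor_q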
For a more secure, but more complicated setup allowing your users to use their
own proxy certificates, there is a guide on the heprc wiki:
https://wiki.heprc.uvic.ca/twiki/bin/view/Main/CsGsiSupport
This program is free software; you can redistribute it and/or modify
it under the terms of either:
a) the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any
later version, or
b) the Apache v2 License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See either
the GNU General Public License or the Apache v2 License for more details.
You should have received a copy of the Apache v2 License with this
software, in the file named “LICENSE”.
You should also have received a copy of the GNU General Public License
along with this program in the file named “COPYING”. If not, write to the
Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
Boston, MA 02110-1301, USA or visit their web page on the internet at
http://www.gnu.org/copyleft/gpl.html.