trove

Deploy machine learning models in Ruby (and Rails)

38
2
Ruby

Trove

🔥 Deploy machine learning models in Ruby (and Rails)

Works great with XGBoost, Torch.rb, fastText, and many other gems

Installation

Add this line to your application’s Gemfile:

gem "trove"

And run:

bundle install
trove init

And configure your storage in .trove.yml.

Storage

Amazon S3

Create a bucket and enable object versioning.

Next, set up your AWS credentials. You can use the AWS CLI:

pip install awscli
aws configure

Or environment variables:

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=...

IAM users need:

  • s3:GetObject and s3:GetObjectVersion to pull files
  • s3:PutObject to push files
  • s3:ListBucket and s3:ListBucketVersions to list files and versions
  • s3:DeleteObject and s3:DeleteObjectVersion to delete files

Here’s an example policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Trove",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:PutObject",
                "s3:ListBucket",
                "s3:ListBucketVersions",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/trove/*"
            ]
        }
    ]
}

If your production servers only need to pull files, only give them s3:GetObject and s3:GetObjectVersion permissions.

How It Works

Git is great for code, but it’s not ideal for large files like models. Instead, we use an object store like Amazon S3 to store and version them.

Trove creates a trove directory for you to use as a workspace. Files in this directory are ignored by Git but can be pushed and pulled from the object store. By default, files are tracked in .trove.yml to make it easy to deploy specific versions with code changes.

Getting Started

Use the trove directory to save and load models.

# training code
model.save_model("trove/model.bin")

# prediction code
model = FastText.load_model("trove/model.bin")

When a model is ready, push it to the object store with:

trove push model.bin

And commit the changes to .trove.yml. The model is now ready to be deployed.

Deployment

We recommend pulling files during the build process.

Make sure your storage credentials are available in the build environment.

Heroku and Dokku

Add to your Rakefile:

Rake::Task["assets:precompile"].enhance do
  Trove.pull
end

This will pull files at the very end of the asset precompile. Check the build output for:

remote:        Pulling model.bin...
remote:        Asset precompilation completed (30.00s)

Docker

Add to your Dockerfile:

RUN bundle exec trove pull

Commands

Push a file

trove push model.bin

Pull all files in .trove.yml

trove pull

Pull a specific file (uses the version in .trove.yml if present)

trove pull model.bin

Pull a specific version of a file

trove pull model.bin --version 123

Delete a file

trove delete model.bin

List files

trove list

List versions

trove versions model.bin

Ruby API

You can use the Ruby API in addition to the CLI.

Trove.push(filename)
Trove.pull
Trove.pull(filename)
Trove.pull(filename, version: version)
Trove.delete(filename)
Trove.list
Trove.versions(filename)

This makes it easy to perform operations from code, iRuby notebooks, and the Rails console.

Automated Training

By default, Trove tracks files in .trove.yml to make it easy to deploy specific versions with code changes. However, this functionality is entirely optional. Disable it with:

vcs: false

This is useful if you want to automate training or build more complex workflows.

Non-Ruby

Trove can be used in non-Ruby projects as well.

gem install trove
trove init

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/trove.git
cd trove
bundle install

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=...
export S3_BUCKET=my-bucket

bundle exec rake test