declarative cluster management

Declarative cluster management using constraint programming, where constraints are described using SQL.

Declarative Cluster Management

  1. Overview
  2. Download
  3. Pre-requisites for use
  4. Quick Start
  5. Documentation
  6. Contributing
  7. Information for developers
  8. Learn more

Overview

Modern cluster management systems like Kubernetes routinely grapple
with hard combinatorial optimization problems: load balancing,
placement, scheduling, and configuration. Implementing application-specific algorithms to
solve these problems is notoriously hard to do, making it challenging to evolve the system over time
and add new features.

DCM is a tool to overcome this challenge. It enables programmers to build schedulers
and cluster managers using a high-level declarative language (SQL).

Specifically, developers represent cluster state in an SQL database, and write the constraints
and policies that should apply to that state using SQL. From the SQL specification, the DCM compiler synthesizes a
program that can be invoked at runtime to compute policy-compliant cluster management decisions given the latest
cluster state. Under the covers, the generated program efficiently encodes the cluster state as an
optimization problem that can be solved using off-the-shelf solvers, freeing developers from having to
design ad-hoc heuristics.

The high-level architecture is shown in the diagram below.

Download

The DCM project’s groupId is com.vmware.dcm and its artifactId is dcm.
We make DCM’s artifacts available through Maven Central.

To use DCM from a Maven-based project, use the following dependency:

<dependency>
    <groupId>com.vmware.dcm</groupId>
    <artifactId>dcm</artifactId>
    <version>0.15.0</version>
</dependency>

To use within a Gradle-based project:

implementation 'com.vmware.dcm:dcm:0.15.0'

Pre-requisites for use

  1. We test regularly on JDK 11 and 16.

  2. We test regularly on OSX and Ubuntu 20.04.

  3. We currently support two solver backends.

    • Google OR-tools CP-SAT (version 9.1.9490). This is available by default when using the Maven dependency.

    • MiniZinc (version 2.3.2). This backend is currently being deprecated. If you still want to use it
      in your project, or if you want to run all tests in this repository, you will have to install MiniZinc out-of-band.

      To do so, download MiniZinc from https://www.minizinc.org/software.html
      and make sure you are able to invoke the minizinc binary from your command line.

Quick Start

Here is a complete program
that you can run to get a feel for DCM.

import com.vmware.dcm.Model;
import org.jooq.DSLContext;
import org.jooq.impl.DSL;
import org.junit.jupiter.api.Test;

import java.util.List;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class QuickStartTest {

    @Test
    public void quickStart() {
        // Create an in-memory database and get a JOOQ connection to it
        final DSLContext conn = DSL.using("jdbc:h2:mem:");

        // A table representing some machines
        conn.execute("create table machines(id integer)");

        // A table representing tasks, that need to be assigned to machines by DCM.
        // To do so, create a variable column (prefixed by controllable__).
        conn.execute("create table tasks(task_id integer, controllable__worker_id integer, " +
                "foreign key (controllable__worker_id) references machines(id))");

        // Add four machines
        conn.execute("insert into machines values(1)");
        conn.execute("insert into machines values(3)");
        conn.execute("insert into machines values(5)");
        conn.execute("insert into machines values(8)");

        // Add two tasks
        conn.execute("insert into tasks values(1, null)");
        conn.execute("insert into tasks values(2, null)");

        // Time to specify a constraint! Just for fun, let's assign tasks to machines such that
        // the machine IDs sum up to 6.
        final String constraint = "create constraint example_constraint as " +
                "select * from tasks check sum(controllable__worker_id) = 6";

        // Create a DCM model using the database connection and the above constraint
        final Model model = Model.build(conn, List.of(constraint));

        // Solve and return the tasks table. Any assignment whose machine IDs sum to 6 satisfies the
        // constraint, so the controllable__worker_id column may be [1, 5], [5, 1] or [3, 3].
        final List<Integer> column = model.solve("TASKS")
                .map(e -> e.get("CONTROLLABLE__WORKER_ID", Integer.class));
        assertEquals(2, column.size());
        assertTrue(List.of(1, 3, 5, 8).containsAll(column));
        assertEquals(6, column.stream().mapToInt(Integer::intValue).sum());
    }
}
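
To see what the solver is being asked to do in the example above, the following plain-Java sketch enumerates every assignment of the two tasks to the four machines and keeps those whose machine IDs sum to 6. It does not use DCM or any solver backend. The class name SumConstraintSketch and the method feasibleAssignments are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class SumConstraintSketch {

    // Enumerate all (task 1, task 2) machine assignments whose IDs add up to the target.
    // Hypothetical helper for illustration; DCM delegates this search to a solver instead.
    static List<List<Integer>> feasibleAssignments(final List<Integer> machineIds, final int target) {
        final List<List<Integer>> feasible = new ArrayList<>();
        for (final int first : machineIds) {
            for (final int second : machineIds) {
                if (first + second == target) {
                    feasible.add(List.of(first, second));
                }
            }
        }
        return feasible;
    }

    public static void main(final String[] args) {
        // Same machine IDs and target sum as the quick start example
        System.out.println(feasibleAssignments(List.of(1, 3, 5, 8), 6));
        // prints [[1, 5], [3, 3], [5, 1]]
    }
}
```

Brute-force enumeration like this grows exponentially with the number of tasks, which is why the search is better left to an off-the-shelf solver. Which of the feasible assignments a solver returns is up to the solver.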

Documentation

The Model class serves as DCM’s public API. It exposes
two methods: Model.build() and model.solve().

  • Check out the tutorial to learn how to use DCM by building a simple VM load balancer
  • Check out our research papers for the back story behind DCM
  • The Model API Javadocs

Contributing

We welcome all feedback and contributions! ❤️

Please use GitHub issues for user questions
and bug reports.

Check out the contributing guide if you’d like to send us a pull request.

Information for developers

The entire build, including unit tests, can be triggered from the root folder with the following command (make
sure to set up both solvers first):

$: ./gradlew build

To avoid documentation drift, code snippets in a documentation file (like the README or tutorial)
are embedded directly from source files that are continuously tested. To refresh these documentation
files:

$: npx embedme <file>

The Kubernetes scheduler also comes with integration tests that run against a real Kubernetes cluster.
Do not point them at a production cluster: these tests repeatedly delete all
running pods and deployments. To run these integration tests, make sure you have a valid KUBECONFIG
environment variable that points to a Kubernetes cluster.

We recommend setting up a local multi-node cluster and a corresponding KUBECONFIG using
kind. Once you’ve installed kind, run the following
to create a test cluster:

$: kind create cluster --config k8s-scheduler/src/test/resources/kind-test-cluster-configuration.yaml --name dcm-it

The above step will create a configuration file in your home folder (~/.kube/kind-config-dcm-it). Make sure
you set the KUBECONFIG environment variable to point to that path.

You can then execute the following command to run integration-tests against the created local cluster:

$: KUBECONFIG=~/.kube/kind-config-dcm-it ./gradlew :k8s-scheduler:integrationTest

To run a specific integration test class (example: SchedulerIT from the k8s-scheduler module):

$: KUBECONFIG=~/.kube/kind-config-dcm-it ./gradlew :k8s-scheduler:integrationTest --tests SchedulerIT
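
Because the integration tests delete all running pods and deployments, it can help to check where KUBECONFIG points before launching them. The following is a minimal sketch of such a check, not part of DCM: the class KubeconfigGuard, the method testClusterConfig, and the rule of accepting only a file named kind-config-dcm-it are all assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

public class KubeconfigGuard {

    // Assumed convention: only the config file created by `kind create cluster ... --name dcm-it` is accepted.
    static final String EXPECTED_FILE_NAME = "kind-config-dcm-it";

    // Returns the kubeconfig path if KUBECONFIG is set and its file name matches the expected kind config.
    static Optional<Path> testClusterConfig(final Map<String, String> env) {
        final String value = env.get("KUBECONFIG");
        if (value == null || value.isBlank()) {
            return Optional.empty();
        }
        final Path path = Path.of(value);
        final Path fileName = path.getFileName();
        if (fileName == null || !fileName.toString().equals(EXPECTED_FILE_NAME)) {
            return Optional.empty();
        }
        return Optional.of(path);
    }

    public static void main(final String[] args) {
        final Optional<Path> config = testClusterConfig(System.getenv());
        if (config.isEmpty()) {
            System.err.println("KUBECONFIG is unset or does not name " + EXPECTED_FILE_NAME + "; refusing to continue");
            System.exit(1);
        }
        if (!Files.isReadable(config.get())) {
            System.err.println("Cannot read " + config.get());
            System.exit(1);
        }
        System.out.println("Using test cluster config: " + config.get());
    }
}
```

A file-name check is only a heuristic. It guards against forgetting to set KUBECONFIG, not against a file of that name that points at a different cluster.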

Learn more

To learn more about DCM, we suggest going through the following references:

Talks:

Research papers: