Real-time processing engine that extracts domain concepts from documents and inserts those concepts into a knowledge graph



Build Status

master: [Build Status]
dev: [Build Status]
de-storm: [Build Status]

Running Topology

Running the topology requires the following:


  • RabbitMQ running on the default port 5672

  • document-service running on the default port 8118

  • Rexster/Titan/query-service running

  • Supervisord installed via either:

      pip install supervisor --pre

    or:

      easy_install supervisor
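
Before starting the topology, it can help to confirm that the services listed above are actually reachable. The following is a minimal sketch, not part of this repository: it only tests whether the two documented ports (5672 and 8118) accept TCP connections on a host assumed to be localhost. The function names and service labels are illustrative, and Rexster/Titan/query-service is left out because no port is given for it here.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Ports taken from the prerequisite list; the labels are only for display.
REQUIRED = {"RabbitMQ": 5672, "document-service": 8118}


def check_prerequisites(host="localhost"):
    """Map each required service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in REQUIRED.items()}


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print("{:<18} {}".format(name, "up" if ok else "DOWN"))
```

An open port only shows that something is listening; it does not prove that the service behind it is healthy or correctly configured.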

Compile and Execute Locally

  1. Open a terminal and cd to the rt directory

  2. Run the following command:

     supervisord -c supervisord.conf

Eclipse Development

  1. Install the EGit plugin
  2. Install Maven Integration for Eclipse (m2e)
  3. Open the Eclipse Git Repository Exploring perspective
  4. Click Clone a Git Repository
  5. Use the URI
  6. Right click the repository and select Import Projects...
  7. Select Import Existing Projects; the pom.xml should be listed

To Do

  1. Implement configuration with etcd, defaulting to local config.yaml if etcd is not running
  2. Add the RMQ message’s timestamp to the subgraph that is passed to alignment
  3. Implement doc-service-java-client method to fetch extracted text from doc-service, and use this method in the UnstructuredTransformer
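
The fallback behavior described in the first item could look roughly like the sketch below. It is an illustration of the idea only, not the planned implementation: it is written in Python rather than the project's Java, it assumes etcd's v2 HTTP keys API on its default client address (http://127.0.0.1:2379), and the key name is hypothetical. It returns the raw configuration text plus a label saying where it came from, leaving YAML parsing to the caller.

```python
import json
import urllib.error
import urllib.request


def fetch_from_etcd(base_url, key, timeout=1.0):
    """Read one key through etcd's v2 HTTP keys API and return its value."""
    url = "{}/v2/keys/{}".format(base_url.rstrip("/"), key.lstrip("/"))
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["node"]["value"]


def load_config(base_url, key, local_path, timeout=1.0):
    """Return (config_text, source), preferring etcd over the local file."""
    try:
        return fetch_from_etcd(base_url, key, timeout), "etcd"
    except (urllib.error.URLError, OSError, KeyError, ValueError):
        # etcd is unreachable, the key is missing, or the reply is malformed:
        # fall back to the local file, as the To Do item describes.
        with open(local_path, encoding="utf-8") as fh:
            return fh.read(), "local"


if __name__ == "__main__":
    # "rt/config" is a hypothetical key name chosen for this example.
    text, source = load_config("http://127.0.0.1:2379", "rt/config", "config.yaml")
    print("loaded configuration from", source)
```

Returning the source label alongside the text makes it easy to log which configuration is in effect, which is useful when etcd is expected to be running but is not.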