Apache cTAKES is a Natural Language Processing (NLP) platform for clinical text.
The Apache™ clinical Text Analysis and Knowledge Extraction System (cTAKES™) focuses on extracting knowledge
from clinical text through NLP techniques.
cTAKES is engineered in a modular fashion and employs leading-edge rule-based and machine learning methods.
cTAKES provides the standard features of biomedical text processing software,
including the ability to extract concepts such as symptoms, procedures, diagnoses, medications, and anatomy,
together with their attributes and standard codes.
More powerful components can perform tasks as complex as identifying temporal events,
dates, and times, which allows events to be placed on a patient timeline.
Components are trained on gold-standard data from both the biomedical and the general domain.
This makes them usable across different types of clinical narrative (e.g., radiology reports,
clinical notes, discharge summaries) in various institutional formats, as well as other types of
health-related narrative (e.g., Twitter feeds), using multiple data standards and terminologies (e.g., Health Level Seven (HL7),
Clinical Document Architecture (CDA), Fast Healthcare Interoperability Resources (FHIR), SNOMED CT, RxNorm).
cTAKES serves as the NLP platform for many initiatives around the world, spanning a variety of research purposes
and large datasets.
Contributors include professionals at medical and commercial institutions, NLP and machine learning researchers,
medical doctors, and students of many disciplines and levels.
We encourage people from all backgrounds to get involved!
To check which versions of Java, Maven, and Python are installed, run:
$ java -version
$ mvn -version
$ python -V
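As a convenience, the version commands above can be wrapped in a small presence check. The following is a minimal POSIX `sh` sketch, not a cTAKES tool: `check_tool` is a hypothetical helper, and the script assumes the prerequisites are invoked as `java`, `mvn`, and `python` (on some systems Python is installed only as `python3`). It only reports whether each command is on your `PATH`; it does not check versions.

```shell
#!/bin/sh
# Minimal sketch: report whether each prerequisite command is on PATH.
# check_tool is a hypothetical helper, not part of cTAKES.
# The tool names (java, mvn, python) are an assumption mirroring the
# version commands in this README.

check_tool() {
  # `command -v` succeeds (exit 0) only if the name resolves to a command
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
    return 0
  else
    echo "missing: $1"
    return 1
  fi
}

missing=0
for tool in java mvn python; do
  # Count each tool that check_tool reports as missing
  check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing prerequisite(s) missing"
```

If a tool is reported missing, install it or adjust your `PATH`, then run the individual version commands to confirm which versions were picked up.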
The easiest way for new users to start running cTAKES is to use the Standard Pipeline Installation Facility.
This tool installs cTAKES configured to run the most popular pre-built cTAKES pipelines.
You can then submit jobs with the Piper File Submitter GUI or from the command line.
For access to all cTAKES capabilities, download a zip or tar.gz file containing a fully built installation of the most recent cTAKES release.
Then, after obtaining a UMLS license, use the UMLS Package Fetcher GUI to install a copy of the
default dictionary that cTAKES Fast Dictionary Lookup uses for Named Entity Recognition (NER).
Notice: cTAKES 7.0.0-SNAPSHOT requires JDK 17 to build and run.
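One way to confirm the JDK requirement is to parse the output of `java -version`. The sketch below assumes a POSIX shell with `sed`; `java_major_version` is a hypothetical helper that handles both the modern version scheme (`17.0.8`) and the legacy `1.x` scheme used by Java 8 and earlier. It only inspects the `java` found on your `PATH`; Maven may use a different JDK if `JAVA_HOME` points elsewhere, so `mvn -version` is worth checking too.

```shell
#!/bin/sh
# Minimal sketch: check that the default `java` is at least JDK 17.
# java_major_version is a hypothetical helper, not part of cTAKES.

java_major_version() {
  # $1: full text of `java -version` output
  # Take the first quoted version string, e.g. 17.0.8 or 1.8.0_381
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$v" in
    1.*) v=${v#1.} ;;  # legacy scheme: "1.8.0_381" means Java 8
  esac
  # Keep only the leading digits (drops ".0.8", "_381", "-ea", ...)
  printf '%s\n' "${v%%[!0-9]*}"
}

# `java -version` writes to stderr, so merge it into stdout before parsing
if out=$(java -version 2>&1); then
  major=$(java_major_version "$out")
  if [ "${major:-0}" -ge 17 ]; then
    echo "OK: Java $major"
  else
    echo "Java ${major:-unknown} found; cTAKES 7.0.0-SNAPSHOT needs JDK 17"
  fi
else
  echo "java not found on PATH"
fi
```

The parsing is deliberately separated into a function so it can be reused on saved output, for example the Java line reported by `mvn -version`.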
All source code for cTAKES versions 5+ is available from the cTAKES GitHub repository.
$ git clone https://github.com/apache/ctakes.git
Much more information can be found on the cTAKES wiki.
You can also write to the cTAKES user and developer mailing lists (user at ctakes.apache.org and dev at ctakes.apache.org)
and find answers to previously asked questions by searching the user
and developer mail archives.