This is the Documentation and Status Tracking Repository for the Open Health Natural Language Processing Toolkit (OHNLPTK). Please refer to the individual component repositories (linked below) for the relevant source code.
- Backbone: OHNLPTK's pipeline execution engine, based on Apache Beam, which allows JSON-configurable, modular, plug-and-play execution of data transformation pipelines at scale (including support for popular frameworks such as Apache Spark, Apache Flink, GCP Dataflow, and Azure Databricks). Includes built-in adapters for a variety of data sources.
- BackboneConfigurator: User Interface for Editing and Configuring Backbone Pipelines
- MedTagger: NLP for General Clinical Information Extraction Tasks as part of Backbone Pipelines
- MedXN: An extension of MedTagger specifically tuned for drug extraction
- PresidioDeidentificationforOHNLPTK: Wraps Microsoft's Presidio SDK to allow de-identification and synthetic replacement of clinical text as part of Backbone pipelines, using any trained BERT-based PII recognizer model that is compatible with the Hugging Face Hub format
- backbone API: Java API for Backbone. Use this as a basis for implementing your own Java-based Backbone pipeline components. Also contains the Java-Python bridge implementation, which allows components written in different languages to be mixed within a pipeline
- backbone-xlang-python: Python API for Backbone. Use this as a basis for implementing your own Python-based Backbone pipeline components
- ohnlptk-ml: Various machine learning API extensions for Backbone. Extend this to implement federated learning on bring-your-own (BYO) PyTorch models using FedAvg as part of Backbone pipelines
- Setup Script: Shell Script that will install all the configuration and base scripts for both local and cloud deployments. Requires internet access for component download and update checks.
- Docker Images: Docker Images for various OHNLP Toolkit pipelines. Intended only for local (non-cloud) evaluation installs on small datasets, on systems without internet access or in secure environments.
- Demonstration Website: Contains a (now partially defunct) demonstration website for the OHNLP toolkit. This repository is not kept up to date and is not compatible with the latest OHNLP Toolkit features. A replacement demonstration website/code rewrite is in progress that supports more generalized features beyond N3C-related phenotypes.
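
For readers unfamiliar with the federated averaging (FedAvg) technique mentioned in the ohnlptk-ml entry, the aggregation step can be sketched roughly as follows. This is a minimal, framework-free illustration and not the ohnlptk-ml API: the function name `fed_avg`, the `Weights` type, and the plain-list weight representation are all assumptions made for this example. A real deployment would operate on PyTorch tensors inside Backbone components.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def fed_avg(client_weights: Sequence[Weights],
            client_sizes: Sequence[int]) -> Weights:
    """Average client weights, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Weighted mean of each parameter value across all clients.
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical sites; the second holds three times as many samples.
site_a = {"layer.weight": [1.0, 2.0]}
site_b = {"layer.weight": [3.0, 4.0]}
global_weights = fed_avg([site_a, site_b], [1, 3])
```

Weighting by sample count means that sites with more training data contribute proportionally more to the shared model, while only weights (not clinical text) leave each site.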