Master Repository Tracking the Various Components of the Open Health Natural Language Processing Toolkit

OHNLP/ohnlptk_parent

The Open Health Natural Language Processing Toolkit (OHNLPTK)

This is the documentation and status tracking repository for the Open Health Natural Language Processing Toolkit. For source code, please refer to the individual component repositories linked below.

OHNLPTK Components (for End Users)

  • Backbone: OHNLPTK's pipeline execution engine, based on Apache Beam. It allows JSON-configurable, modular, plug-and-play execution of data transformation pipelines at scale, including support for popular frameworks such as Apache Spark, Apache Flink, GCP Dataflow, and Azure Databricks. Includes built-in adapters for a variety of data sources.
  • BackboneConfigurator: User Interface for Editing and Configuring Backbone Pipelines
  • MedTagger: NLP for General Clinical Information Extraction Tasks as part of Backbone Pipelines
  • MedXN: An extension of MedTagger specifically tuned for drug extraction
  • PresidioDeidentificationforOHNLPTK: Wraps Microsoft's Presidio SDK to allow de-identification and synthetic replacement of clinical text as part of Backbone pipelines, using any trained BERT-based PII recognizer model that is compatible with the Hugging Face Hub format.

OHNLPTK Components (for Developers)

  • backbone API: Java API for Backbone. Use this as a basis for implementing your own Java-based Backbone pipeline components. Also contains the Java-Python bridge implementation, which allows mixing languages among different components.
  • backbone-xlang-python: Python API for Backbone. Use this as a basis for implementing your own Python-based Backbone pipeline components.
  • ohnlptk-ml: Various machine learning API extensions for Backbone. Extend this to implement federated learning on bring-your-own (BYO) PyTorch models using FedAvg as part of Backbone pipelines.
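
For readers unfamiliar with FedAvg (federated averaging), the aggregation step it is named for can be sketched in a few lines. This is a minimal, framework-free illustration and not the ohnlptk-ml API: the function name `fed_avg`, the `Weights` type, and the representation of model parameters as plain lists of floats are all assumptions made for this example. In practice the parameters would be PyTorch tensors.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a model's parameters: name -> flat list of values.
Weights = Dict[str, List[float]]


def fed_avg(client_updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    Each entry of client_updates is (weights, num_local_samples).
    All clients are assumed to share the same parameter names and shapes.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    # Start from zeros shaped like the first client's parameters.
    first_weights, _ = client_updates[0]
    averaged = {name: [0.0] * len(vals) for name, vals in first_weights.items()}

    # Accumulate each client's contribution, scaled by its share of the data.
    for weights, n in client_updates:
        share = n / total
        for name, vals in weights.items():
            for i, v in enumerate(vals):
                averaged[name][i] += share * v
    return averaged


if __name__ == "__main__":
    updates = [
        ({"w": [1.0, 2.0]}, 1),  # client with 1 local sample
        ({"w": [3.0, 6.0]}, 3),  # client with 3 local samples
    ]
    print(fed_avg(updates))  # {'w': [2.5, 5.0]}
```

Weighting by sample count means a site with more local data pulls the shared model further toward its own parameters, while no site has to share its raw data.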

Installation and Deployment

  • Setup Script: Shell Script that will install all the configuration and base scripts for both local and cloud deployments. Requires internet access for component download and update checks.
  • Docker Images: Docker images for various OHNLP Toolkit pipelines. Intended only for local (non-cloud) evaluation installs on small datasets, on systems without internet access or in secure environments.

Legacy

  • Demonstration Website: Contains a (now partially defunct) demonstration website for the OHNLP toolkit. This repository is not kept up to date and is not compatible with the latest OHNLP Toolkit features. A replacement demonstration website/code rewrite is in progress that supports more generalized features beyond N3C-related phenotypes.
