Detect interactions between chemical compounds/drugs and genes/proteins using Spark NLP's `RelationExtractionModel()`, which classifies whether a specified semantic relation holds between a chemical entity and a gene entity within a sentence or document. The entity labels used during training come from the [custom NER model](https://nlp.johnsnowlabs.com/2021/12/20/ner_drugprot_clinical_en.html) that our team created for the [DrugProt corpus](https://zenodo.org/record/5119892): `CHEMICAL` for chemical compounds/drugs, `GENE` for genes/proteins, and `GENE_AND_CHEMICAL` for overlapping mentions of type `GENE` and type `CHEMICAL` (such as enzymes and small peptides). Because some relation categories had too few examples, the 13 relation categories of the [DrugProt corpus](https://zenodo.org/record/5119892) were condensed to 10: `SUBSTRATE_PRODUCT-OF` and `SUBSTRATE` were merged into one category, and `AGONIST-ACTIVATOR`, `AGONIST-INHIBITOR` and `AGONIST` were merged into another.
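
The category merging described above can be illustrated with a small label mapping. This is only a sketch under stated assumptions: the full list of 13 DrugProt relation names is not given in this description and is reproduced from the corpus documentation as we understand it, and the names chosen for the two merged groups (`SUBSTRATE` and `AGONIST`) are hypothetical, since the labels the model actually emits may differ.

```python
# Assumption: the 13 relation categories of the DrugProt corpus.
# Verify this list against the corpus release you are using.
DRUGPROT_RELATIONS = [
    "INDIRECT-DOWNREGULATOR", "INDIRECT-UPREGULATOR", "DIRECT-REGULATOR",
    "ACTIVATOR", "INHIBITOR", "AGONIST", "AGONIST-ACTIVATOR",
    "AGONIST-INHIBITOR", "ANTAGONIST", "PRODUCT-OF", "SUBSTRATE",
    "SUBSTRATE_PRODUCT-OF", "PART-OF",
]

# Assumption: each merged group keeps the name of its most general member.
# The description only says which categories were grouped, not the final name.
MERGED_INTO = {
    "SUBSTRATE_PRODUCT-OF": "SUBSTRATE",
    "AGONIST-ACTIVATOR": "AGONIST",
    "AGONIST-INHIBITOR": "AGONIST",
}


def merge_relation(label: str) -> str:
    """Map an original DrugProt relation label to its condensed category."""
    return MERGED_INTO.get(label, label)


# Distinct categories that remain after merging.
merged_labels = sorted({merge_relation(r) for r in DRUGPROT_RELATIONS})
print(len(DRUGPROT_RELATIONS), "->", len(merged_labels))
```

A mapping like this could be applied to the gold relation labels before training, so that the rare categories contribute examples to a larger class instead of being learned separately.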