INCATools · ubyndr · Sep 1, 2023 · Aug 31, 2023
diff --git a/README.md b/README.md
@@ -1,9 +1,63 @@
-STATUS: Beta
-
 # pandasaurus_cxg
 
-Ontology enrichment tool for [CxG standard](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md) [AnnData files](https://anndata.readthedocs.io/en/latest/).
+STATUS: early Beta
+
+A library for retrieving and leveraging the semantic context of ontology annotations in [CxG standard](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md) [AnnData files](https://anndata.readthedocs.io/en/latest/).
 
 Slide summarising intended functionality
 ![image](https://github.com/INCATools/pandasaurus_cxg/assets/112839/3082dcd2-dd2f-469d-9076-4eabcc83130d)
 
+## Installation
+
+Available on [PyPI](https://pypi.org/project/pandasaurus-cxg/0.1.1/):
+
+```shell
+pip3 install pandasaurus_cxg
+```
+
+## Usage
+
+The `AnndataEnricher` and `AnndataAnalyzer` classes can be used individually or through the `AnndataEnrichmentAnalyzer` wrapper class, which combines the functionality of both.
+
+### Using AnndataEnricher and AnndataAnalyzer Individually
+
+You can use the `AnndataEnricher` and `AnndataAnalyzer` classes separately to perform specific tasks on your data: `AnndataEnricher` enriches the ontology term annotations in an AnnData file, while `AnndataAnalyzer` provides analysis tools, such as co-annotation reports, for AnnData objects.
+
+```python
+from pandasaurus_cxg.anndata_enricher import AnndataEnricher
+ade = AnndataEnricher.from_file_path("test/data/modified_human_kidney.h5ad")
+ade.simple_enrichment()
+ade.minimal_slim_enrichment(["blood_and_immune_upper_slim"])
+```
+
+```python
+from pandasaurus_cxg.anndata_analyzer import AnndataAnalyzer
+ada = AnndataAnalyzer.from_file_path(
+    "./immune_example.h5ad",
+    # obs fields holding the author cell type annotations
+    author_cell_type_list=["subclass.full", "subclass.l3", "subclass.l2", "subclass.l1", "class", "author_cell_type"],
+)
+ada.co_annotation_report()
+```
+
+### Using AnndataEnrichmentAnalyzer Wrapper
+
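+The `AnndataEnrichmentAnalyzer` class wraps the functionality of both `AnndataEnricher` and `AnndataAnalyzer`, so enrichment and analysis can be run on a single object. In the example below, the wrapper is also passed to `GraphGenerator`, which builds, labels, and enriches an RDF graph from the results and saves it in Turtle (`ttl`) format.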
+
+```python
+from pandasaurus_cxg.enrichment_analysis import AnndataEnrichmentAnalyzer
+from pandasaurus_cxg.graph_generator.graph_generator import GraphGenerator
+aea = AnndataEnrichmentAnalyzer("test/data/modified_human_kidney.h5ad")
+aea.contextual_slim_enrichment()
+aea.co_annotation_report()
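+# Build an RDF graph from the enrichment and analysis results
+gg = GraphGenerator(aea)
+gg.generate_rdf_graph()
+# Priority order of the annotation fields used when adding labels to terms
+gg.set_label_adding_priority(["class", "cell_type", "subclass.l1", "subclass.full", "subclass.l2", "subclass.l3"])
+gg.add_label_to_terms()
+gg.enrich_rdf_graph()
+# Save the graph in Turtle format under the name "kidney_new"
+gg.save_rdf_graph(file_name="kidney_new", _format="ttl")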
+```
+More examples and detailed explanations can be found in the Jupyter notebook linked under [Snippets](#snippets).
+
+## Snippets
+
+https://github.com/INCATools/pandasaurus_cxg/blob/roadmap/walkthrough.ipynb
+
+## Roadmap
+
+https://github.com/INCATools/pandasaurus_cxg/blob/roadmap/ROADMAP.md
+
diff --git a/ROADMAP.md b/ROADMAP.md
@@ -0,0 +1,52 @@
+## Pandasaurus_cxg Roadmap
+
+* Generate and release integrated documentation from PyDoc, including links to the tutorial notebooks
+
+    (potential framework: Sphinx)
+
+* Testing:
+   * Test against a range of datasets on CxG to find bugs and performance issues
+   * User testing: recruit friendly bioinformaticians to give feedback on functionality and usability
+
+* Extend the basic enrichment methods to support a specified number of hops from a term.
+
+* Add support for CxG schema validation (via a dependency on the official library)
+
+  This may not be needed for files downloaded from CxG, but part of the aim is to promote the standard more generally, so the library should be ready for files from other sources.
+
+* Add semantic context queries
+
+  (Dependency: add abstracted most-specific subject/object queries to pandasaurus)
+  * CL-Pro
+  * CL-GO & GO-CL
+  * HPO-CL & MP-CL
+  * MONDO-CL
+  * OBA-CL
+
+* Add an interface to QuickGO to pull gene associations.
+
+  Can we use an existing lib for this?
+
+* Add an interface to the Monarch API to pull gene associations for Mondo, HP, MP, and OBA.
+
+  Can we use an existing lib for this or collaborate with Monarch on one?
+
+* Add support for querying gene sets and general classes from the disease metadata term.
+
+* Extend support for filtering on metadata before analysis
+
+* Add a library of author cell type fields for CxG-hosted datasets where these have been curated
+
+* Add support for cell type annotation schema (CAP)
+
+
+## Potential future functionality
+
+Both of these are probably better served by workflows built on existing libraries:
+
+- Automatic cross-checking of retrieved gene sets against cluster expression
+- Interfacing with standard enrichment tools
+
+
+
+
diff --git a/pandasaurus_cxg/anndata_analyzer.py b/pandasaurus_cxg/anndata_analyzer.py
@@ -192,8 +192,8 @@ def _remove_duplicates(data: List[List[str]]):
     @staticmethod
     def _assign_predicate_column(co_oc, field_name_1, field_name_2):
         # Group by field_name_2 and field_name_1 to create dictionaries
-        field_name_2_dict = co_oc.groupby(field_name_2)[field_name_1].apply(list).to_dict()
-        field_name_1_dict = co_oc.groupby(field_name_1)[field_name_2].apply(list).to_dict()
+        field_name_2_dict = co_oc.groupby(field_name_2, observed=True)[field_name_1].apply(list).to_dict()
+        field_name_1_dict = co_oc.groupby(field_name_1, observed=True)[field_name_2].apply(list).to_dict()
         # Assign the "predicate" column using self._assign_predicate method
         co_oc["predicate"] = co_oc.apply(
             AnndataAnalyzer._assign_predicate,
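The `anndata_analyzer.py` change passes `observed=True` to both `groupby` calls in `_assign_predicate_column`. When the grouping column is a pandas categorical (as `obs` columns read from `.h5ad` files commonly are), `observed=False` creates a group for every declared category, including categories with no rows. pandas 2.1 also deprecated leaving `observed` unset for categorical groupers and emits a `FutureWarning` in that case. The toy frame below is a hypothetical stand-in for `co_oc`, not project data. It sketches how the two settings change the dictionaries the method builds.

```python
import pandas as pd

# Hypothetical stand-in for `co_oc`: two categorical annotation columns.
# Category "C" of cell_type is declared but never appears in the data.
co_oc = pd.DataFrame(
    {
        "cell_type": pd.Categorical(["A", "A", "B"], categories=["A", "B", "C"]),
        "subclass": pd.Categorical(["x", "y", "y"], categories=["x", "y", "z"]),
    }
)

# observed=False: every declared category becomes a group,
# so the unobserved category "C" also gets an entry.
all_groups = co_oc.groupby("cell_type", observed=False)["subclass"].apply(list).to_dict()

# observed=True (as in this PR): only categories present in the data are grouped.
seen_groups = co_oc.groupby("cell_type", observed=True)["subclass"].apply(list).to_dict()

print(sorted(all_groups))  # ['A', 'B', 'C']
print(seen_groups)         # {'A': ['x', 'y'], 'B': ['y']}
```

With `observed=True`, both dictionaries in `_assign_predicate_column` are limited to annotation values that actually occur in `co_oc`.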