60 changes: 57 additions & 3 deletions README.md
@@ -1,9 +1,63 @@
STATUS: Beta

# pandasaurus_cxg

Ontology enrichment tool for [CxG standard](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md) [AnnData files](https://anndata.readthedocs.io/en/latest/).
STATUS: early Beta

A library for retrieving and leveraging the semantic context of ontology annotations in [CxG standard](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md) [AnnData files](https://anndata.readthedocs.io/en/latest/).

Slide summarising intended functionality
![image](https://github.com/INCATools/pandasaurus_cxg/assets/112839/3082dcd2-dd2f-469d-9076-4eabcc83130d)

## Installation

Available on [PyPI](https://pypi.org/project/pandasaurus-cxg/0.1.1/):

$ pip3 install pandasaurus_cxg

## Usage

The `AnndataEnricher` and `AnndataAnalyzer` classes can be used individually or together through the `AnndataEnrichmentAnalyzer` wrapper class, which combines the functionality of both.

### Using AnndataEnricher and AnndataAnalyzer Individually

You can use the `AnndataEnricher` and `AnndataAnalyzer` classes separately to perform specific tasks on your data: `AnndataEnricher` handles ontology enrichment, while `AnndataAnalyzer` provides analysis tools for AnnData objects.

```python
from pandasaurus_cxg.anndata_enricher import AnndataEnricher
ade = AnndataEnricher.from_file_path("test/data/modified_human_kidney.h5ad")
ade.simple_enrichment()
ade.minimal_slim_enrichment(["blood_and_immune_upper_slim"])
```

```python
from pandasaurus_cxg.anndata_analyzer import AnndataAnalyzer
ada = AnndataAnalyzer.from_file_path("./immune_example.h5ad", author_cell_type_list=['subclass.full', 'subclass.l3', 'subclass.l2', 'subclass.l1', 'class', 'author_cell_type'])
ada.co_annotation_report()
```

### Using AnndataEnrichmentAnalyzer Wrapper

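The `AnndataEnrichmentAnalyzer` class wraps the functionality of both `AnndataEnricher` and `AnndataAnalyzer`, so enrichment and analysis can be performed in one go. The example below also uses `GraphGenerator` to build, label, enrich and save an RDF graph from the results.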

```python
from pandasaurus_cxg.enrichment_analysis import AnndataEnrichmentAnalyzer
from pandasaurus_cxg.graph_generator.graph_generator import GraphGenerator
aea = AnndataEnrichmentAnalyzer("test/data/modified_human_kidney.h5ad")
aea.contextual_slim_enrichment()
aea.co_annotation_report()
gg = GraphGenerator(aea)
gg.generate_rdf_graph()
gg.set_label_adding_priority(["class", "cell_type", "subclass.l1", "subclass.full", "subclass.l2", "subclass.l3"])
gg.add_label_to_terms()
gg.enrich_rdf_graph()
gg.save_rdf_graph(file_name="kidney_new", _format="ttl")
```
More examples and a detailed explanation can be found in the Jupyter notebook linked under [Snippets](#snippets).

## Snippets

https://github.com/INCATools/pandasaurus_cxg/blob/roadmap/walkthrough.ipynb

## Roadmap

https://github.com/INCATools/pandasaurus_cxg/blob/roadmap/ROADMAP.md

52 changes: 52 additions & 0 deletions ROADMAP.md
@@ -0,0 +1,52 @@
## Pandasaurus_cxg Roadmap

* Generate and release integrated documentation from PyDoc, including links to the tutorial notebooks

  (Potential framework: Sphinx)

* Testing:
  * Test against a range of datasets on CxG to find bugs and performance issues
  * User testing: recruit friendly bioinformaticians to give feedback on functionality and usability

* Extend the basic enrichment methods to include the number of hops from a term.

* Add support for CxG schema validation (via a dependency on the official library)

  This may not be needed for files downloaded from CxG, but part of the aim is to promote the standard more generally, so the tool should be ready for files from other sources.

* Add semantic context queries

  (Dependency: add abstracted most-specific subject/object queries to pandasaurus)

  * CL-Pro
  * CL-GO & GO-CL
  * HPO-CL & MP-CL
  * MONDO-CL
  * OBA-CL

* Add interface to QuickGO to pull gene associations.

  Can we use an existing lib for this?

* Add interface to Monarch API to pull gene associations for Mondo, HP, MP, OBA.

  Can we use an existing lib for this or collaborate with Monarch on one?

* Add support for querying gene sets and general classes from disease metadata terms.

* Extend support for filtering on metadata before analysis

* Add a library of author cell type fields for CxG-hosted datasets where these have been curated

* Add support for cell type annotation schema (CAP)
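
The roadmap item on hop counts can be illustrated with a minimal sketch. It assumes that enrichment walks `is_a` links upward from an annotated term; the helper `ancestors_with_hops` and the `TOY_IS_A` hierarchy are hypothetical and are not part of pandasaurus_cxg or taken from CL. Breadth-first search records the shortest hop distance to each ancestor, and the optional `max_hops` argument caps how far enrichment reaches.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical toy is_a hierarchy (child -> parents); illustrative only, not taken from CL.
TOY_IS_A: Dict[str, List[str]] = {
    "CD4 T cell": ["T cell"],
    "T cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "leukocyte": ["cell"],
    "cell": [],
}


def ancestors_with_hops(
    term: str, parents: Dict[str, List[str]], max_hops: Optional[int] = None
) -> Dict[str, int]:
    """Map each ancestor of `term` to its shortest is_a distance in hops.

    Hypothetical helper. Breadth-first search visits terms in order of distance,
    so the first time an ancestor is reached is via a shortest path.
    """
    hops: Dict[str, int] = {}
    queue = deque([(term, 0)])
    while queue:
        current, distance = queue.popleft()
        if max_hops is not None and distance >= max_hops:
            continue  # do not expand past the hop limit
        for parent in parents.get(current, []):
            if parent != term and parent not in hops:
                hops[parent] = distance + 1
                queue.append((parent, distance + 1))
    return hops


print(ancestors_with_hops("CD4 T cell", TOY_IS_A))
# {'T cell': 1, 'lymphocyte': 2, 'leukocyte': 3, 'cell': 4}
print(ancestors_with_hops("CD4 T cell", TOY_IS_A, max_hops=2))
# {'T cell': 1, 'lymphocyte': 2}
```

Using breadth-first rather than depth-first traversal matters once a term has several parents: the recorded distance is then always the shortest path, not whichever path happened to be explored first.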


## Potential future functionality

Both of these are probably better served by workflows built on existing libraries:

- Automatically cross-checking retrieved gene sets against cluster expression
- Interfacing with standard enrichment tools




4 changes: 2 additions & 2 deletions pandasaurus_cxg/anndata_analyzer.py
@@ -192,8 +192,8 @@ def _remove_duplicates(data: List[List[str]]):
     @staticmethod
     def _assign_predicate_column(co_oc, field_name_1, field_name_2):
         # Group by field_name_2 and field_name_1 to create dictionaries
-        field_name_2_dict = co_oc.groupby(field_name_2)[field_name_1].apply(list).to_dict()
-        field_name_1_dict = co_oc.groupby(field_name_1)[field_name_2].apply(list).to_dict()
+        field_name_2_dict = co_oc.groupby(field_name_2, observed=True)[field_name_1].apply(list).to_dict()
+        field_name_1_dict = co_oc.groupby(field_name_1, observed=True)[field_name_2].apply(list).to_dict()
         # Assign the "predicate" column using self._assign_predicate method
         co_oc["predicate"] = co_oc.apply(
             AnndataAnalyzer._assign_predicate,
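
The only code change in this PR passes `observed=True` to both `groupby` calls in `_assign_predicate_column`. The page does not state the motivation; a likely reason (an assumption) is that AnnData usually stores string `.obs` columns as pandas `Categorical`s, and with `observed=False` pandas also creates groups for categories that never occur in the data. pandas 2.1 additionally deprecated the implicit `observed=False` default for categorical groupers, emitting a `FutureWarning`. The toy DataFrame below is hypothetical and only illustrates the pandas behaviour, not the analyzer itself.

```python
import pandas as pd

# Hypothetical co-occurrence table. "cell_type" is categorical, as AnnData .obs
# string columns typically are; "unused" is a declared category with no rows.
co_oc = pd.DataFrame(
    {
        "cell_type": pd.Categorical(
            ["T cell", "T cell", "B cell"], categories=["B cell", "T cell", "unused"]
        ),
        "author_label": ["CD4", "CD8", "naive B"],
    }
)

# Same pattern as the patched code: map each cell_type to its co-occurring labels.
label_lists = co_oc.groupby("cell_type", observed=True)["author_label"].apply(list).to_dict()
print(label_lists)  # {'B cell': ['naive B'], 'T cell': ['CD4', 'CD8']}

# Group sizes make the difference explicit: observed=False also yields an
# (empty) group for the "unused" category, observed=True does not.
sizes_all = co_oc.groupby("cell_type", observed=False).size().to_dict()
sizes_observed = co_oc.groupby("cell_type", observed=True).size().to_dict()
print(sizes_all)
print(sizes_observed)
```

Passing `observed` explicitly also keeps the result stable if the pandas default changes, since the grouping no longer depends on library version.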