Skip to content

Conversation

jkgoodrich
Copy link
Contributor

Adds functions apply_sklearn_classification_model, apply_onnx_classification_model and convert_sklearn_rf_to_onnx to support the use of ONNX models as suggested by a user in #533

Includes fix mentioned in #538

Test notebook located here: gs://gnomad-julia/notebooks/test_gnomad_ancestry_rf_classification.ipynb

and also attached:
test_gnomad_ancestry_rf_classification.ipynb.zip

@KoalaQin KoalaQin self-assigned this Jul 5, 2023
@KoalaQin KoalaQin self-requested a review July 5, 2023 17:47
convert_model_func: Optional[Callable[[Any], Any]] = None,
apply_model_func: Callable[
[pd.DataFrame, Any], Any
] = apply_sklearn_classification_model,
Copy link
Contributor

@KoalaQin KoalaQin Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm testing your notebook on converting v2, because I noticed you didn't test the converting on v2, but when I loaded the RF:

gnomad_v2_sklearn_rf = "gs://gcp-public-data--gnomad/release/2.1/pca/gnomad.r2.1.RF_fit.pkl"
with hl.hadoop_open(gnomad_v2_sklearn_rf, "rb") as f:
    v2_sklearn_fit = pickle.load(f)
It says: 
No module named 'sklearn.ensemble.forest'

Does it have anything to do with not import the module from outside on your line 214?

Copy link
Contributor Author

@jkgoodrich jkgoodrich Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's because the model is too old, so you need to use old versions of sklearn and other packages in order to loaded it. That is why we are updating to ONNX and why I didn't test the v2.1 RF model. Even though the v3.1 RF model loads, it still loads with

UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.23.2 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.

Like mentioned in the users reported issue in #533

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so you converted them to ONNX version separately with older sklearn? Are you going to share the two *.onnx in the public bucket? The test in your notebook shows your functions are working to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a note about the #533 issue? So when someone tries to load the sklearn RF model, they are aware of this issue, they may use the *.onnx model directly if their python is more recent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so you converted them to ONNX version separately with older sklearn?

Yeah, I finally got the configurations needed for the v2.1 and v3.1 sklearn models to each load and converted them to ONNX models.

Are you going to share the two *.onnx in the public bucket?

That's the plan, but the first step was to make these functions and get them merged. Then I also have tickets to:

Add an example of gnomAD ancestry RF model use to gnomad_qc

Modify blog post on use of ancestry RF model to link to gnomad_qc example

The idea is that the ONNX models will replace the sklearn models, and the blog post will be updated with no code, but instead a link to an example in gnomad_qc so if we need to make changes to it, we can do that in gnomad_qc and not need to modify the blog post again.

Could you also add a note about the #533 issue? So when someone tries to load the sklearn RF model, they are aware of this issue, they may use the *.onnx model directly if their python is more recent.

I'm not sure what you mean? I can add this note to the gnomad_qc example (when I have made it), and mention it in the change to the blog post, but I don't think there is a good place for this note in the gnomad_methods code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You answered my last question in your plan. I understand gnomad_methods is more general.
I think it is good to go!

@KoalaQin KoalaQin self-requested a review July 10, 2023 16:40
@jkgoodrich jkgoodrich merged commit c8146ba into main Jul 10, 2023
@jkgoodrich jkgoodrich deleted the jg/modify_pop_assign_for_onnx branch July 10, 2023 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants