Add parameters apply_model_func and convert_model_func to assign_population_pcs so it has the ability to work with other model types #558
convert_model_func: Optional[Callable[[Any], Any]] = None,
apply_model_func: Callable[
    [pd.DataFrame, Any], Any
] = apply_sklearn_classification_model,
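For readers unfamiliar with the pattern, the two parameters in this diff read as optional hooks: one converts the fitted model, the other applies it to the data. The sketch below is not the gnomad_methods implementation. `assign_labels`, `apply_threshold_model`, and the dict-based "model" are hypothetical stand-ins, plain lists replace the pd.DataFrame so it runs with the standard library only, and it assumes the conversion hook runs before the apply hook.

```python
from typing import Any, Callable, List, Optional, Sequence

Row = Sequence[float]


def apply_threshold_model(rows: List[Row], fit: Any) -> List[str]:
    """Hypothetical default applier: `fit` is a dict holding a 'cutoff'."""
    return ["A" if row[0] >= fit["cutoff"] else "B" for row in rows]


def assign_labels(
    rows: List[Row],
    fit: Any,
    convert_model_func: Optional[Callable[[Any], Any]] = None,
    apply_model_func: Callable[[List[Row], Any], Any] = apply_threshold_model,
) -> Any:
    """Hypothetical stand-in for a function that accepts pluggable model hooks."""
    # Assumption: an optional conversion step runs before the model is applied.
    if convert_model_func is not None:
        fit = convert_model_func(fit)
    # The caller decides how a (possibly converted) model is applied to the data.
    return apply_model_func(rows, fit)


rows = [[0.2], [0.9]]

# Default hooks: the model is already in the form the applier expects.
default_labels = assign_labels(rows, {"cutoff": 0.5})  # ['B', 'A']

# Custom conversion hook: wrap a bare number into the expected dict form.
converted_labels = assign_labels(
    rows, 0.1, convert_model_func=lambda cutoff: {"cutoff": cutoff}
)  # ['A', 'A']
```

Keeping both hooks as plain callables means a different model type only needs its own apply function, and a conversion function if its stored form differs from what the apply function expects.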
I'm testing the conversion in your notebook on v2, because I noticed you didn't test it on v2, but when I tried to load the RF:
gnomad_v2_sklearn_rf = "gs://gcp-public-data--gnomad/release/2.1/pca/gnomad.r2.1.RF_fit.pkl"
with hl.hadoop_open(gnomad_v2_sklearn_rf, "rb") as f:
    v2_sklearn_fit = pickle.load(f)
It says:
No module named 'sklearn.ensemble.forest'
Does it have anything to do with the module on your line 214 not being imported at the top level?
No, it's because the model is too old, so you need to use old versions of sklearn and other packages in order to load it. That is why we are updating to ONNX and why I didn't test the v2.1 RF model. Even though the v3.1 RF model loads, it still loads with this warning:
UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.23.2 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
This is the same problem mentioned in the user-reported issue #533.
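Both failure modes described here (a module path that no longer exists, and a version-mismatch warning) can be surfaced explicitly when unpickling. The following is a minimal sketch using only the standard library. `try_load_pickled_model` and the hand-written payload are hypothetical, and the module name `hypothetical_old_pkg` is invented so the example does not depend on any installed sklearn version. It assumes the library reports the mismatch as a `UserWarning`, as in the message quoted above.

```python
import pickle
import warnings
from typing import Any, Optional, Tuple


def try_load_pickled_model(payload: bytes) -> Tuple[Optional[Any], Optional[str]]:
    """Unpickle `payload`, returning (model, None) or (None, reason)."""
    with warnings.catch_warnings():
        # Promote UserWarning to an exception so a version-mismatch warning
        # raised during unpickling cannot pass unnoticed.
        warnings.simplefilter("error", UserWarning)
        try:
            return pickle.loads(payload), None
        except ModuleNotFoundError as err:
            # The pickle references a module path missing from this environment.
            return None, f"missing module: {err}"
        except UserWarning as warn:
            return None, f"version warning: {warn}"


# Hand-written protocol-0 pickle that references a class in a module
# that does not exist (hypothetical name, stands in for a moved module).
stale_payload = b"chypothetical_old_pkg.forest\nRandomForestClassifier\n."
stale_model, stale_problem = try_load_pickled_model(stale_payload)

# A pickle of a plain dict loads without any problem being reported.
ok_model, ok_problem = try_load_pickled_model(pickle.dumps({"n_trees": 3}))
```

A caller could use the returned reason to decide whether to pin older package versions or switch to a format that does not depend on the pickled class layout.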
Okay, so you converted them to ONNX version separately with older sklearn? Are you going to share the two *.onnx in the public bucket? From the test in your notebook, your functions appear to be working.
Could you also add a note about the #533 issue? So when someone tries to load the sklearn RF model, they are aware of this issue, they may use the *.onnx model directly if their python is more recent.
Okay, so you converted them to ONNX version separately with older sklearn?
Yeah, I finally got the configurations needed for the v2.1 and v3.1 sklearn models to each load and converted them to ONNX models.
Are you going to share the two *.onnx in the public bucket?
That's the plan, but the first step was to make these functions and get them merged. Then I also have tickets to:
Add an example of gnomAD ancestry RF model use to gnomad_qc
Modify blog post on use of ancestry RF model to link to gnomad_qc example
The idea is that the ONNX models will replace the sklearn models, and the blog post will be updated with no code, but instead a link to an example in gnomad_qc, so if we need to make changes to it, we can do that in gnomad_qc and not need to modify the blog post again.
Could you also add a note about the #533 issue? So when someone tries to load the sklearn RF model, they are aware of this issue, they may use the *.onnx model directly if their python is more recent.
I'm not sure what you mean? I can add this note to the gnomad_qc example (when I have made it), and mention it in the change to the blog post, but I don't think there is a good place for this note in the gnomad_methods code.
You answered my last question in your plan. I understand gnomad_methods is more general.
I think it is good to go!
Adds functions apply_sklearn_classification_model, apply_onnx_classification_model, and convert_sklearn_rf_to_onnx to support the use of ONNX models as suggested by a user in #533.
Includes fix mentioned in #538.
Test notebook located here: gs://gnomad-julia/notebooks/test_gnomad_ancestry_rf_classification.ipynb
and also attached:
test_gnomad_ancestry_rf_classification.ipynb.zip