-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-19985][ML] Fixed copy method for some ML Models #17326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-19985][ML] Fixed copy method for some ML Models #17326
Conversation
|
Test build #74695 has finished for PR 17326 at commit
|
|
ping @jkbradley @MLnick |
|
ping @holdenk, I could do a backport too for 2.1.1 if needed |
| .setMaxIter(100) | ||
| .setSolver("l-bfgs") | ||
| val model = trainer.fit(dataset) | ||
| MLTestingUtils.checkCopy(model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just something for consideration. Not sure if this is the best way to add unit test for this. I don't think we need to add check only for the classes included in this PR. And even if we do, maybe it's possible to do it in a more uniform way, like in testEstimatorAndModelReadWrite or add a pipelineCompatibilityTest as necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hhbyyh , I agree that this is maybe not the best general way to test it, but this is how it is done in every other Model test. Maybe this is ok for now and we could follow up with a discussion on the best way to check "basic" ML functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do it this way for now - would be good to log a JIRA to investigate a better and more generic way of testing so that it doesn't have to be done manually for each newly added Estimator/Model (or when a Model changes such that defaultCopy is no longer sufficient, etc)
| @Since("1.5.0") | ||
| override def copy(extra: ParamMap): MultilayerPerceptronClassificationModel = { | ||
| copyValues(new MultilayerPerceptronClassificationModel(uid, layers, weights), extra) | ||
| val copied = new MultilayerPerceptronClassificationModel(uid, layers, weights) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem that we have a "standard" way of doing this, i.e. some places do
val copied = new X().setParent(parent)and some places do
val copied = ...
copyValues(...).setParent(parent)But at least for this PR let's be consistent, so maybe change this one to be the first version like you have in the rest of the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, definitely. I noticed they were inconsistent too, so tried to change these to be like most of the others, but I must have forgot to fix this one.
|
Thanks for the review @MLnick! |
|
Test build #75011 has finished for PR 17326 at commit
|
|
Jenkins retest this please |
|
LGTM |
|
Test build #75433 has finished for PR 17326 at commit
|
|
Merged to master. Thanks! |
What changes were proposed in this pull request?
Some ML Models were using
defaultCopywhich expects a default constructor, and others were not setting the parent estimator. This change fixes these by creating a new instance of the model and explicitly setting values and parent.How was this patch tested?
Added
MLTestingUtils.checkCopyto the offending models to tests to verify the copy is made and parent is set.