-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-20906][SparkR]:Constrained Logistic Regression for SparkR #18128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #77455 has finished for PR 18128 at commit
|
|
Test build #77456 has finished for PR 18128 at commit
|
|
Test build #77457 has finished for PR 18128 at commit
|
|
Test build #77462 has finished for PR 18128 at commit
|
R/pkg/R/mllib_classification.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add some check for upperBoundsOnCoefficients or lowerBoundsOnCoefficients
for example, if upperBoundsOnCoefficients is a vector instead of a matrix, ncol(as.array(upperBoundsOnCoefficients)) will be NA
|
Test build #77495 has finished for PR 18128 at commit
|
|
@yanboliang Can you take a look? Thanks! |
|
Test build #77607 has finished for PR 18128 at commit
|
|
@wangmiao1981 Will review soon. Thanks. |
R/pkg/R/mllib_classification.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we can merge the four parameters into two: nrow and ncol, since the matrix format of lowerBoundsOnCoefficients and upperBoundsOnCoefficients should be consistent. If the input doesn't confirm with this rule, we need to throw exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Based on my understanding, lowerBoundsOnCoefficients and upperBoundsOnCoefficients are not required to set at the same time. They can be set at the same time.
For the first case, we can't enforce the dimension of the two matrices because one could be NULL.
For the second case, we can check it.
So, we can't enforce the rule in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think I can do the check because I have a NULL check before enforcing the rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above, we can merge the four parameters into two: numRowsOfBoundsOnCoefficients and numColsOfBoundsOnCoefficients, please follow Scala naming convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add test for multinomial logistic regression? I think only test one side bound is enough.
|
Test build #77691 has finished for PR 18128 at commit
|
|
Test build #77695 has finished for PR 18128 at commit
|
R/pkg/R/mllib_classification.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paste would insert space - use paste0 instead or just remove the space in the string to let it to add it
R/pkg/R/mllib_classification.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need to as.integer(row) and as.integer(col) since they are set internally and not a parameter
R/pkg/R/mllib_classification.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given how this is used later in scala code, should there be a check that nrow(upper) == nrow(lower) and ditto for ncol(upper) == ncol(lower)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the case where we only set the upperbound. We can set both or either one of them.
For the case that both are set. We enforce upperbound and lowerbound are the same dimension, as checked above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thanks, L290-291
|
Test build #77699 has finished for PR 18128 at commit
|
|
Jenkins retest this please |
|
Test build #77703 has finished for PR 18128 at commit
|
|
Local test passed. Let me check it tonight. |
|
@felixcheung if I remove |
|
Test build #77706 has finished for PR 18128 at commit
|
|
ok, thanks, i guess it could be set as |
|
Test build #77909 has finished for PR 18128 at commit
|
|
ping @yanboliang |
|
@wangmiao1981 I'm at DataWorks Summit today and tomorrow, will take a look by this Friday. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment, LGTM otherwise
R/pkg/R/mllib_classification.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size must be equal with 1 - should this be size must be equal to 1?
R/pkg/R/mllib_classification.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto here size must be equal with 1
R/pkg/R/mllib_classification.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while logically and semantically correct here, might prefer && and || instead of & and | here for programmability/correctness in case something becomes length > 1 here later
|
Test build #78172 has finished for PR 18128 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM again.
@yanboliang do you want to take a last look before merging this?
|
merged to master. |
## What changes were proposed in this pull request? PR apache#17715 Added Constrained Logistic Regression for ML. We should add it to SparkR. ## How was this patch tested? Add new unit tests. Author: wangmiao1981 <[email protected]> Closes apache#18128 from wangmiao1981/test.
What changes were proposed in this pull request?
PR #17715 Added Constrained Logistic Regression for ML. We should add it to SparkR.
How was this patch tested?
Add new unit tests.