-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-20047][ML] Constrained Logistic Regression #17715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #76028 has finished for PR 17715 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reg == 0, multinomial logistic regression has multiple solutions and we centralize the coefficients to get identical result for non-bound regression, but we didn't do this for bound constrained regression, since it may cross the bound if we centralize them. So here we check whether coefficients1 equals to coefficientsExpected + constant value for each column.
|
High level questions, what happen to LBFGSB if the initial condition is out of bound? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look in good shape to merge now. Only a few minor comments. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lowerBoundsOnCoefficients
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lowerBoundsOnIntercepts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For override variable, do we need @Since tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numCoeffsPlusIntercepts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lowerBounds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutating the states implicitly can be dangerous. Can we use apply, index, and update apis in matrix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check setboundsOnIntercepts and .setFitIntercept(false) should throw exception. Or one should override the other one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added corresponding test at test("logistic regression: illegal params"), I prefer to throw exception for this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test having lower bounds for non-negativity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems non of the coefficients hit the lower bound condition. Can you make the lower bounds condition as 2.0? Also, will be great to see the coefficients with / without bonds here together for virtual check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can have this one with upper bounds as well. The rest you can just keep them with only lowerBounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't hit the lower bound condition even I make the lower bounds as 2.0, or even 5.0 for this dataset. But I added a test have both lower and upper bounds ([1.0, 2.0]), the solution hit both bounds.
val coefficientsExpected3 = new DenseMatrix(3, 4, Array(
1.61967097, 1.16027835, 1.45131448, 1.97390431,
1.30529317, 2.0, 1.12985473, 1.26652854,
1.61647195, 1.0, 1.40642959, 1.72985589), isTransposed = true)
val interceptsExpected3 = Vectors.dense(1.0, 2.0, 2.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coefficientsExpectedWithStd is easier to read :)
|
Many use-cases are setting the bounds as a constant instead of setting each dimensional individually. Maybe we can add the following APIs. def setLowerBoundsOnIntercepts(value: Double)
def setUpperBoundsOnIntercepts(value: Double)
def setLowerBoundsOnCoefficients(value: Double)
def setUpperBoundsOnCoefficients(value: Double) |
|
Test build #76221 has finished for PR 17715 at commit
|
|
Test build #76223 has finished for PR 17715 at commit
|
|
LGTM. Merged into master and branch-2.2 Thanks @yanboliang for delivering this big feature which is very useful for many practical use-cases in the industry. Thanks @WeichenXu123 for fixing bugs in Breeze which we use as optimization building block at Spark ML. |
## What changes were proposed in this pull request? MLlib ```LogisticRegression``` should support bound constrained optimization (only for L2 regularization). Users can add bound constraints to coefficients to make the solver produce solution in the specified range. Under the hood, we call Breeze [```L-BFGS-B```](https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/LBFGSB.scala) as the solver for bound constrained optimization. But in the current breeze implementation, there are some bugs in L-BFGS-B, and scalanlp/breeze#633 fixed them. We need to upgrade dependent breeze later, and currently we use the workaround L-BFGS-B in this PR temporary for reviewing. ## How was this patch tested? Unit tests. Author: Yanbo Liang <[email protected]> Closes #17715 from yanboliang/spark-20047. (cherry picked from commit 606432a) Signed-off-by: DB Tsai <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yanboliang and @dbtsai Thanks for adding this! Definitely useful. I hope you don't mind I sent a few small follow-up comments.
Btw, we'll have to be careful about merging new APIs and major changes now that the RC process has begun.
| * The bound matrix must be compatible with the shape (1, number of features) for binomial | ||
| * regression, or (number of classes, number of features) for multinomial regression. | ||
| * Otherwise, it throws exception. | ||
| * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should state the default value is none
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for the other new bound Params
| * regression, or (number of classes, number of features) for multinomial regression. | ||
| * Otherwise, it throws exception. | ||
| * | ||
| * @group param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend that bound-constrained optimization be put under expertParams. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree to put them under expertParams. Thanks.
| } | ||
| if (!$(fitIntercept)) { | ||
| require(!isSet(lowerBoundsOnIntercepts) && !isSet(upperBoundsOnIntercepts), | ||
| "Pls don't set bounds on intercepts if fitting without intercept.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Pls don't" --> "Please do not"
| numCoefficientSets: Int, | ||
| numFeatures: Int): Unit = { | ||
| if (isSet(lowerBoundsOnCoefficients)) { | ||
| require($(lowerBoundsOnCoefficients).numRows == numCoefficientSets && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These require() statements should have error messages so users know what went wrong.
| } | ||
| if (isSet(lowerBoundsOnCoefficients) && isSet(upperBoundsOnCoefficients)) { | ||
| require($(lowerBoundsOnCoefficients).toArray.zip($(upperBoundsOnCoefficients).toArray) | ||
| .forall(x => x._1 <= x._2), "LowerBoundsOnCoefficients should always " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always => always be
| } | ||
| if (isSet(lowerBoundsOnIntercepts) && isSet(upperBoundsOnIntercepts)) { | ||
| require($(lowerBoundsOnIntercepts).toArray.zip($(upperBoundsOnIntercepts).toArray) | ||
| .forall(x => x._1 <= x._2), "LowerBoundsOnIntercepts should always " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
@jkbradley Thanks for your comments. I sent #17829 to address them, please feel free to review. Thanks. |
## What changes were proposed in this pull request? Address some minor comments for #17715: * Put bound-constrained optimization params under expertParams. * Update some docs. ## How was this patch tested? Existing tests. Author: Yanbo Liang <[email protected]> Closes #17829 from yanboliang/spark-20047-followup. (cherry picked from commit c5dceb8) Signed-off-by: Yanbo Liang <[email protected]>
## What changes were proposed in this pull request? Address some minor comments for #17715: * Put bound-constrained optimization params under expertParams. * Update some docs. ## How was this patch tested? Existing tests. Author: Yanbo Liang <[email protected]> Closes #17829 from yanboliang/spark-20047-followup.
## What changes were proposed in this pull request? PR #17715 Added Constrained Logistic Regression for ML. We should add it to SparkR. ## How was this patch tested? Add new unit tests. Author: wangmiao1981 <[email protected]> Closes #18128 from wangmiao1981/test.
## What changes were proposed in this pull request? PR apache#17715 Added Constrained Logistic Regression for ML. We should add it to SparkR. ## How was this patch tested? Add new unit tests. Author: wangmiao1981 <[email protected]> Closes apache#18128 from wangmiao1981/test.
MLlib ```LogisticRegression``` should support bound constrained optimization (only for L2 regularization). Users can add bound constraints to coefficients to make the solver produce solution in the specified range. Under the hood, we call Breeze [```L-BFGS-B```](https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/LBFGSB.scala) as the solver for bound constrained optimization. But in the current breeze implementation, there are some bugs in L-BFGS-B, and scalanlp/breeze#633 fixed them. We need to upgrade dependent breeze later, and currently we use the workaround L-BFGS-B in this PR temporary for reviewing. Unit tests. Author: Yanbo Liang <[email protected]> Closes apache#17715 from yanboliang/spark-20047.
## What changes were proposed in this pull request? Address some minor comments for apache#17715: * Put bound-constrained optimization params under expertParams. * Update some docs. ## How was this patch tested? Existing tests. Author: Yanbo Liang <[email protected]> Closes apache#17829 from yanboliang/spark-20047-followup.
What changes were proposed in this pull request?
MLlib
LogisticRegressionshould support bound constrained optimization (only for L2 regularization). Users can add bound constraints to coefficients to make the solver produce solution in the specified range.Under the hood, we call Breeze
L-BFGS-Bas the solver for bound constrained optimization. But in the current breeze implementation, there are some bugs in L-BFGS-B, and scalanlp/breeze#633 fixed them. We need to upgrade dependent breeze later, and currently we use the workaround L-BFGS-B in this PR temporary for reviewing.How was this patch tested?
Unit tests.