[SPARK-20047][ML] Constrained Logistic Regression #17715

yanboliang · 2017-04-21T09:06:24Z

What changes were proposed in this pull request?

MLlib LogisticRegression should support bound-constrained optimization (only for L2 regularization). Users can add bound constraints on the coefficients so that the solver produces a solution within the specified range.

Under the hood, we call Breeze L-BFGS-B as the solver for bound-constrained optimization. The current Breeze implementation of L-BFGS-B has some bugs, which scalanlp/breeze#633 fixes. We need to upgrade the Breeze dependency later; for now, this PR temporarily uses a workaround L-BFGS-B for review.
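For orientation, the problem described above can be written roughly as the following box-constrained objective. The notation is an assumption made here for illustration (coefficients \beta, intercepts \beta_0, per-example loss \ell, L2 strength \lambda, bounds l, u, l_0, u_0); the thread does not state the exact scaling Spark applies to the loss or the penalty.

```latex
\min_{\beta,\,\beta_0}\;
  \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(y_i, x_i;\, \beta, \beta_0\bigr)
  + \frac{\lambda}{2}\,\lVert \beta \rVert_2^2
\quad \text{subject to} \quad
  l \le \beta \le u, \qquad l_0 \le \beta_0 \le u_0
```

The inequalities are elementwise. Because the L2 term is smooth, the objective stays differentiable, which is what a solver such as L-BFGS-B requires.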

How was this patch tested?

Unit tests.

SparkQA · 2017-04-21T10:06:25Z

Test build #76028 has finished for PR 17715 at commit 3cfd08c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2017-04-21T17:17:26Z

mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala

When reg == 0, multinomial logistic regression has multiple solutions. For unbounded regression we center the coefficients to get a unique result, but we do not do this for bound-constrained regression, since centering may push coefficients across the bounds. So here we check whether coefficients1 equals coefficientsExpected plus a constant value for each column.
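The check described above could be sketched in plain Scala as follows. This is an illustration only, not the test code from the PR: `ColumnShiftCheck`, `equalUpToColumnConstant`, and the tolerance are hypothetical names and values, and the matrices are plain arrays with one row per class and one column per feature.

```scala
object ColumnShiftCheck {
  /**
   * Hypothetical helper: returns true if, for every column j, the difference
   * actual(i)(j) - expected(i)(j) is the same for all rows i (within tol).
   */
  def equalUpToColumnConstant(
      actual: Array[Array[Double]],
      expected: Array[Array[Double]],
      tol: Double = 1e-6): Boolean = {
    require(actual.nonEmpty && actual.length == expected.length,
      "Both matrices must be non-empty and have the same number of rows.")
    val numCols = actual.head.length
    require((actual ++ expected).forall(_.length == numCols),
      "All rows in both matrices must have the same number of columns.")
    (0 until numCols).forall { j =>
      // Per-row shift for column j; every row must share the first row's shift.
      val diffs = actual.indices.map(i => actual(i)(j) - expected(i)(j))
      diffs.forall(d => math.abs(d - diffs.head) <= tol)
    }
  }
}
```

Adding the same constant to one feature's coefficient in every class leaves the softmax probabilities unchanged, which is why a per-column shift is an acceptable difference between two solutions.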

dbtsai · 2017-04-21T17:24:42Z

High-level question: what happens to LBFGSB if the initial condition is out of bounds?

yanboliang · 2017-04-21T17:30:53Z

@dbtsai It hits this and throws an exception.
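To make the behaviour in this exchange concrete, here is a minimal sketch of a feasibility check on an initial point. It is not Breeze's code (the check referenced in the comment was a link that is not reproduced in this thread); `FeasibilityCheck` and `requireWithinBounds` are hypothetical names.

```scala
object FeasibilityCheck {
  /**
   * Hypothetical helper: throws IllegalArgumentException (via require) if any
   * entry of init lies outside the closed interval [lower(i), upper(i)].
   */
  def requireWithinBounds(
      init: Array[Double],
      lower: Array[Double],
      upper: Array[Double]): Unit = {
    require(init.length == lower.length && init.length == upper.length,
      s"Length mismatch: init=${init.length}, lower=${lower.length}, upper=${upper.length}.")
    init.indices.foreach { i =>
      require(lower(i) <= init(i) && init(i) <= upper(i),
        s"Initial value ${init(i)} at index $i is outside [${lower(i)}, ${upper(i)}].")
    }
  }
}
```

A caller that wants to avoid the exception would need to supply a starting point inside the box before invoking the solver.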

dbtsai

Looks in good shape to merge now. Only a few minor comments. Thanks.

dbtsai · 2017-04-24T20:01:56Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

lowerBoundsOnCoefficients

dbtsai · 2017-04-24T20:04:40Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

lowerBoundsOnIntercepts

dbtsai · 2017-04-24T20:25:48Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

For an overridden variable, do we need a @Since tag?

dbtsai · 2017-04-24T20:30:51Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

numCoeffsPlusIntercepts

dbtsai · 2017-04-24T20:38:42Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

lowerBounds

dbtsai · 2017-04-24T22:02:11Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

Mutating state implicitly can be dangerous. Can we use the apply, index, and update APIs on the matrix?

dbtsai · 2017-04-24T22:17:26Z

mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala

Can you check that setboundsOnIntercepts together with .setFitIntercept(false) throws an exception? Otherwise, one should override the other.

Added a corresponding test in test("logistic regression: illegal params"). I prefer to throw an exception for this scenario.

dbtsai · 2017-04-24T22:24:48Z

mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala

Can we add a test having lower bounds for non-negativity?

dbtsai · 2017-04-24T22:31:51Z

mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala

It seems none of the coefficients hit the lower bound condition. Can you set the lower bounds to 2.0? Also, it would be great to see the coefficients with and without bounds here together for a visual check.

Maybe we can have this one with upper bounds as well. For the rest, you can keep only lowerBounds.

We can't hit the lower bound condition even if I set the lower bounds to 2.0, or even 5.0, for this dataset. But I added a test that has both lower and upper bounds ([1.0, 2.0]), and the solution hits both bounds.

val coefficientsExpected3 = new DenseMatrix(3, 4, Array(
  1.61967097, 1.16027835, 1.45131448, 1.97390431,
  1.30529317, 2.0, 1.12985473, 1.26652854,
  1.61647195, 1.0, 1.40642959, 1.72985589), isTransposed = true)
val interceptsExpected3 = Vectors.dense(1.0, 2.0, 2.0)
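As a quick way to see which entries of the expected solution sit on the [1.0, 2.0] bounds, the values quoted above can be scanned with a small helper. This is a plain-Scala sketch without Spark types; `BoundHits`, `atBound`, and the tolerance are hypothetical, and the arrays simply copy the numbers from the comment in row-major order.

```scala
object BoundHits {
  // Values copied from coefficientsExpected3 (row-major, 3 classes x 4 features).
  val coefficients: Array[Double] = Array(
    1.61967097, 1.16027835, 1.45131448, 1.97390431,
    1.30529317, 2.0, 1.12985473, 1.26652854,
    1.61647195, 1.0, 1.40642959, 1.72985589)

  // Values copied from interceptsExpected3.
  val intercepts: Array[Double] = Array(1.0, 2.0, 2.0)

  /** Hypothetical helper: indices whose value equals the bound within tol. */
  def atBound(values: Array[Double], bound: Double, tol: Double = 1e-8): Seq[Int] =
    values.indices.filter(i => math.abs(values(i) - bound) <= tol)

  /** Hypothetical helper: true if every value lies in [lower, upper]. */
  def allWithin(values: Array[Double], lower: Double, upper: Double): Boolean =
    values.forall(v => v >= lower && v <= upper)
}
```

With these inputs, one coefficient sits on each bound and all three intercepts sit on a bound, which matches the statement that the solution hits both bounds.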

dbtsai · 2017-04-24T22:36:20Z

mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala

coefficientsExpectedWithStd is easier to read :)

dbtsai · 2017-04-25T06:51:29Z

Many use cases set the bounds to a constant instead of setting each dimension individually. Maybe we can add the following APIs.

def setLowerBoundsOnIntercepts(value: Double)

def setUpperBoundsOnIntercepts(value: Double)

def setLowerBoundsOnCoefficients(value: Double)

def setUpperBoundsOnCoefficients(value: Double)
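One way such constant-valued setters might expand a scalar into a full set of bounds is sketched below. This is an assumption about a possible approach, not code from the PR: `ConstantBounds` and its methods are hypothetical, and they return plain arrays rather than Spark matrix or vector types.

```scala
object ConstantBounds {
  /** Hypothetical helper: backing values for a numRows x numCols matrix filled with one bound. */
  def constantMatrixValues(numRows: Int, numCols: Int, value: Double): Array[Double] = {
    require(numRows > 0 && numCols > 0,
      s"Matrix dimensions must be positive but found ($numRows, $numCols).")
    Array.fill(numRows * numCols)(value)
  }

  /** Hypothetical helper: backing values for a vector of the given size filled with one bound. */
  def constantVectorValues(size: Int, value: Double): Array[Double] = {
    require(size > 0, s"Vector size must be positive but found $size.")
    Array.fill(size)(value)
  }
}
```

The arrays could then be wrapped with `Matrices.dense(numRows, numCols, values)` or `Vectors.dense(values)` from `org.apache.spark.ml.linalg` before being used as bounds. Note that the number of classes and features must be known to size the matrix, so a scalar setter would likely have to defer this expansion until the data has been inspected.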

SparkQA · 2017-04-27T09:34:19Z

Test build #76221 has finished for PR 17715 at commit 43192a4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-27T10:00:37Z

Test build #76223 has finished for PR 17715 at commit 96fcec4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dbtsai · 2017-04-27T20:47:00Z

LGTM. Merged into master and branch-2.2

Thanks @yanboliang for delivering this big feature which is very useful for many practical use-cases in the industry.

Thanks @WeichenXu123 for fixing bugs in Breeze, which we use as an optimization building block in Spark ML.

## What changes were proposed in this pull request?

MLlib `LogisticRegression` should support bound-constrained optimization (only for L2 regularization). Users can add bound constraints on the coefficients so that the solver produces a solution within the specified range.

Under the hood, we call Breeze [`L-BFGS-B`](https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/LBFGSB.scala) as the solver for bound-constrained optimization. The current Breeze implementation of L-BFGS-B has some bugs, which scalanlp/breeze#633 fixes. We need to upgrade the Breeze dependency later; for now, this PR temporarily uses a workaround L-BFGS-B for review.

## How was this patch tested?

Unit tests.

Author: Yanbo Liang

Closes #17715 from yanboliang/spark-20047.

(cherry picked from commit 606432a)
Signed-off-by: DB Tsai

jkbradley

@yanboliang and @dbtsai Thanks for adding this! Definitely useful. I hope you don't mind I sent a few small follow-up comments.

Btw, we'll have to be careful about merging new APIs and major changes now that the RC process has begun.

jkbradley · 2017-04-28T22:26:09Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

+   * The bound matrix must be compatible with the shape (1, number of features) for binomial
+   * regression, or (number of classes, number of features) for multinomial regression.
+   * Otherwise, it throws exception.
+   *


We should state the default value is none

Same for the other new bound Params

jkbradley · 2017-04-28T22:27:12Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

+   * regression, or (number of classes, number of features) for multinomial regression.
+   * Otherwise, it throws exception.
+   *
+   * @group param


I'd recommend that bound-constrained optimization be put under expertParams. What do you think?

I agree to put them under expertParams. Thanks.

jkbradley · 2017-04-28T22:30:21Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

+    }
+    if (!$(fitIntercept)) {
+      require(!isSet(lowerBoundsOnIntercepts) && !isSet(upperBoundsOnIntercepts),
+        "Pls don't set bounds on intercepts if fitting without intercept.")


"Pls don't" --> "Please do not"

jkbradley · 2017-04-28T22:45:26Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

+      numCoefficientSets: Int,
+      numFeatures: Int): Unit = {
+    if (isSet(lowerBoundsOnCoefficients)) {
+      require($(lowerBoundsOnCoefficients).numRows == numCoefficientSets &&


These require() statements should have error messages so users know what went wrong.
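A shape check with a descriptive message could look roughly like this. It is a plain-Scala sketch, not the follow-up patch: `BoundShapeCheck`, `Shape`, and `requireShape` are hypothetical stand-ins for the matrix params used in the diff above.

```scala
object BoundShapeCheck {
  /** Hypothetical stand-in for a matrix's dimensions. */
  final case class Shape(numRows: Int, numCols: Int)

  /**
   * Hypothetical helper: fails with a message that names the parameter and
   * reports both the expected and the actual shape.
   */
  def requireShape(
      paramName: String,
      actual: Shape,
      numCoefficientSets: Int,
      numFeatures: Int): Unit = {
    require(actual.numRows == numCoefficientSets && actual.numCols == numFeatures,
      s"$paramName must have shape ($numCoefficientSets, $numFeatures) " +
        s"but found (${actual.numRows}, ${actual.numCols}).")
  }
}
```

Including both shapes in the message lets a user tell at a glance whether the bound matrix was transposed or sized for the wrong number of classes.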

jkbradley · 2017-04-28T22:47:23Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

+    }
+    if (isSet(lowerBoundsOnCoefficients) && isSet(upperBoundsOnCoefficients)) {
+      require($(lowerBoundsOnCoefficients).toArray.zip($(upperBoundsOnCoefficients).toArray)
+        .forall(x => x._1 <= x._2), "LowerBoundsOnCoefficients should always " +


always => always be
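The ordering check in the snippet above pairs each lower bound with its upper bound and requires lower <= upper. A standalone sketch of the same idea, with an error message that reports the offending positions, might look like this; `BoundOrderCheck` and `requireLowerNotAboveUpper` are hypothetical names, and the message wording is an assumption rather than the text used in Spark.

```scala
object BoundOrderCheck {
  /**
   * Hypothetical helper: fails if any lower bound exceeds the matching upper
   * bound, listing the indices where that happens.
   */
  def requireLowerNotAboveUpper(
      what: String,
      lower: Array[Double],
      upper: Array[Double]): Unit = {
    require(lower.length == upper.length,
      s"Lower and upper bounds on $what differ in length: ${lower.length} vs ${upper.length}.")
    val violations = lower.zip(upper).zipWithIndex.collect {
      case ((lo, hi), i) if lo > hi => i
    }
    require(violations.isEmpty,
      s"Lower bounds on $what exceed upper bounds at indices: ${violations.mkString(", ")}.")
  }
}
```

Equal lower and upper bounds pass the check, which effectively pins that coefficient to a fixed value.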

jkbradley · 2017-04-28T22:47:33Z

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

+    }
+    if (isSet(lowerBoundsOnIntercepts) && isSet(upperBoundsOnIntercepts)) {
+      require($(lowerBoundsOnIntercepts).toArray.zip($(upperBoundsOnIntercepts).toArray)
+        .forall(x => x._1 <= x._2), "LowerBoundsOnIntercepts should always " +


yanboliang · 2017-05-02T04:56:20Z

@jkbradley Thanks for your comments. I sent #17829 to address them, please feel free to review. Thanks.

## What changes were proposed in this pull request?

Address some minor comments for #17715:
* Put bound-constrained optimization params under expertParams.
* Update some docs.

## How was this patch tested?

Existing tests.

Author: Yanbo Liang

Closes #17829 from yanboliang/spark-20047-followup.

(cherry picked from commit c5dceb8)
Signed-off-by: Yanbo Liang

## What changes were proposed in this pull request?

PR #17715 added Constrained Logistic Regression for ML. We should add it to SparkR.

## How was this patch tested?

Add new unit tests.

Author: wangmiao1981

Closes #18128 from wangmiao1981/test.

yanboliang changed the title ~~Spark 20047~~ [SPARK-20047][ML] Constrained Logistic Regression Apr 21, 2017

yanboliang added 12 commits April 26, 2017 11:32

Initial draft of bound constraint LoR.

aea265c

Add two test cases.

405dffb

update test cases.

37e0ada

update doc

2cab4e5

update LBFGSB

0e866e9

update test suites

92dfa15

add test case.

aa7242c

Remove workaround LBFGSB.

e3ea117

Rename variables.

1091fb1

Update test cases.

e708e0d

Update test cases.

4d51663

Add test for illegal params.

43192a4

yanboliang force-pushed the spark-20047 branch from 3cfd08c to 43192a4 Compare April 27, 2017 08:32

Reorg some code.

96fcec4

asfgit closed this in 606432a Apr 27, 2017

yanboliang deleted the spark-20047 branch April 28, 2017 02:42

jkbradley mentioned this pull request May 1, 2017

[SPARK-20449][ML] Upgrade breeze version to 0.13.1 #17746

Closed

yanboliang mentioned this pull request May 2, 2017

[SPARK-20047][FOLLOWUP][ML] Constrained Logistic Regression follow up #17829

Closed

wangmiao1981 mentioned this pull request May 27, 2017

[SPARK-20906][SparkR]:Constrained Logistic Regression for SparkR #18128

Closed
