Skip to content

Conversation

@yanboliang
Copy link
Contributor

@yanboliang yanboliang commented Apr 21, 2017

What changes were proposed in this pull request?

MLlib LogisticRegression should support bound constrained optimization (only for L2 regularization). Users can add bound constraints to coefficients to make the solver produce solution in the specified range.

Under the hood, we call Breeze L-BFGS-B as the solver for bound constrained optimization. But in the current breeze implementation, there are some bugs in L-BFGS-B, and scalanlp/breeze#633 fixed them. We need to upgrade dependent breeze later, and currently we use the workaround L-BFGS-B in this PR temporary for reviewing.

How was this patch tested?

Unit tests.

@yanboliang yanboliang changed the title Spark 20047 [SPARK-20047][ML] Constrained Logistic Regression Apr 21, 2017
@SparkQA
Copy link

SparkQA commented Apr 21, 2017

Test build #76028 has finished for PR 17715 at commit 3cfd08c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reg == 0, multinomial logistic regression has multiple solutions and we centralize the coefficients to get identical result for non-bound regression, but we didn't do this for bound constrained regression, since it may cross the bound if we centralize them. So here we check whether coefficients1 equals to coefficientsExpected + constant value for each column.

@dbtsai
Copy link
Member

dbtsai commented Apr 21, 2017

High level questions, what happen to LBFGSB if the initial condition is out of bound?

@yanboliang
Copy link
Contributor Author

yanboliang commented Apr 21, 2017

@dbtsai It hits this and throws exception.

Copy link
Member

@dbtsai dbtsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look in good shape to merge now. Only a few minor comments. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowerBoundsOnCoefficients

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowerBoundsOnIntercepts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For override variable, do we need @Since tag?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numCoeffsPlusIntercepts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowerBounds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutating the states implicitly can be dangerous. Can we use apply, index, and update apis in matrix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check setboundsOnIntercepts and .setFitIntercept(false) should throw exception. Or one should override the other one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added corresponding test at test("logistic regression: illegal params"), I prefer to throw exception for this scenario.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test having lower bounds for non-negativity?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems non of the coefficients hit the lower bound condition. Can you make the lower bounds condition as 2.0? Also, will be great to see the coefficients with / without bonds here together for virtual check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can have this one with upper bounds as well. The rest you can just keep them with only lowerBounds.

Copy link
Contributor Author

@yanboliang yanboliang Apr 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't hit the lower bound condition even I make the lower bounds as 2.0, or even 5.0 for this dataset. But I added a test have both lower and upper bounds ([1.0, 2.0]), the solution hit both bounds.

val coefficientsExpected3 = new DenseMatrix(3, 4, Array(
    1.61967097, 1.16027835, 1.45131448, 1.97390431,
    1.30529317, 2.0, 1.12985473, 1.26652854,
    1.61647195, 1.0, 1.40642959, 1.72985589), isTransposed = true)
val interceptsExpected3 = Vectors.dense(1.0, 2.0, 2.0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coefficientsExpectedWithStd is easier to read :)

@dbtsai
Copy link
Member

dbtsai commented Apr 25, 2017

Many use-cases are setting the bounds as a constant instead of setting each dimensional individually. Maybe we can add the following APIs.

def setLowerBoundsOnIntercepts(value: Double)

def setUpperBoundsOnIntercepts(value: Double)

def setLowerBoundsOnCoefficients(value: Double)

def setUpperBoundsOnCoefficients(value: Double)

@SparkQA
Copy link

SparkQA commented Apr 27, 2017

Test build #76221 has finished for PR 17715 at commit 43192a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 27, 2017

Test build #76223 has finished for PR 17715 at commit 96fcec4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Copy link
Member

dbtsai commented Apr 27, 2017

LGTM. Merged into master and branch-2.2

Thanks @yanboliang for delivering this big feature which is very useful for many practical use-cases in the industry.

Thanks @WeichenXu123 for fixing bugs in Breeze which we use as optimization building block at Spark ML.

asfgit pushed a commit that referenced this pull request Apr 27, 2017
## What changes were proposed in this pull request?
MLlib ```LogisticRegression``` should support bound constrained optimization (only for L2 regularization). Users can add bound constraints to coefficients to make the solver produce solution in the specified range.

Under the hood, we call Breeze [```L-BFGS-B```](https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/LBFGSB.scala) as the solver for bound constrained optimization. But in the current breeze implementation, there are some bugs in L-BFGS-B, and scalanlp/breeze#633 fixed them. We need to upgrade dependent breeze later, and currently we use the workaround L-BFGS-B in this PR temporary for reviewing.

## How was this patch tested?
Unit tests.

Author: Yanbo Liang <[email protected]>

Closes #17715 from yanboliang/spark-20047.

(cherry picked from commit 606432a)
Signed-off-by: DB Tsai <[email protected]>
@asfgit asfgit closed this in 606432a Apr 27, 2017
@yanboliang yanboliang deleted the spark-20047 branch April 28, 2017 02:42
Copy link
Member

@jkbradley jkbradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanboliang and @dbtsai Thanks for adding this! Definitely useful. I hope you don't mind I sent a few small follow-up comments.

Btw, we'll have to be careful about merging new APIs and major changes now that the RC process has begun.

* The bound matrix must be compatible with the shape (1, number of features) for binomial
* regression, or (number of classes, number of features) for multinomial regression.
* Otherwise, it throws exception.
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should state the default value is none

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the other new bound Params

* regression, or (number of classes, number of features) for multinomial regression.
* Otherwise, it throws exception.
*
* @group param
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend that bound-constrained optimization be put under expertParams. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to put them under expertParams. Thanks.

}
if (!$(fitIntercept)) {
require(!isSet(lowerBoundsOnIntercepts) && !isSet(upperBoundsOnIntercepts),
"Pls don't set bounds on intercepts if fitting without intercept.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Pls don't" --> "Please do not"

numCoefficientSets: Int,
numFeatures: Int): Unit = {
if (isSet(lowerBoundsOnCoefficients)) {
require($(lowerBoundsOnCoefficients).numRows == numCoefficientSets &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These require() statements should have error messages so users know what went wrong.

}
if (isSet(lowerBoundsOnCoefficients) && isSet(upperBoundsOnCoefficients)) {
require($(lowerBoundsOnCoefficients).toArray.zip($(upperBoundsOnCoefficients).toArray)
.forall(x => x._1 <= x._2), "LowerBoundsOnCoefficients should always " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always => always be

}
if (isSet(lowerBoundsOnIntercepts) && isSet(upperBoundsOnIntercepts)) {
require($(lowerBoundsOnIntercepts).toArray.zip($(upperBoundsOnIntercepts).toArray)
.forall(x => x._1 <= x._2), "LowerBoundsOnIntercepts should always " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@yanboliang
Copy link
Contributor Author

@jkbradley Thanks for your comments. I sent #17829 to address them, please feel free to review. Thanks.

asfgit pushed a commit that referenced this pull request May 4, 2017
## What changes were proposed in this pull request?
Address some minor comments for #17715:
* Put bound-constrained optimization params under expertParams.
* Update some docs.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes #17829 from yanboliang/spark-20047-followup.

(cherry picked from commit c5dceb8)
Signed-off-by: Yanbo Liang <[email protected]>
asfgit pushed a commit that referenced this pull request May 4, 2017
## What changes were proposed in this pull request?
Address some minor comments for #17715:
* Put bound-constrained optimization params under expertParams.
* Update some docs.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes #17829 from yanboliang/spark-20047-followup.
asfgit pushed a commit that referenced this pull request Jun 22, 2017
## What changes were proposed in this pull request?

PR #17715 Added Constrained Logistic Regression for ML. We should add it to SparkR.

## How was this patch tested?

Add new unit tests.

Author: wangmiao1981 <[email protected]>

Closes #18128 from wangmiao1981/test.
robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017
## What changes were proposed in this pull request?

PR apache#17715 Added Constrained Logistic Regression for ML. We should add it to SparkR.

## How was this patch tested?

Add new unit tests.

Author: wangmiao1981 <[email protected]>

Closes apache#18128 from wangmiao1981/test.
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
MLlib ```LogisticRegression``` should support bound constrained optimization (only for L2 regularization). Users can add bound constraints to coefficients to make the solver produce solution in the specified range.

Under the hood, we call Breeze [```L-BFGS-B```](https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/LBFGSB.scala) as the solver for bound constrained optimization. But in the current breeze implementation, there are some bugs in L-BFGS-B, and scalanlp/breeze#633 fixed them. We need to upgrade dependent breeze later, and currently we use the workaround L-BFGS-B in this PR temporary for reviewing.

Unit tests.

Author: Yanbo Liang <[email protected]>

Closes apache#17715 from yanboliang/spark-20047.
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
## What changes were proposed in this pull request?
Address some minor comments for apache#17715:
* Put bound-constrained optimization params under expertParams.
* Update some docs.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <[email protected]>

Closes apache#17829 from yanboliang/spark-20047-followup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants