Skip to content

Conversation

@jkbradley
Copy link
Member

Main change: Added isValid field to Param. Modified all usages to use isValid when relevant. Added helper methods in ParamValidate.

Also overrode Params.validate() in:

  • CrossValidator + model
  • Pipeline + model

I made a few updates for the elastic net patch:

  • I changed "tol" to "convergenceTol"
  • I added some documentation

This PR is Scala + Java only. Python will be in a follow-up PR.

CC: @mengxr

@SparkQA
Copy link

SparkQA commented Apr 28, 2015

Test build #31126 has finished for PR 5740 at commit f02c3c3.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Param[T] (val parent: Params, val name: String, val doc: String, val isValid: T => Boolean)
    • class DoubleParam(parent: Params, name: String, doc: String, isValid: Double => Boolean)
    • class IntParam(parent: Params, name: String, doc: String, isValid: Int => Boolean)
    • class FloatParam(parent: Params, name: String, doc: String, isValid: Float => Boolean)
    • class LongParam(parent: Params, name: String, doc: String, isValid: Long => Boolean)
    • class BooleanParam(parent: Params, name: String, doc: String) // No need for isValid
    • case class ParamPair[T](param: Param[T], value: T)
  • This patch does not change any dependencies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add doc here

@SparkQA
Copy link

SparkQA commented Apr 28, 2015

Test build #31148 has finished for PR 5740 at commit 9cf4dc5.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • class Param[T] (val parent: Params, val name: String, val doc: String, val isValid: T => Boolean)
    • class DoubleParam(parent: Params, name: String, doc: String, isValid: Double => Boolean)
    • class IntParam(parent: Params, name: String, doc: String, isValid: Int => Boolean)
    • class FloatParam(parent: Params, name: String, doc: String, isValid: Float => Boolean)
    • class LongParam(parent: Params, name: String, doc: String, isValid: Long => Boolean)
    • class BooleanParam(parent: Params, name: String, doc: String) // No need for isValid
    • case class ParamPair[T](param: Param[T], value: T)
  • This patch removes the following dependencies:
    • RoaringBitmap-0.4.5.jar
    • activation-1.1.jar
    • akka-actor_2.10-2.3.4-spark.jar
    • akka-remote_2.10-2.3.4-spark.jar
    • akka-slf4j_2.10-2.3.4-spark.jar
    • aopalliance-1.0.jar
    • arpack_combined_all-0.1.jar
    • avro-1.7.7.jar
    • breeze-macros_2.10-0.11.2.jar
    • breeze_2.10-0.11.2.jar
    • chill-java-0.5.0.jar
    • chill_2.10-0.5.0.jar
    • commons-beanutils-1.7.0.jar
    • commons-beanutils-core-1.8.0.jar
    • commons-cli-1.2.jar
    • commons-codec-1.10.jar
    • commons-collections-3.2.1.jar
    • commons-compress-1.4.1.jar
    • commons-configuration-1.6.jar
    • commons-digester-1.8.jar
    • commons-httpclient-3.1.jar
    • commons-io-2.1.jar
    • commons-lang-2.5.jar
    • commons-lang3-3.3.2.jar
    • commons-math-2.1.jar
    • commons-math3-3.4.1.jar
    • commons-net-2.2.jar
    • compress-lzf-1.0.0.jar
    • config-1.2.1.jar
    • core-1.1.2.jar
    • curator-client-2.4.0.jar
    • curator-framework-2.4.0.jar
    • curator-recipes-2.4.0.jar
    • gmbal-api-only-3.0.0-b023.jar
    • grizzly-framework-2.1.2.jar
    • grizzly-http-2.1.2.jar
    • grizzly-http-server-2.1.2.jar
    • grizzly-http-servlet-2.1.2.jar
    • grizzly-rcm-2.1.2.jar
    • groovy-all-2.3.7.jar
    • guava-14.0.1.jar
    • guice-3.0.jar
    • hadoop-annotations-2.2.0.jar
    • hadoop-auth-2.2.0.jar
    • hadoop-client-2.2.0.jar
    • hadoop-common-2.2.0.jar
    • hadoop-hdfs-2.2.0.jar
    • hadoop-mapreduce-client-app-2.2.0.jar
    • hadoop-mapreduce-client-common-2.2.0.jar
    • hadoop-mapreduce-client-core-2.2.0.jar
    • hadoop-mapreduce-client-jobclient-2.2.0.jar
    • hadoop-mapreduce-client-shuffle-2.2.0.jar
    • hadoop-yarn-api-2.2.0.jar
    • hadoop-yarn-client-2.2.0.jar
    • hadoop-yarn-common-2.2.0.jar
    • hadoop-yarn-server-common-2.2.0.jar
    • ivy-2.4.0.jar
    • jackson-annotations-2.4.0.jar
    • jackson-core-2.4.4.jar
    • jackson-core-asl-1.8.8.jar
    • jackson-databind-2.4.4.jar
    • jackson-jaxrs-1.8.8.jar
    • jackson-mapper-asl-1.8.8.jar
    • jackson-module-scala_2.10-2.4.4.jar
    • jackson-xc-1.8.8.jar
    • jansi-1.4.jar
    • javax.inject-1.jar
    • javax.servlet-3.0.0.v201112011016.jar
    • javax.servlet-3.1.jar
    • javax.servlet-api-3.0.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar
    • jcl-over-slf4j-1.7.10.jar
    • jersey-client-1.9.jar
    • jersey-core-1.9.jar
    • jersey-grizzly2-1.9.jar
    • jersey-guice-1.9.jar
    • jersey-json-1.9.jar
    • jersey-server-1.9.jar
    • jersey-test-framework-core-1.9.jar
    • jersey-test-framework-grizzly2-1.9.jar
    • jets3t-0.7.1.jar
    • jettison-1.1.jar
    • jetty-util-6.1.26.jar
    • jline-0.9.94.jar
    • jline-2.10.4.jar
    • jodd-core-3.6.3.jar
    • json4s-ast_2.10-3.2.10.jar
    • json4s-core_2.10-3.2.10.jar
    • json4s-jackson_2.10-3.2.10.jar
    • jsr305-1.3.9.jar
    • jtransforms-2.4.0.jar
    • jul-to-slf4j-1.7.10.jar
    • kryo-2.21.jar
    • log4j-1.2.17.jar
    • lz4-1.2.0.jar
    • management-api-3.0.0-b012.jar
    • mesos-0.21.0-shaded-protobuf.jar
    • metrics-core-3.1.0.jar
    • metrics-graphite-3.1.0.jar
    • metrics-json-3.1.0.jar
    • metrics-jvm-3.1.0.jar
    • minlog-1.2.jar
    • netty-3.8.0.Final.jar
    • netty-all-4.0.23.Final.jar
    • objenesis-1.2.jar
    • opencsv-2.3.jar
    • oro-2.0.8.jar
    • paranamer-2.6.jar
    • parquet-column-1.6.0rc3.jar
    • parquet-common-1.6.0rc3.jar
    • parquet-encoding-1.6.0rc3.jar
    • parquet-format-2.2.0-rc1.jar
    • parquet-generator-1.6.0rc3.jar
    • parquet-hadoop-1.6.0rc3.jar
    • parquet-jackson-1.6.0rc3.jar
    • protobuf-java-2.4.1.jar
    • protobuf-java-2.5.0-spark.jar
    • py4j-0.8.2.1.jar
    • pyrolite-2.0.1.jar
    • quasiquotes_2.10-2.0.1.jar
    • reflectasm-1.07-shaded.jar
    • scala-compiler-2.10.4.jar
    • scala-library-2.10.4.jar
    • scala-reflect-2.10.4.jar
    • scalap-2.10.4.jar
    • scalatest_2.10-2.2.1.jar
    • slf4j-api-1.7.10.jar
    • slf4j-log4j12-1.7.10.jar
    • snappy-java-1.1.1.7.jar
    • spark-bagel_2.10-1.4.0-SNAPSHOT.jar
    • spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
    • spark-core_2.10-1.4.0-SNAPSHOT.jar
    • spark-graphx_2.10-1.4.0-SNAPSHOT.jar
    • spark-launcher_2.10-1.4.0-SNAPSHOT.jar
    • spark-mllib_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-common_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
    • spark-repl_2.10-1.4.0-SNAPSHOT.jar
    • spark-sql_2.10-1.4.0-SNAPSHOT.jar
    • spark-streaming_2.10-1.4.0-SNAPSHOT.jar
    • spire-macros_2.10-0.7.4.jar
    • spire_2.10-0.7.4.jar
    • stax-api-1.0.1.jar
    • stream-2.7.0.jar
    • tachyon-0.6.4.jar
    • tachyon-client-0.6.4.jar
    • uncommons-maths-1.2.2a.jar
    • unused-1.0.0.jar
    • xmlenc-0.52.jar
    • xz-1.0.jar
    • zookeeper-3.4.5.jar

@jkbradley
Copy link
Member Author

Btw, I made a few updates for the elastic net patch:

  • I changed "tol" to "convergenceTol"
  • I added some documentation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mengxr Please confirm this isValid is correct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@jkbradley
Copy link
Member Author

@mengxr Hopefully that fixed everything. I'm sending separate PRs for those other 2 small, unrelated changes

@SparkQA
Copy link

SparkQA commented Apr 28, 2015

Test build #31178 has finished for PR 5740 at commit e54d79e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Param[T] (val parent: Params, val name: String, val doc: String, val isValid: T => Boolean)
    • class DoubleParam(parent: Params, name: String, doc: String, isValid: Double => Boolean)
    • class IntParam(parent: Params, name: String, doc: String, isValid: Int => Boolean)
    • class FloatParam(parent: Params, name: String, doc: String, isValid: Float => Boolean)
    • class LongParam(parent: Params, name: String, doc: String, isValid: Long => Boolean)
    • class BooleanParam(parent: Params, name: String, doc: String) // No need for isValid
    • case class ParamPair[T](param: Param[T], value: T)
    • trait LDAOptimizer
    • class EMLDAOptimizer extends LDAOptimizer
    • class OffsetRange(object):
    • class TopicAndPartition(object):
    • class Broker(object):
    • trait ExpectsInputTypes
    • abstract class BinaryMathExpression(f: (Double, Double) => Double, name: String)
    • case class Pow(left: Expression, right: Expression) extends BinaryMathExpression(math.pow, "POWER")
    • case class Hypot(
    • case class Atan2(
    • abstract class MathematicalExpression(name: String)
    • abstract class MathematicalExpressionForDouble(f: Double => Double, name: String)
    • abstract class MathematicalExpressionForInt(f: Int => Int, name: String)
    • abstract class MathematicalExpressionForFloat(f: Float => Float, name: String)
    • abstract class MathematicalExpressionForLong(f: Long => Long, name: String)
    • case class Sin(child: Expression) extends MathematicalExpressionForDouble(math.sin, "SIN")
    • case class Asin(child: Expression) extends MathematicalExpressionForDouble(math.asin, "ASIN")
    • case class Sinh(child: Expression) extends MathematicalExpressionForDouble(math.sinh, "SINH")
    • case class Cos(child: Expression) extends MathematicalExpressionForDouble(math.cos, "COS")
    • case class Acos(child: Expression) extends MathematicalExpressionForDouble(math.acos, "ACOS")
    • case class Cosh(child: Expression) extends MathematicalExpressionForDouble(math.cosh, "COSH")
    • case class Tan(child: Expression) extends MathematicalExpressionForDouble(math.tan, "TAN")
    • case class Atan(child: Expression) extends MathematicalExpressionForDouble(math.atan, "ATAN")
    • case class Tanh(child: Expression) extends MathematicalExpressionForDouble(math.tanh, "TANH")
    • case class Ceil(child: Expression) extends MathematicalExpressionForDouble(math.ceil, "CEIL")
    • case class Floor(child: Expression) extends MathematicalExpressionForDouble(math.floor, "FLOOR")
    • case class Rint(child: Expression) extends MathematicalExpressionForDouble(math.rint, "ROUND")
    • case class Cbrt(child: Expression) extends MathematicalExpressionForDouble(math.cbrt, "CBRT")
    • case class Signum(child: Expression) extends MathematicalExpressionForDouble(math.signum, "SIGNUM")
    • case class ISignum(child: Expression) extends MathematicalExpressionForInt(math.signum, "ISIGNUM")
    • case class FSignum(child: Expression) extends MathematicalExpressionForFloat(math.signum, "FSIGNUM")
    • case class LSignum(child: Expression) extends MathematicalExpressionForLong(math.signum, "LSIGNUM")
    • case class ToDegrees(child: Expression)
    • case class ToRadians(child: Expression)
    • case class Log(child: Expression) extends MathematicalExpressionForDouble(math.log, "LOG")
    • case class Log10(child: Expression) extends MathematicalExpressionForDouble(math.log10, "LOG10")
    • case class Log1p(child: Expression) extends MathematicalExpressionForDouble(math.log1p, "LOG1P")
    • case class Exp(child: Expression) extends MathematicalExpressionForDouble(math.exp, "EXP")
    • case class Expm1(child: Expression) extends MathematicalExpressionForDouble(math.expm1, "EXPM1")
  • This patch does not change any dependencies.

@SparkQA
Copy link

SparkQA commented Apr 29, 2015

Test build #730 has started for PR 5740 at commit e54d79e.

@mengxr
Copy link
Contributor

mengxr commented Apr 29, 2015

@jkbradley Let's try to keep this PR minimal. We discussed tol vs. convergenceTol in the elastic net PR and tol seems to be sufficient, which is also used in scikit-learn and MATLAB as the param name for convergence tolerance.

@jkbradley
Copy link
Member Author

@mengxr I missed the discussion on the elastic net PR. The name "tol" just seems very obscure for people who haven't worked with optimization much. I'll remove it though.

@jkbradley
Copy link
Member Author

updated

@SparkQA
Copy link

SparkQA commented Apr 29, 2015

Test build #31301 has finished for PR 5740 at commit f148330.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Param[T] (val parent: Params, val name: String, val doc: String, val isValid: T => Boolean)
    • class DoubleParam(parent: Params, name: String, doc: String, isValid: Double => Boolean)
    • class IntParam(parent: Params, name: String, doc: String, isValid: Int => Boolean)
    • class FloatParam(parent: Params, name: String, doc: String, isValid: Float => Boolean)
    • class LongParam(parent: Params, name: String, doc: String, isValid: Long => Boolean)
    • class BooleanParam(parent: Params, name: String, doc: String) // No need for isValid
    • case class ParamPair[T](param: Param[T], value: T)
  • This patch adds the following new dependencies:
    • spark-unsafe_2.10-1.4.0-SNAPSHOT.jar

@mengxr
Copy link
Contributor

mengxr commented Apr 29, 2015

@jkbradley Could you merge the master branch? The changes to LinearRegression documentation may conflict with #5767. If there are documentation updates, we can do that in a separate PR.

@jkbradley
Copy link
Member Author

Hopefully that did it.

@SparkQA
Copy link

SparkQA commented Apr 30, 2015

Test build #31343 has finished for PR 5740 at commit ad9c6c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class Word2Vec extends Estimator[Word2VecModel] with Word2VecBase
    • class Param[T] (val parent: Params, val name: String, val doc: String, val isValid: T => Boolean)
    • class DoubleParam(parent: Params, name: String, doc: String, isValid: Double => Boolean)
    • class IntParam(parent: Params, name: String, doc: String, isValid: Int => Boolean)
    • class FloatParam(parent: Params, name: String, doc: String, isValid: Float => Boolean)
    • class LongParam(parent: Params, name: String, doc: String, isValid: Long => Boolean)
    • class BooleanParam(parent: Params, name: String, doc: String) // No need for isValid
    • case class ParamPair[T](param: Param[T], value: T)
  • This patch does not change any dependencies.

@mengxr
Copy link
Contributor

mengxr commented Apr 30, 2015

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 114bad6 Apr 30, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
Main change: Added isValid field to Param.  Modified all usages to use isValid when relevant.  Added helper methods in ParamValidate.

Also overrode Params.validate() in:
* CrossValidator + model
* Pipeline + model

I made a few updates for the elastic net patch:
* I changed "tol" to "convergenceTol"
* I added some documentation

This PR is Scala + Java only.  Python will be in a follow-up PR.

CC: mengxr

Author: Joseph K. Bradley <[email protected]>

Closes apache#5740 from jkbradley/enforce-validate and squashes the following commits:

ad9c6c1 [Joseph K. Bradley] re-generated sharedParams after merging with current master
76415e8 [Joseph K. Bradley] reverted convergenceTol to tol
af62f4b [Joseph K. Bradley] Removed changes to SparkBuild, python linalg.  Fixed test failures.  Renamed ParamValidate to ParamValidators.  Removed explicit type from ParamValidators calls where possible.
bb2665a [Joseph K. Bradley] merged with elastic net pr
ecda302 [Joseph K. Bradley] fix rat tests, plus add a little doc
6895dfc [Joseph K. Bradley] small cleanups
069ac6d [Joseph K. Bradley] many cleanups
928fb84 [Joseph K. Bradley] Maybe done
a910ac7 [Joseph K. Bradley] still workin
6d60e2e [Joseph K. Bradley] Still workin
b987319 [Joseph K. Bradley] Partly done with adding checks, but blocking on adding checking functionality to Param
dbc9fb2 [Joseph K. Bradley] merged with master.  enforcing Params.validate
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
Main change: Added isValid field to Param.  Modified all usages to use isValid when relevant.  Added helper methods in ParamValidate.

Also overrode Params.validate() in:
* CrossValidator + model
* Pipeline + model

I made a few updates for the elastic net patch:
* I changed "tol" to "convergenceTol"
* I added some documentation

This PR is Scala + Java only.  Python will be in a follow-up PR.

CC: mengxr

Author: Joseph K. Bradley <[email protected]>

Closes apache#5740 from jkbradley/enforce-validate and squashes the following commits:

ad9c6c1 [Joseph K. Bradley] re-generated sharedParams after merging with current master
76415e8 [Joseph K. Bradley] reverted convergenceTol to tol
af62f4b [Joseph K. Bradley] Removed changes to SparkBuild, python linalg.  Fixed test failures.  Renamed ParamValidate to ParamValidators.  Removed explicit type from ParamValidators calls where possible.
bb2665a [Joseph K. Bradley] merged with elastic net pr
ecda302 [Joseph K. Bradley] fix rat tests, plus add a little doc
6895dfc [Joseph K. Bradley] small cleanups
069ac6d [Joseph K. Bradley] many cleanups
928fb84 [Joseph K. Bradley] Maybe done
a910ac7 [Joseph K. Bradley] still workin
6d60e2e [Joseph K. Bradley] Still workin
b987319 [Joseph K. Bradley] Partly done with adding checks, but blocking on adding checking functionality to Param
dbc9fb2 [Joseph K. Bradley] merged with master.  enforcing Params.validate
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Main change: Added isValid field to Param.  Modified all usages to use isValid when relevant.  Added helper methods in ParamValidate.

Also overrode Params.validate() in:
* CrossValidator + model
* Pipeline + model

I made a few updates for the elastic net patch:
* I changed "tol" to "convergenceTol"
* I added some documentation

This PR is Scala + Java only.  Python will be in a follow-up PR.

CC: mengxr

Author: Joseph K. Bradley <[email protected]>

Closes apache#5740 from jkbradley/enforce-validate and squashes the following commits:

ad9c6c1 [Joseph K. Bradley] re-generated sharedParams after merging with current master
76415e8 [Joseph K. Bradley] reverted convergenceTol to tol
af62f4b [Joseph K. Bradley] Removed changes to SparkBuild, python linalg.  Fixed test failures.  Renamed ParamValidate to ParamValidators.  Removed explicit type from ParamValidators calls where possible.
bb2665a [Joseph K. Bradley] merged with elastic net pr
ecda302 [Joseph K. Bradley] fix rat tests, plus add a little doc
6895dfc [Joseph K. Bradley] small cleanups
069ac6d [Joseph K. Bradley] many cleanups
928fb84 [Joseph K. Bradley] Maybe done
a910ac7 [Joseph K. Bradley] still workin
6d60e2e [Joseph K. Bradley] Still workin
b987319 [Joseph K. Bradley] Partly done with adding checks, but blocking on adding checking functionality to Param
dbc9fb2 [Joseph K. Bradley] merged with master.  enforcing Params.validate
@jkbradley jkbradley deleted the enforce-validate branch July 25, 2016 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants