Skip to content

Conversation

@jkbradley
Copy link
Member

Changes:

  • Update protected prediction methods, following design doc. <--most interesting change
  • Changed abstract classes for Estimator and Model to be public. Added DeveloperApi tag. (I kept the traits for Estimator/Model Params private.)
  • Changed ProbabilisticClassificationModel method names to use probability instead of probabilities.

CC: @mengxr @shivaram @etrain

@SparkQA
Copy link

SparkQA commented May 5, 2015

Test build #31890 has finished for PR 5913 at commit 15b9957.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Copy link
Contributor

shivaram commented May 5, 2015

Thanks @jkbradley - Change mostly looks good to me. I also notice you've changed the LogisticRegression to not have the custom transform ? Is that not required anymore ?

@jkbradley
Copy link
Member Author

@shivaram The updated ClassificationModel and ProbabilisticClassificationModel protected prediction methods should be more efficient now, so I don't think a specialized one for LogisticRegression is warranted.
Thanks for checking the PR!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findMax -> argmax? I think we can put this in a separate PR or move it to DenseVector only. The problem is with sparse vectors, where we need to return an index with value zero if nonzero elements are negative.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good point. I'll move it to DenseVector. We've often added helper methods in random places, rather than to linear algebra, utils, or the place they really should belong. I prefer we add these methods directly to the relevant place but make them private.

@jkbradley
Copy link
Member Author

Fixed! Any other comments?

@SparkQA
Copy link

SparkQA commented May 6, 2015

Test build #31945 has finished for PR 5913 at commit e9aa0ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented May 6, 2015

This looks good to me. There are two issues we may need to deal with in the future:

  1. Where to put multi-label classifiers in the class hierarchy?
  2. How to support distributively stored models? Those models could make predictions on individual records if the method call is triggered on the driver node. However, the predict method cannot be used in an RDD closure.

asfgit pushed a commit that referenced this pull request May 6, 2015
Changes:
* Update protected prediction methods, following design doc. **<--most interesting change**
* Changed abstract classes for Estimator and Model to be public.  Added DeveloperApi tag.  (I kept the traits for Estimator/Model Params private.)
* Changed ProbabilisticClassificationModel method names to use probability instead of probabilities.

CC: mengxr shivaram etrain

Author: Joseph K. Bradley <[email protected]>

Closes #5913 from jkbradley/public-dev-api and squashes the following commits:

e9aa0ea [Joseph K. Bradley] moved findMax to DenseVector and renamed to argmax. fixed bug for vector of length 0
15b9957 [Joseph K. Bradley] renamed probabilities to probability in method names
5cda84d [Joseph K. Bradley] regenerated sharedParams
7d1877a [Joseph K. Bradley] Made spark.ml prediction abstractions public.  Organized their prediction methods for efficient computation of multiple output columns.

(cherry picked from commit 1ad04da)
Signed-off-by: Xiangrui Meng <[email protected]>
@mengxr
Copy link
Contributor

mengxr commented May 6, 2015

Merged into master and branch-1.4. Thanks!

@asfgit asfgit closed this in 1ad04da May 6, 2015
@jkbradley
Copy link
Member Author

Made JIRAs for those 2 items:

  • multilabel abstractions: [https://issues.apache.org/jira/browse/SPARK-7409]
  • distributed models: [https://issues.apache.org/jira/browse/SPARK-7412]

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
Changes:
* Update protected prediction methods, following design doc. **<--most interesting change**
* Changed abstract classes for Estimator and Model to be public.  Added DeveloperApi tag.  (I kept the traits for Estimator/Model Params private.)
* Changed ProbabilisticClassificationModel method names to use probability instead of probabilities.

CC: mengxr shivaram etrain

Author: Joseph K. Bradley <[email protected]>

Closes apache#5913 from jkbradley/public-dev-api and squashes the following commits:

e9aa0ea [Joseph K. Bradley] moved findMax to DenseVector and renamed to argmax. fixed bug for vector of length 0
15b9957 [Joseph K. Bradley] renamed probabilities to probability in method names
5cda84d [Joseph K. Bradley] regenerated sharedParams
7d1877a [Joseph K. Bradley] Made spark.ml prediction abstractions public.  Organized their prediction methods for efficient computation of multiple output columns.
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
Changes:
* Update protected prediction methods, following design doc. **<--most interesting change**
* Changed abstract classes for Estimator and Model to be public.  Added DeveloperApi tag.  (I kept the traits for Estimator/Model Params private.)
* Changed ProbabilisticClassificationModel method names to use probability instead of probabilities.

CC: mengxr shivaram etrain

Author: Joseph K. Bradley <[email protected]>

Closes apache#5913 from jkbradley/public-dev-api and squashes the following commits:

e9aa0ea [Joseph K. Bradley] moved findMax to DenseVector and renamed to argmax. fixed bug for vector of length 0
15b9957 [Joseph K. Bradley] renamed probabilities to probability in method names
5cda84d [Joseph K. Bradley] regenerated sharedParams
7d1877a [Joseph K. Bradley] Made spark.ml prediction abstractions public.  Organized their prediction methods for efficient computation of multiple output columns.
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Changes:
* Update protected prediction methods, following design doc. **<--most interesting change**
* Changed abstract classes for Estimator and Model to be public.  Added DeveloperApi tag.  (I kept the traits for Estimator/Model Params private.)
* Changed ProbabilisticClassificationModel method names to use probability instead of probabilities.

CC: mengxr shivaram etrain

Author: Joseph K. Bradley <[email protected]>

Closes apache#5913 from jkbradley/public-dev-api and squashes the following commits:

e9aa0ea [Joseph K. Bradley] moved findMax to DenseVector and renamed to argmax. fixed bug for vector of length 0
15b9957 [Joseph K. Bradley] renamed probabilities to probability in method names
5cda84d [Joseph K. Bradley] regenerated sharedParams
7d1877a [Joseph K. Bradley] Made spark.ml prediction abstractions public.  Organized their prediction methods for efficient computation of multiple output columns.
@jkbradley jkbradley deleted the public-dev-api branch July 25, 2016 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants