@mengxr
Contributor

@mengxr mengxr commented Jun 2, 2015

This is just a workaround to a bigger problem. Some pipeline stages may not be effective during prediction, and they should not complain about missing required columns, e.g. StringIndexerModel. @jkbradley
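The workaround described above can be illustrated with a minimal, dependency-free sketch. It is not Spark's implementation: `ToyStringIndexerModel`, its constructor arguments, and the list-of-dicts row format are all hypothetical stand-ins, chosen only to show the "stay silent when the input column is missing" behavior.

```python
from typing import Dict, List

Row = Dict[str, object]


class ToyStringIndexerModel:
    """Hypothetical stand-in for a fitted string indexer (not Spark's class)."""

    def __init__(self, input_col: str, output_col: str, labels: List[str]):
        self.input_col = input_col
        self.output_col = output_col
        # Map each known label to a float index, in the order given.
        self.index = {label: float(i) for i, label in enumerate(labels)}

    def transform(self, rows: List[Row]) -> List[Row]:
        # Workaround behavior: if the input column is absent (e.g. unlabeled
        # data at prediction time), return the rows unchanged instead of failing.
        if any(self.input_col not in row for row in rows):
            return rows
        return [
            {**row, self.output_col: self.index[row[self.input_col]]}
            for row in rows
        ]
```

With this sketch, labeled rows gain an index column, while rows without the label column pass through untouched, so a downstream predictor can still run on them.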

@SparkQA

SparkQA commented Jun 2, 2015

Test build #34023 has finished for PR 6595 at commit e112394.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

As long as this is aimed at 1.4.1 (not 1.4.0), should we design a better fix? My suggestion would be to have PipelineStage include a Param specifying whether to use it during transform(). Stages could be used in transform() by default, but certain Transformers could override the default so that they are skipped during transform(). PipelineModel could read the Param and handle each stage accordingly.

If that's too big a change for 1.4.1, then this temp fix seems tolerable.

Note: We should document the behavior in the docs.
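The Param-based alternative suggested in this comment might look roughly like the following sketch. Everything here is an assumption for illustration: `ToyStage`, `use_in_transform`, `ToyLabelIndexer`, `ToyScaler`, and `ToyPipelineModel` are hypothetical names, and a plain class attribute stands in for a real Param.

```python
from typing import Dict, List

Row = Dict[str, object]


class ToyStage:
    """Hypothetical pipeline stage; `use_in_transform` plays the role of the Param."""

    use_in_transform = True  # stages take part in transform() by default

    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class ToyLabelIndexer(ToyStage):
    """Operates on the target label, so it opts out of transform()."""

    use_in_transform = False  # overrides the default

    def transform(self, rows: List[Row]) -> List[Row]:
        # Would raise KeyError on unlabeled rows if it were ever called.
        return [{**row, "labelIndex": float(len(str(row["label"])))} for row in rows]


class ToyScaler(ToyStage):
    """Feature stage that always runs: doubles the numeric column `x`."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**row, "x": row["x"] * 2} for row in rows]


class ToyPipelineModel:
    def __init__(self, stages: List[ToyStage]):
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        # Read each stage's flag and skip the stages that opted out.
        for stage in self.stages:
            if stage.use_in_transform:
                rows = stage.transform(rows)
        return rows
```

Compared with making a single model silent, this puts the decision in the pipeline rather than in each stage's error handling, at the cost of a wider API change.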

@SparkQA

SparkQA commented Jun 2, 2015

Test build #34025 has finished for PR 6595 at commit 8ee7c7e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor Author

mengxr commented Jun 3, 2015

I checked the transformers we have. Perhaps this is the only one that would operate on target labels, and it is blocking users from making predictions without labels. So it would be nice to merge this fix into branch-1.4, before 1.4.1 is out.

@SparkQA

SparkQA commented Jun 3, 2015

Test build #34069 has finished for PR 6595 at commit b6a36b9.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct(val scalingVec: Vector) extends VectorTransformer

@jkbradley
Member

LGTM pending tests

@jkbradley
Member

test this please

@SparkQA

SparkQA commented Jun 3, 2015

Test build #34106 has finished for PR 6595 at commit b6a36b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct(val scalingVec: Vector) extends VectorTransformer

@jkbradley
Member

Merging with master and branch-1.4

asfgit pushed a commit that referenced this pull request Jun 3, 2015
…oes not exist

This is just a workaround to a bigger problem. Some pipeline stages may not be effective during prediction, and they should not complain about missing required columns, e.g. `StringIndexerModel`. jkbradley

Author: Xiangrui Meng <[email protected]>

Closes #6595 from mengxr/SPARK-8051 and squashes the following commits:

b6a36b9 [Xiangrui Meng] add doc
f143fd4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8051
8ee7c7e [Xiangrui Meng] use SparkFunSuite
e112394 [Xiangrui Meng] make StringIndexerModel silent if input column does not exist

(cherry picked from commit 26c9d7a)
Signed-off-by: Joseph K. Bradley <[email protected]>
@asfgit asfgit closed this in 26c9d7a Jun 3, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015