Skip to content

Conversation

@yinxusen
Copy link
Contributor

See SPARK-5894.

@SparkQA
Copy link

SparkQA commented Mar 28, 2015

Test build #29351 has finished for PR 5245 at commit 6fa236f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PolynomialMapper extends UnaryTransformer[Vector, Vector, PolynomialMapper]

@SparkQA
Copy link

SparkQA commented Mar 30, 2015

Test build #29401 has finished for PR 5245 at commit b70e7e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PolynomialMapper extends UnaryTransformer[Vector, Vector, PolynomialMapper]
  • This patch does not change any dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to be more specific about what this transformer does. into a larger one is not sufficient to describe the transformer.

@SparkQA
Copy link

SparkQA commented Apr 10, 2015

Test build #30008 has finished for PR 5245 at commit 8d39674.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PolynomialMapper extends UnaryTransformer[Vector, Vector, PolynomialMapper]
  • This patch does not change any dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mengxr I notice that it is not a feasible solution. We have to compute the real indices of the sparse vector. I'll fix it ASAP.

@SparkQA
Copy link

SparkQA commented Apr 13, 2015

Test build #30155 has finished for PR 5245 at commit 95d8fb9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PolynomialMapper extends UnaryTransformer[Vector, Vector, PolynomialMapper]
  • This patch does not change any dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

choose(n + 1, k) - 1?

@yinxusen
Copy link
Contributor Author

@mengxr As we discussed, I fix test suite and left other pieces untouched. I will refine all of them after you verify the correctness.

@SparkQA
Copy link

SparkQA commented Apr 22, 2015

Test build #30760 has finished for PR 5245 at commit 4e9aed0.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PolynomialMapper extends UnaryTransformer[Vector, Vector, PolynomialMapper]
  • This patch does not change any dependencies.

@mengxr
Copy link
Contributor

mengxr commented Apr 22, 2015

@yinxusen Send you a PR at yinxusen#3, which implements another approach. Please let me know how you think.

@SparkQA
Copy link

SparkQA commented Apr 23, 2015

Test build #30826 has finished for PR 5245 at commit 57bfdd5.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PolynomialMapper extends UnaryTransformer[Vector, Vector, PolynomialMapper]
    • abstract class NumericType extends NativeType
  • This patch does not change any dependencies.

@yinxusen
Copy link
Contributor Author

@mengxr I merge your modifications. I think it has several benefits:

  • More elegant and readable;
  • Much faster since we can avoid one choose function call in each iteration.

Also, the class name changed to PolynomialExpansion.

@SparkQA
Copy link

SparkQA commented Apr 23, 2015

Test build #30829 has finished for PR 5245 at commit dc461a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class PolynomialExpansion extends UnaryTransformer[Vector, Vector, PolynomialExpansion]
  • This patch does not change any dependencies.

@mengxr
Copy link
Contributor

mengxr commented Apr 23, 2015

@yinxusen I'm not sure whether it is faster or not. That's why I put the new approach side by side. Please help test the performance. Thanks!

@yinxusen
Copy link
Contributor Author

@mengxr I wrote a test just now, see here. I will test it ASAP.

@yinxusen
Copy link
Contributor Author

@mengxr I do some tests on these two versions, here is the result log: (You can see my code here.)

sbt "mllib/run-main org.apache.spark.ml.feature.PolynomialMapper" 2>&1>test.log

[info] Testing number of data 1024
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 48.591317ms
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 43.113877ms
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 38.518744ms
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 36.946037ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 34.615637ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 39.327571ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 35.640954ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 38.740797ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 37.757011ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 39.291329ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 34.665687ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 37.758357ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 33.307436ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 37.231837ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 34.794309ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 37.112773ms

[info] Testing number of data 10240
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 76.447725ms
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 98.351862ms
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 76.17611ms
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 99.099883ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 76.661511ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 99.442798ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 76.607076ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 99.722276ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 76.337466ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 99.550001ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 76.633637ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 98.995122ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 77.281723ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 100.623104ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 76.444839ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 99.626543ms

[info] Testing number of data 102400
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 459.402129ms
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 744.510455ms
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 462.813174ms
[info] Testing dataset degree: 2 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 745.74096ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 465.842966ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 740.870635ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 456.569887ms
[info] Testing dataset degree: 3 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 740.952329ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 461.351083ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 747.656513ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 461.394609ms
[info] Testing dataset degree: 5 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 736.373809ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V1 name: denseData
[info] Elapsed time: 459.730604ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V1 name: sparseData
[info] Elapsed time: 738.605976ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V2 name: denseData
[info] Elapsed time: 461.859974ms
[info] Testing dataset degree: 10 mapper: PolynomialMapper-V2 name: sparseData
[info] Elapsed time: 744.532121ms

We can see that the two methods have similar performances.

@mengxr
Copy link
Contributor

mengxr commented Apr 24, 2015

Thanks for testing! LGTM and I've merged this into master. I will send a follow-up PR to remove 1 from output.

@asfgit asfgit closed this in 8509519 Apr 24, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 14, 2015
See [SPARK-5894](https://issues.apache.org/jira/browse/SPARK-5894).

Author: Xusen Yin <[email protected]>
Author: Xiangrui Meng <[email protected]>

Closes apache#5245 from yinxusen/SPARK-5894 and squashes the following commits:

dc461a6 [Xusen Yin] merge polynomial expansion v2
6d0c3cc [Xusen Yin] Merge branch 'SPARK-5894' of https://github.com/mengxr/spark into mengxr-SPARK-5894
57bfdd5 [Xusen Yin] Merge branch 'master' into SPARK-5894
3d02a7d [Xusen Yin] Merge branch 'master' into SPARK-5894
a067da2 [Xiangrui Meng] a new approach for poly expansion
0789d81 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5894
4e9aed0 [Xusen Yin] fix test suite
95d8fb9 [Xusen Yin] fix sparse vector indices
8d39674 [Xusen Yin] fix sparse vector expansion error
5998dd6 [Xusen Yin] fix dense vector fillin
fa3ade3 [Xusen Yin] change the functional code into imperative one to speedup
b70e7e1 [Xusen Yin] remove useless case class
6fa236f [Xusen Yin] fix vector slice error
daff601 [Xusen Yin] fix index error of sparse vector
6bd0a10 [Xusen Yin] merge repeated features
419f8a2 [Xusen Yin] need to merge same columns
4ebf34e [Xusen Yin] add test suite of polynomial expansion
372227c [Xusen Yin] add polynomial expansion
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
See [SPARK-5894](https://issues.apache.org/jira/browse/SPARK-5894).

Author: Xusen Yin <[email protected]>
Author: Xiangrui Meng <[email protected]>

Closes apache#5245 from yinxusen/SPARK-5894 and squashes the following commits:

dc461a6 [Xusen Yin] merge polynomial expansion v2
6d0c3cc [Xusen Yin] Merge branch 'SPARK-5894' of https://github.com/mengxr/spark into mengxr-SPARK-5894
57bfdd5 [Xusen Yin] Merge branch 'master' into SPARK-5894
3d02a7d [Xusen Yin] Merge branch 'master' into SPARK-5894
a067da2 [Xiangrui Meng] a new approach for poly expansion
0789d81 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5894
4e9aed0 [Xusen Yin] fix test suite
95d8fb9 [Xusen Yin] fix sparse vector indices
8d39674 [Xusen Yin] fix sparse vector expansion error
5998dd6 [Xusen Yin] fix dense vector fillin
fa3ade3 [Xusen Yin] change the functional code into imperative one to speedup
b70e7e1 [Xusen Yin] remove useless case class
6fa236f [Xusen Yin] fix vector slice error
daff601 [Xusen Yin] fix index error of sparse vector
6bd0a10 [Xusen Yin] merge repeated features
419f8a2 [Xusen Yin] need to merge same columns
4ebf34e [Xusen Yin] add test suite of polynomial expansion
372227c [Xusen Yin] add polynomial expansion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants