-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-5894][ML] Add polynomial mapper #5245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #29351 has finished for PR 5245 at commit
|
|
Test build #29401 has finished for PR 5245 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to be more specific about what this transformer does. into a larger one is not sufficient to describe the transformer.
|
Test build #30008 has finished for PR 5245 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mengxr I notice that it is not a feasible solution. We have to compute the real indices of the sparse vector. I'll fix it ASAP.
|
Test build #30155 has finished for PR 5245 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
choose(n + 1, k) - 1?
|
@mengxr As we discussed, I fix test suite and left other pieces untouched. I will refine all of them after you verify the correctness. |
|
Test build #30760 has finished for PR 5245 at commit
|
|
@yinxusen Send you a PR at yinxusen#3, which implements another approach. Please let me know how you think. |
|
Test build #30826 has finished for PR 5245 at commit
|
|
@mengxr I merge your modifications. I think it has several benefits:
Also, the class name changed to |
|
Test build #30829 has finished for PR 5245 at commit
|
|
@yinxusen I'm not sure whether it is faster or not. That's why I put the new approach side by side. Please help test the performance. Thanks! |
|
@mengxr I do some tests on these two versions, here is the result log: (You can see my code here.) sbt "mllib/run-main org.apache.spark.ml.feature.PolynomialMapper" 2>&1>test.log
We can see that the two methods have similar performances. |
|
Thanks for testing! LGTM and I've merged this into master. I will send a follow-up PR to remove |
See [SPARK-5894](https://issues.apache.org/jira/browse/SPARK-5894). Author: Xusen Yin <[email protected]> Author: Xiangrui Meng <[email protected]> Closes apache#5245 from yinxusen/SPARK-5894 and squashes the following commits: dc461a6 [Xusen Yin] merge polynomial expansion v2 6d0c3cc [Xusen Yin] Merge branch 'SPARK-5894' of https://github.com/mengxr/spark into mengxr-SPARK-5894 57bfdd5 [Xusen Yin] Merge branch 'master' into SPARK-5894 3d02a7d [Xusen Yin] Merge branch 'master' into SPARK-5894 a067da2 [Xiangrui Meng] a new approach for poly expansion 0789d81 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5894 4e9aed0 [Xusen Yin] fix test suite 95d8fb9 [Xusen Yin] fix sparse vector indices 8d39674 [Xusen Yin] fix sparse vector expansion error 5998dd6 [Xusen Yin] fix dense vector fillin fa3ade3 [Xusen Yin] change the functional code into imperative one to speedup b70e7e1 [Xusen Yin] remove useless case class 6fa236f [Xusen Yin] fix vector slice error daff601 [Xusen Yin] fix index error of sparse vector 6bd0a10 [Xusen Yin] merge repeated features 419f8a2 [Xusen Yin] need to merge same columns 4ebf34e [Xusen Yin] add test suite of polynomial expansion 372227c [Xusen Yin] add polynomial expansion
See [SPARK-5894](https://issues.apache.org/jira/browse/SPARK-5894). Author: Xusen Yin <[email protected]> Author: Xiangrui Meng <[email protected]> Closes apache#5245 from yinxusen/SPARK-5894 and squashes the following commits: dc461a6 [Xusen Yin] merge polynomial expansion v2 6d0c3cc [Xusen Yin] Merge branch 'SPARK-5894' of https://github.com/mengxr/spark into mengxr-SPARK-5894 57bfdd5 [Xusen Yin] Merge branch 'master' into SPARK-5894 3d02a7d [Xusen Yin] Merge branch 'master' into SPARK-5894 a067da2 [Xiangrui Meng] a new approach for poly expansion 0789d81 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5894 4e9aed0 [Xusen Yin] fix test suite 95d8fb9 [Xusen Yin] fix sparse vector indices 8d39674 [Xusen Yin] fix sparse vector expansion error 5998dd6 [Xusen Yin] fix dense vector fillin fa3ade3 [Xusen Yin] change the functional code into imperative one to speedup b70e7e1 [Xusen Yin] remove useless case class 6fa236f [Xusen Yin] fix vector slice error daff601 [Xusen Yin] fix index error of sparse vector 6bd0a10 [Xusen Yin] merge repeated features 419f8a2 [Xusen Yin] need to merge same columns 4ebf34e [Xusen Yin] add test suite of polynomial expansion 372227c [Xusen Yin] add polynomial expansion
See SPARK-5894.