-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-4611][MLlib] Implement the efficient vector norm #3462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -261,6 +261,57 @@ object Vectors { | |
| sys.error("Unsupported Breeze vector type: " + v.getClass.getName) | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Returns the p-norm of this vector. | ||
| * @param vector input vector. | ||
| * @param p norm. | ||
| * @return norm in L^p^ space. | ||
| */ | ||
| private[spark] def norm(vector: Vector, p: Double): Double = { | ||
| require(p >= 1.0) | ||
| val values = vector match { | ||
| case dv: DenseVector => dv.values | ||
| case sv: SparseVector => sv.values | ||
| case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass) | ||
| } | ||
| val size = values.size | ||
|
|
||
| if (p == 1) { | ||
| var sum = 0.0 | ||
| var i = 0 | ||
| while (i < size) { | ||
| sum += math.abs(values(i)) | ||
| i += 1 | ||
| } | ||
| sum | ||
| } else if (p == 2) { | ||
| var sum = 0.0 | ||
| var i = 0 | ||
| while (i < size) { | ||
| sum += values(i) * values(i) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since this is a common case and tight loop, avoid the duplicated value lookup? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried that, and JVM will optimize it so no performance difference. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. will try to see if scala generates the same bytecode tomorrow. Maybe scala compiler optimizes it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. They will generate different bytecode, but I don't see the performance difference. Maybe sum += values(i) * values(i)will translate to while val value = values(i)
sum += value * valuewill translate to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh OK, didn't mean to make you spend so much time investigating. It is not optimized I imagine but may be inconsequential even in a tight loop. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just curious, what if you define There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mengxr Having @srowen I'm very curious about this myself. :P I'm using ASM Bytecode Outline plugin in intellij, https://plugins.jetbrains.com/plugin/5918 so I can generate the bytecode by just simple right click. We use it to find couple boxing/unboxing performance issue at Alpine, and it's very useful. |
||
| i += 1 | ||
| } | ||
| math.sqrt(sum) | ||
| } else if (p == Double.PositiveInfinity) { | ||
| var max = 0.0 | ||
| var i = 0 | ||
| while (i < size) { | ||
| val value = math.abs(values(i)) | ||
| if (value > max) max = value | ||
| i += 1 | ||
| } | ||
| max | ||
| } else { | ||
| var sum = 0.0 | ||
| var i = 0 | ||
| while (i < size) { | ||
| sum += math.pow(math.abs(values(i)), p) | ||
| i += 1 | ||
| } | ||
| math.pow(sum, 1.0 / p) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /** | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about
p match { ...here? with@switchto ensure it's just a lookup? should be faster even thanifs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, will try tomorrow. But I don't expect too much difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In bytecode, there is no direct
switchoperation. As a result, theswtichor pattern matching will be compiled intoifstatement in the bytecode. See the following examplewill be compiled to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting tangent. What happens if you add
@switch? http://www.scala-lang.org/api/current/index.html#scala.annotation.switch Bytecode should have instructions forswitchstatements that aren't just conditionals, liketableswitch: https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-3.html#jvms-3.10There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha~ It only works if I change type from
DoubletoInt. See the oracle doc you referencedThe Java Virtual Machine's tableswitch and lookupswitch instructions operate only on int data. Because operations on byte, char, or short values are internally promoted to int, a switch whose expression evaluates to one of those types is compiled as though it evaluated to type int.With
I got
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is an interesting discussion ~ :) But maybe more people are familiar with the
if ... else if ... elsestatement. And this is not on the critical path.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah. but even with
@switchhere, the code will not be optimized unless p has type ofInt.