-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-14370][MLLIB]removed duplicate generation of ids in OnlineLDAOptimizer #12176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| case v: SparseVector => (v.indices.toList, v.values) | ||
| k: Int, | ||
| ids: List[Int]): (BDV[Double], BDM[Double]) = { | ||
| val cts: Array[Double] = termCounts match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The match here becomes redundant right? both cases return v.values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it looks redundant, but values is not a member of parent trait Vector, it is defined at individual implementation level (i.e. DenseVector and SparseVector).
|
A-ha, understood about |
|
I could document the returned values, but frankly I have no idea what those values are. I can see the documentation of first returned value |
|
Ha fair enough. From poking around the internet I'm pretty sure the first is an estimate of gamma, the topic distribution (as the comments say) and the second are sufficient statistics for updating lambda. |
|
@jkbradley Could you please flag this PR for test build? |
|
Jenkins test this please |
|
Test build #55113 has finished for PR 12176 at commit
|
|
Sean is right about the meanings of the return values. |
|
@pravingadakh I'll merge if you'll doc the return value of this method. That further justifies the change. |
|
@srowen Added |
| * avoids explicit computation of variational parameter `phi`. | ||
| * @see [[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.7566]] | ||
| * | ||
| * @return A tuple of topic distribution `gammad`, `sstatsd` and `ids`. Latter two are the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really describe what they are, but simply one place they're called. It doesn't need to be elaborate; see the explanations of what the firs two are above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen How about this return doc:
Returns a tuple of
gammad- estimate of gamma, the topic distribution,sstatsd- statistics for updating lambda andids- list of termCounts vector indices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen Updated the return doc accordingly.
|
@pravingadakh if you'll augment that doc a little bit more I'll merge this. |
|
LGTM too pending the doc update (thanks for that) |
|
Jenkins retest this please |
|
Test build #55921 has finished for PR 12176 at commit
|
|
Merged to master |
What changes were proposed in this pull request?
Removed duplicated generation of
idsin OnlineLDAOptimizer.How was this patch tested?
tested with existing unit tests.