Skip to content

Conversation

@pravingadakh
Copy link
Contributor

What changes were proposed in this pull request?

Removed duplicated generation of ids in OnlineLDAOptimizer.

How was this patch tested?

tested with existing unit tests.

case v: SparseVector => (v.indices.toList, v.values)
k: Int,
ids: List[Int]): (BDV[Double], BDM[Double]) = {
val cts: Array[Double] = termCounts match {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The match here becomes redundant right? both cases return v.values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it looks redundant, but values is not a member of parent trait Vector, it is defined at individual implementation level (i.e. DenseVector and SparseVector).

@srowen
Copy link
Member

srowen commented Apr 5, 2016

A-ha, understood about .values. This looks pretty reasonable. My only question is, does it make sense conceptually that this method also returns a list of IDs? it doesn't hurt much in practice, and it seems like there's a reasonable argument for it logically. We have one case where the caller needs it after all. Maybe finish this by documenting the three things returned from this method.

@pravingadakh
Copy link
Contributor Author

I could document the returned values, but frankly I have no idea what those values are. I can see the documentation of first returned value gammad in the comments but none for others.

@srowen
Copy link
Member

srowen commented Apr 5, 2016

Ha fair enough. From poking around the internet I'm pretty sure the first is an estimate of gamma, the topic distribution (as the comments say) and the second are sufficient statistics for updating lambda.

@pravingadakh
Copy link
Contributor Author

@jkbradley Could you please flag this PR for test build?

@srowen
Copy link
Member

srowen commented Apr 6, 2016

Jenkins test this please

@SparkQA
Copy link

SparkQA commented Apr 6, 2016

Test build #55113 has finished for PR 12176 at commit fdc8bb0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hhbyyh
Copy link
Contributor

hhbyyh commented Apr 7, 2016

Sean is right about the meanings of the return values.

@srowen
Copy link
Member

srowen commented Apr 10, 2016

@pravingadakh I'll merge if you'll doc the return value of this method. That further justifies the change.

@pravingadakh
Copy link
Contributor Author

@srowen Added @return for the method.

* avoids explicit computation of variational parameter `phi`.
* @see [[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.7566]]
*
* @return A tuple of topic distribution `gammad`, `sstatsd` and `ids`. Latter two are the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really describe what they are, but simply one place they're called. It doesn't need to be elaborate; see the explanations of what the firs two are above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen How about this return doc:

Returns a tuple of gammad - estimate of gamma, the topic distribution, sstatsd - statistics for updating lambda and ids - list of termCounts vector indices

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen Updated the return doc accordingly.

@srowen
Copy link
Member

srowen commented Apr 13, 2016

@pravingadakh if you'll augment that doc a little bit more I'll merge this.

@jkbradley
Copy link
Member

LGTM too pending the doc update (thanks for that)

@srowen
Copy link
Member

srowen commented Apr 15, 2016

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Apr 15, 2016

Test build #55921 has finished for PR 12176 at commit 8107dc2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Apr 15, 2016

Merged to master

@asfgit asfgit closed this in e249232 Apr 15, 2016
@pravingadakh pravingadakh deleted the SPARK-14370 branch April 15, 2016 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants