
Commit 764ca18

[SPARK-13355][MLLIB] replace GraphImpl.fromExistingRDDs by Graph.apply
`GraphImpl.fromExistingRDDs` expects a preprocessed vertex RDD as input. We call it in LDA without validating this requirement, so it might introduce errors. Replacing it with `Graph.apply` is safer and more appropriate because it is a public API. The tests still pass, so either it is actually safe to use `fromExistingRDDs` here (though that doesn't seem so based on the implementation) or the test cases are special.

jkbradley ankurdave

Author: Xiangrui Meng <[email protected]>

Closes #11226 from mengxr/SPARK-13355.
1 parent 72427c3 commit 764ca18

File tree

1 file changed: +1 -2 lines


mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala

Lines changed: 1 addition & 2 deletions
@@ -25,7 +25,6 @@ import breeze.stats.distributions.{Gamma, RandBasis}
 
 import org.apache.spark.annotation.{DeveloperApi, Since}
 import org.apache.spark.graphx._
-import org.apache.spark.graphx.impl.GraphImpl
 import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
 import org.apache.spark.mllib.linalg.{DenseVector, Matrices, SparseVector, Vector, Vectors}
 import org.apache.spark.rdd.RDD
@@ -188,7 +187,7 @@ final class EMLDAOptimizer extends LDAOptimizer {
       graph.aggregateMessages[(Boolean, TopicCounts)](sendMsg, mergeMsg)
         .mapValues(_._2)
     // Update the vertex descriptors with the new counts.
-    val newGraph = GraphImpl.fromExistingRDDs(docTopicDistributions, graph.edges)
+    val newGraph = Graph(docTopicDistributions, graph.edges)
     graph = newGraph
     graphCheckpointer.update(newGraph)
     globalTopicTotals = computeGlobalTopicTotals()
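
For context, `Graph.apply` is the public GraphX constructor: it accepts a plain RDD of (vertex ID, attribute) pairs, co-partitions the vertices with the edges itself, and assigns a default attribute to vertices that appear only in the edge set, whereas `GraphImpl.fromExistingRDDs` is an internal helper that assumes its vertex RDD is already preprocessed to match the edges. The following minimal sketch shows the public constructor on toy data; the method name `buildToyGraph`, the sample vertices/edges, and the chosen default attribute are illustrative, not part of the patch.

import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Illustrative only: build a tiny graph the same way the patched LDA code
// does, i.e. via the public Graph.apply constructor rather than the
// internal GraphImpl.fromExistingRDDs.
def buildToyGraph(sc: SparkContext): Graph[Double, Int] = {
  val vertices: RDD[(VertexId, Double)] =
    sc.parallelize(Seq((1L, 0.5), (2L, 1.5)))   // (vertex id, attribute)
  val edges: RDD[Edge[Int]] =
    sc.parallelize(Seq(Edge(1L, 2L, 1)))        // srcId, dstId, attribute
  // Graph.apply rebuilds the vertex/edge co-partitioning and fills in any
  // vertex referenced by an edge but missing from `vertices` with the
  // default attribute, so the caller does not need a preprocessed VertexRDD.
  Graph(vertices, edges, defaultVertexAttr = 0.0)
}

Because a `VertexRDD` (such as the result of `aggregateMessages`) is itself an `RDD[(VertexId, VD)]`, the switch in the diff above is a drop-in replacement at the call site.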
