Commit 0784e02

[SPARK-13355][MLLIB] replace GraphImpl.fromExistingRDDs by Graph.apply
`GraphImpl.fromExistingRDDs` expects a preprocessed vertex RDD as input. We call it in LDA without validating this requirement, so it might introduce errors. Replacing it with `Graph.apply` is safer and more appropriate because it is a public API. The tests still pass, so either it is actually safe to use `fromExistingRDDs` here (though it doesn't seem so based on the implementation) or the test cases are special.

jkbradley ankurdave

Author: Xiangrui Meng <[email protected]>

Closes #11226 from mengxr/SPARK-13355.

(cherry picked from commit 764ca18)
Signed-off-by: Xiangrui Meng <[email protected]>
1 parent d31854d commit 0784e02

1 file changed (+1, -2)
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala

Lines changed: 1 addition & 2 deletions
@@ -25,7 +25,6 @@ import breeze.stats.distributions.{Gamma, RandBasis}
 
 import org.apache.spark.annotation.{DeveloperApi, Since}
 import org.apache.spark.graphx._
-import org.apache.spark.graphx.impl.GraphImpl
 import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
 import org.apache.spark.mllib.linalg.{DenseVector, Matrices, SparseVector, Vector, Vectors}
 import org.apache.spark.rdd.RDD
@@ -186,7 +185,7 @@ final class EMLDAOptimizer extends LDAOptimizer {
       graph.aggregateMessages[(Boolean, TopicCounts)](sendMsg, mergeMsg)
         .mapValues(_._2)
     // Update the vertex descriptors with the new counts.
-    val newGraph = GraphImpl.fromExistingRDDs(docTopicDistributions, graph.edges)
+    val newGraph = Graph(docTopicDistributions, graph.edges)
     graph = newGraph
     graphCheckpointer.update(newGraph)
     globalTopicTotals = computeGlobalTopicTotals()
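
For context, here is a minimal, self-contained sketch (not part of this patch) of the pattern the change relies on: rebuilding a graph with updated vertex attributes through the public `Graph.apply` constructor, which re-ships vertex attributes as needed, rather than `GraphImpl.fromExistingRDDs`, which assumes the vertex RDD is already co-partitioned with the edges. The toy data, attribute types, and object name are illustrative, not taken from LDAOptimizer.

// Sketch only: rebuild a graph with new vertex attributes via the public
// Graph.apply constructor. Unlike GraphImpl.fromExistingRDDs, Graph.apply does
// not assume the vertex RDD is already co-partitioned with the edges.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

object GraphApplySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("GraphApplySketch").setMaster("local[2]"))

    // A toy graph: Long counts on vertices, Double weights on edges.
    val vertices: RDD[(VertexId, Long)] =
      sc.parallelize(Seq((1L, 0L), (2L, 0L), (3L, 0L)))
    val edges: RDD[Edge[Double]] =
      sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0)))
    val graph: Graph[Long, Double] = Graph(vertices, edges, defaultVertexAttr = 0L)

    // Compute new vertex attributes (here, a simple out-edge count),
    // analogous to the aggregateMessages call in EMLDAOptimizer.
    val newCounts: VertexRDD[Long] =
      graph.aggregateMessages[Long](ctx => ctx.sendToSrc(1L), _ + _)

    // Rebuild the graph via the public API, as the patch does with
    // Graph(docTopicDistributions, graph.edges). Vertices that received no
    // message fall back to the default attribute.
    val newGraph: Graph[Long, Double] =
      Graph(newCounts, graph.edges, defaultVertexAttr = 0L)

    newGraph.vertices.collect().foreach(println)
    sc.stop()
  }
}

Using the public constructor trades some extra shuffling for the correctness guarantees of a validated, supported API, which matches the rationale given in the commit message.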

0 commit comments
