Commit 96df929
[SPARK-3190] Avoid overflow in VertexRDD.count()
VertexRDDs with more than Int.MaxValue (about 2.1 billion) elements are counted incorrectly due to integer overflow when summing partition sizes. This PR fixes the issue by converting partition sizes to Longs before summing them.
The following code previously returned -10000000. After applying this PR, it returns the correct answer of 5000000000 (5 billion).
```scala
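import org.apache.spark.graphx.VertexRDD

// 500 ranges of 10,000,000 consecutive IDs each: 5,000,000,000 vertices in total
val pairs = sc.parallelize(0L until 500L).map(_ * 10000000)
  .flatMap(start => start until (start + 10000000)).map(x => (x, x))
VertexRDD(pairs).count()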
```
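The underlying arithmetic can be reproduced without a Spark cluster. The following is a minimal sketch, not the GraphX implementation: `OverflowDemo`, `sumAsInt`, `sumAsLong`, and the three partition sizes are hypothetical names and values chosen for illustration. It only shows why summing `Int` sizes wraps around and why widening each size to `Long` first, as the commit message describes, does not.

```scala
object OverflowDemo {
  // Sums the sizes as Int: the result wraps around modulo 2^32 once it exceeds Int.MaxValue.
  def sumAsInt(sizes: Seq[Int]): Int = sizes.reduce(_ + _)

  // Widens each size to Long before summing, so the total cannot wrap at the Int boundary.
  def sumAsLong(sizes: Seq[Int]): Long = sizes.map(_.toLong).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    // Hypothetical partition sizes: three partitions of one billion elements each.
    val sizes = Seq.fill(3)(1000000000)
    println(s"Int sum:  ${sumAsInt(sizes)}")  // wrapped, negative value
    println(s"Long sum: ${sumAsLong(sizes)}") // 3000000000
  }
}
```

Each individual size still fits in an `Int`; only the running total needs the wider type, which is why converting before the reduction is sufficient.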
Author: Ankur Dave
Closes #2106 from ankurdave/SPARK-3190 and squashes the following commits:
641f468 [Ankur Dave] Avoid overflow in VertexRDD.count()
1 parent 3901245, commit 96df929
1 file changed under graphx/src/main/scala/org/apache/spark/graphx, with 1 addition and 1 deletion: line 111 is replaced (lines 108-110 and 112-114 are unchanged context).