Commit 4fdb491
[SPARK-2010] Support for nested data in PySpark SQL
JIRA issue https://issues.apache.org/jira/browse/SPARK-2010
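This PR adds support for nested collection types in PySpark SQL, including
Python's array.array, dict, list, set, and tuple. inferSchema accepts rows
containing these types, and collect() returns them unchanged. For example: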
```
>>> from array import array
>>> from pyspark.sql import SQLContext
>>> sqlCtx = SQLContext(sc)
>>> rdd = sc.parallelize([
... {"f1" : array('i', [1, 2]), "f2" : {"row1" : 1.0}},
... {"f1" : array('i', [2, 3]), "f2" : {"row2" : 2.0}}])
>>> srdd = sqlCtx.inferSchema(rdd)
>>> srdd.collect() == [{"f1" : array('i', [1, 2]), "f2" : {"row1" : 1.0}},
... {"f1" : array('i', [2, 3]), "f2" : {"row2" : 2.0}}]
True
>>> rdd = sc.parallelize([
... {"f1" : [[1, 2], [2, 3]], "f2" : set([1, 2]), "f3" : (1, 2)},
... {"f1" : [[2, 3], [3, 4]], "f2" : set([2, 3]), "f3" : (2, 3)}])
>>> srdd = sqlCtx.inferSchema(rdd)
>>> srdd.collect() == \
... [{"f1" : [[1, 2], [2, 3]], "f2" : set([1, 2]), "f3" : (1, 2)},
... {"f1" : [[2, 3], [3, 4]], "f2" : set([2, 3]), "f3" : (2, 3)}]
True
```
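The commit message does not say how the types of nested values are worked out. As a rough illustration only, the pure-Python sketch below shows one way recursive type inference over nested collections could work. It is not PySpark's implementation: `infer_type`, `infer_schema`, the typecode table, and the choice to treat list, set, and tuple uniformly as arrays typed by their first element are all hypothetical. The type names only borrow Spark SQL's naming style.

```python
from array import array

# Hypothetical Spark-SQL-style names for Python scalar types.
# This is an illustrative sketch, NOT PySpark's actual inference code.
SCALAR_TYPES = {
    bool: "BooleanType",
    int: "IntegerType",
    float: "DoubleType",
    str: "StringType",
}

# Element types for a few array.array typecodes (illustrative subset).
ARRAY_TYPECODES = {
    "i": "IntegerType",
    "l": "LongType",
    "f": "FloatType",
    "d": "DoubleType",
}


def infer_type(value):
    """Recursively describe the type of a possibly nested Python value."""
    # Exact-type lookup, so True maps to BooleanType rather than IntegerType.
    scalar = SCALAR_TYPES.get(type(value))
    if scalar is not None:
        return scalar
    if isinstance(value, array):
        # array.array is homogeneous, so its typecode gives the element type.
        elem = ARRAY_TYPECODES.get(value.typecode)
        if elem is None:
            raise TypeError("unsupported array typecode: %r" % value.typecode)
        return "ArrayType(%s)" % elem
    if isinstance(value, dict):
        if not value:
            raise ValueError("cannot infer a type from an empty dict")
        # Assumption: keys and values are homogeneous, so the first
        # entry is taken as representative of the whole map.
        key, val = next(iter(value.items()))
        return "MapType(%s, %s)" % (infer_type(key), infer_type(val))
    if isinstance(value, (list, set, tuple)):
        if not value:
            raise ValueError("cannot infer a type from an empty collection")
        # Assumption: list, set and tuple are all treated as arrays whose
        # element type is inferred (recursively) from the first element.
        return "ArrayType(%s)" % infer_type(next(iter(value)))
    raise TypeError("unsupported type: %s" % type(value).__name__)


def infer_schema(row):
    """Map each field of a dict-shaped row to its inferred type."""
    return {name: infer_type(value) for name, value in row.items()}


# Usage with rows shaped like the doctest above.
print(infer_schema({"f1": array("i", [1, 2]), "f2": {"row1": 1.0}}))
print(infer_schema({"f1": [[1, 2], [2, 3]], "f2": {1, 2}, "f3": (1, 2)}))
```

Taking the element type from the first element is a simplification: it assumes homogeneous collections and cannot type an empty one, which is why the sketch raises ValueError in that case.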
Author: Kan Zhang
Closes #1041 from kanzhang/SPARK-2010 and squashes the following commits:
1b2891d [Kan Zhang] [SPARK-2010] minor doc change and adding a TODO
504f27e [Kan Zhang] [SPARK-2010] Support for nested data in PySpark SQL

Parent commit: 716c88a
2 files changed (40 additions, 11 deletions), under:
- python/pyspark
- sql/core/src/main/scala/org/apache/spark/sql
The change under sql/core/src/main/scala/org/apache/spark/sql accounts for 19 additions and 10 deletions.