-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Describe the bug
I am seeing the following error when attempting to query a table of hive style partitioned JSON files via datafusion-cli:
Arrow error: Json error: Encountered unmasked nulls in non-nullable StructArray child: Field { name: "a", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }
To Reproduce
I have a table with the following directory structure:
dev@dev:~/arrow-datafusion/test_table$ ls -lhR
.:
total 12K
drwxrwxr-x 2 dev dev 4.0K Oct 12 17:35 'a=2'
drwxrwxr-x 2 dev dev 4.0K Oct 12 17:35 'a=4'
drwxrwxr-x 2 dev dev 4.0K Oct 12 17:35 'a=6'
'./a=2':
total 4.0K
-rw-rw-r-- 1 dev dev 20 Oct 12 17:35 tn0Sfag4abaDm6i2.json
'./a=4':
total 4.0K
-rw-rw-r-- 1 dev dev 20 Oct 12 17:35 tn0Sfag4abaDm6i2.json
'./a=6':
total 4.0K
-rw-rw-r-- 1 dev dev 20 Oct 12 17:35 tn0Sfag4abaDm6i2.jsonAnd the JSON files look like:
dev@dev:~/arrow-datafusion/test_table$ cat a\=2/tn0Sfag4abaDm6i2.json
{"b":"1"}
{"b":"1"}Attempting to query like the following fails:
dev@dev:~/arrow-datafusion$ datafusion-cli
DataFusion CLI v32.0.0
❯ CREATE EXTERNAL TABLE
json_test(a string, b string)
STORED AS json
LOCATION './test_table'
PARTITIONED BY (a);
0 rows in set. Query took 0.001 seconds.
❯ select * from json_test;
Arrow error: Json error: Encountered unmasked nulls in non-nullable StructArray child: Field { name: "a", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }
❯ Querying without the partitions defined works as expected:
dev@dev:~/arrow-datafusion$ datafusion-cli
DataFusion CLI v32.0.0
❯ CREATE EXTERNAL TABLE
json_test(b string)
STORED AS json
LOCATION './test_table';
0 rows in set. Query took 0.001 seconds.
❯ select * from json_test;
+---+
| b |
+---+
| 3 |
| 3 |
| 1 |
| 1 |
| 5 |
| 5 |
+---+
6 rows in set. Query took 0.002 seconds.The exact same table structure DDL works for CSV and parquet files, but not JSON.
Expected behavior
The above json query should work.
Additional context
I discovered this while working on https://github.com/apache/arrow-datafusion/pull/7801/files#diff-0580d65ff5db0c78c1fa4cf693f2567e7d2394923412687560614836401c223f, and there are additional relevant tests in this PR.