Description
Describe the bug
If you register a file to an Iceberg table using a path with an extra prefix before the file name, ClickHouse queries will fail with an error like the following.
Received exception from server (version 25.3.3):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: Failed to get object info: No response body.. HTTP response code: 404: (in file/uri rhodges-ice-rest-catalog-demo/btc/transactions_path_bug/data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet): While executing ParquetBlockInputFormat: While executing IcebergS3(ice.`btc.transactions_path_bug`)Source. (S3_ERROR)
Here's a file that does not cause the problem.
- file: "s3://rhodges-ice-rest-catalog-demo/btc/transactions_path_bug/data/1755576822627-da4cfa498606e91d1be31c1022b0e5583518a228681fb4de2104ce3bc79ed2b2.parquet"
Here's a file that does cause the problem. It has "/merged" as a prefix on the path.
- file: "s3://rhodges-ice-rest-catalog-demo/merged/btc/transactions_path_bug/data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet"
To Reproduce
This assumes you have an ice catalog set up and can create tables.
Steps to reproduce the behavior:
- Copy a file to a location whose path contains the table's schema/table path behind an extra prefix (here, merged/).
's3://aws-public-blockchain/v1.0/btc/transactions/date=2025-01-03/' \
s3://rhodges-ice-rest-catalog-demo/merged/btc/transactions_path_bug/data=2025-01-03/
- Use an ice command to register the file in a table in the same bucket. Use --force-no-copy so that the file is registered from its current location rather than copied.
kubectl -n antalya exec -it ice-rest-catalog-0 -- ice insert \
btc.transactions_path_bug -p --force-no-copy --thread-count=10 \
's3://rhodges-ice-rest-catalog-demo/merged/btc/transactions_path_bug/data=2025-01-03/*.parquet'
- Try to query the table in ClickHouse. You'll see the following error.
SELECT
count(),
max(output_value)
FROM ice.`btc.transactions_path_bug`
Received exception from server (version 25.3.3):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: Failed to get object info: No response body.. HTTP response code: 404: (in file/uri rhodges-ice-rest-catalog-demo/btc/transactions_path_bug/data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet): While executing ParquetBlockInputFormat: While executing IcebergS3(ice.`btc.transactions_path_bug`)Source. (S3_ERROR)
Note that the error shows the wrong path to the file: rhodges-ice-rest-catalog-demo/btc/transactions_path_bug/data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet. It's missing the /merged prefix.
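To make the discrepancy easy to spot, here is a small Python sketch that compares the registered URI with the path from the error message and lists the components that were dropped. The helper name `dropped_components` is made up for this illustration; it is not part of ice or ClickHouse.

```python
import difflib
from urllib.parse import urlparse


def dropped_components(registered_uri: str, resolved_path: str) -> list[str]:
    """Return components of registered_uri that are absent from resolved_path.

    registered_uri is the s3:// URI given to `ice insert`; resolved_path is the
    "bucket/key" string that appears in the ClickHouse error message.
    """
    parsed = urlparse(registered_uri)
    # Bucket name first, then each non-empty key component.
    registered = [parsed.netloc] + [p for p in parsed.path.split("/") if p]
    resolved = [p for p in resolved_path.split("/") if p]

    matcher = difflib.SequenceMatcher(a=registered, b=resolved, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" mark components of the registered path that
        # did not survive into the resolved path.
        if tag in ("delete", "replace"):
            dropped.extend(registered[i1:i2])
    return dropped


registered = (
    "s3://rhodges-ice-rest-catalog-demo/merged/btc/transactions_path_bug/"
    "data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet"
)
from_error = (
    "rhodges-ice-rest-catalog-demo/btc/transactions_path_bug/"
    "data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet"
)
print(dropped_components(registered, from_error))
```

For the two paths in this report, the only dropped component is the extra prefix.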
Key information
Provide relevant runtime details.
- Project Antalya Build Version: 25.3.3.20186
- Cloud provider: AWS EKS
- Ice Version: 0.5.1
Additional context
File paths that don't match the schema.table name of the Iceberg table don't seem to trigger this error. It looks like a regex or string-matching problem in how the file path is resolved.
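As a purely hypothetical illustration of how string matching could produce this behavior (this is a guess, not the actual ClickHouse or ice code), the Python sketch below resolves an object key by searching for the table's path inside it and keeping everything from the match onward. The function name `naive_resolve_key` and the table path `btc/transactions_path_bug` are assumptions made for the example.

```python
def naive_resolve_key(object_key: str, table_path: str) -> str:
    """HYPOTHETICAL resolver: keep the key from the first table_path match onward.

    This is an assumed failure mode for illustration only. If table_path occurs
    in the middle of the key, anything before it (such as "merged/") is lost.
    """
    idx = object_key.find(table_path)
    if idx == -1:
        # No match: the key is used unchanged, so no prefix is lost.
        return object_key
    return object_key[idx:]


# Assumed table path, inferred from the working file's location in this report.
TABLE_PATH = "btc/transactions_path_bug"

bad_key = (
    "merged/btc/transactions_path_bug/data=2025-01-03/"
    "part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet"
)
unrelated_key = "merged/other/location/data=2025-01-03/file.parquet"

print(naive_resolve_key(bad_key, TABLE_PATH))        # prefix is stripped
print(naive_resolve_key(unrelated_key, TABLE_PATH))  # key is left intact
```

A resolver of this shape would be consistent with both observations: keys that contain the schema/table path lose their leading prefix, while keys that don't contain it are read from the correct location.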