Registering a file to an Iceberg table with a name that is almost identical to the table path causes query failure #963

@hodgesrm

Describe the bug
If you register a file to an Iceberg table using a path with an extra prefix before the file name, ClickHouse queries will fail with an error like the following.

Received exception from server (version 25.3.3):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: Failed to get object info: No response body.. HTTP response code: 404: (in file/uri rhodges-ice-rest-catalog-demo/btc/transactions_path_bug/data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet): While executing ParquetBlockInputFormat: While executing IcebergS3(ice.`btc.transactions_path_bug`)Source. (S3_ERROR)

Here's a file that does not cause the problem.

    - file: "s3://rhodges-ice-rest-catalog-demo/btc/transactions_path_bug/data/1755576822627-da4cfa498606e91d1be31c1022b0e5583518a228681fb4de2104ce3bc79ed2b2.parquet"

Here's a file that does cause the problem. It has "/merged" as a prefix on the path.

    - file: "s3://rhodges-ice-rest-catalog-demo/merged/btc/transactions_path_bug/data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet"

To Reproduce
This assumes you have an ice catalog set up and can create tables.

Steps to reproduce the behavior:

  1. Copy a file to a location that imitates where your file will go.
      's3://aws-public-blockchain/v1.0/btc/transactions/date=2025-01-03/' \
      s3://rhodges-ice-rest-catalog-demo/merged/btc/transactions_path_bug/data=2025-01-03/
  2. Use an ice command to register the file with a table in the same bucket. Use --force-no-copy so that the file is registered from its current location.
      kubectl -n antalya exec -it ice-rest-catalog-0 -- ice insert \
       btc.transactions_path_bug -p --force-no-copy --thread-count=10 \
       's3://rhodges-ice-rest-catalog-demo/merged/btc/transactions_path_bug/data=2025-01-03/*.parquet'
  3. Try to query the table in ClickHouse. You'll see the following error.
      SELECT
          count(),
          max(output_value)
      FROM ice.`btc.transactions_path_bug`

Received exception from server (version 25.3.3):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: Failed to get object info: No response body.. HTTP response code: 404: (in file/uri rhodges-ice-rest-catalog-demo/btc/transactions_path_bug/data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet): While executing ParquetBlockInputFormat: While executing IcebergS3(ice.`btc.transactions_path_bug`)Source. (S3_ERROR)

Note that the error shows the wrong path to the file: rhodges-ice-rest-catalog-demo/btc/transactions_path_bug/data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet. It's missing the /merged prefix that was part of the registered path.
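To make the discrepancy concrete, here is a small Python sketch that compares the path registered with ice against the path ClickHouse reports in the error. Both strings are copied from this report; the helper name `dropped_segments` is made up for illustration and is not part of ice or ClickHouse.

```python
from urllib.parse import urlparse

# Path registered with ice (copied from the report).
registered = (
    "s3://rhodges-ice-rest-catalog-demo/merged/btc/transactions_path_bug/"
    "data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet"
)
# Path ClickHouse tried to read (copied from the error message; it has no scheme).
requested = (
    "rhodges-ice-rest-catalog-demo/btc/transactions_path_bug/"
    "data=2025-01-03/part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet"
)


def dropped_segments(registered_uri, requested_path):
    """Return path segments of the registered URI that are absent from the requested path."""
    parsed = urlparse(registered_uri)
    # Bucket name first, then each non-empty segment of the object key.
    registered_parts = [parsed.netloc] + [p for p in parsed.path.split("/") if p]
    requested_parts = [p for p in requested_path.split("/") if p]
    return [p for p in registered_parts if p not in requested_parts]


print(dropped_segments(registered, requested))
```

The only segment that differs is `merged`, which matches the 404: the object exists under the registered key but not under the key ClickHouse requested.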

Key information

  • Project Antalya Build Version: 25.3.3.20186
  • Cloud provider: AWS EKS
  • Ice Version: 0.5.1

Additional context
File paths that don't match the schema.table name of the Iceberg table don't seem to trigger this error. It looks like a regex or string-matching problem in how the file path is resolved against the table path.
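The Python sketch below shows one way this kind of symptom could arise. It is purely a guess and is not taken from the ClickHouse or ice source: it assumes the table location relative to the bucket is `btc/transactions_path_bug`, and the functions `resolve_naive` and `resolve_anchored` are hypothetical. The naive version searches for the table path anywhere in the object key, so anything in front of the match is lost; the anchored version only treats the key as table-relative when it starts with the table path.

```python
# Assumption: table location relative to the bucket, inferred from the working file's path.
TABLE_PREFIX = "btc/transactions_path_bug"


def resolve_naive(key, table_prefix=TABLE_PREFIX):
    """Hypothetical buggy resolution: match the table path anywhere in the key."""
    idx = key.find(table_prefix)
    if idx == -1:
        return key  # unrelated path, used as-is
    relative = key[idx + len(table_prefix):]
    return table_prefix + relative  # everything before the match is dropped


def resolve_anchored(key, table_prefix=TABLE_PREFIX):
    """Hypothetical fix: only treat the key as table-relative if it starts with the table path."""
    if key.startswith(table_prefix + "/"):
        relative = key[len(table_prefix):]
        return table_prefix + relative
    return key  # outside the table location, keep the full key


name = "part-00000-56a0c524-82b4-494c-9408-990cdc225dc2-c000.snappy.parquet"
failing_key = "merged/btc/transactions_path_bug/data=2025-01-03/" + name

print(resolve_naive(failing_key))     # loses the "merged/" prefix
print(resolve_anchored(failing_key))  # keeps the key unchanged
```

Under these assumptions the naive version reproduces the path seen in the error message, and it leaves keys that do not contain the table path untouched, which would be consistent with the observation above.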
