[lake] The default configuration of the Lake tables that Fluss depends on can be tampered with #1824

@LiebingYu

Search before asking

  • I searched in the issues and found nothing similar.

Fluss version

main (development)

Please describe the bug 🐞

When a lake table (such as Paimon) is enabled for a Fluss table, Fluss sets some default configs on the corresponding Paimon table, such as bucket = -1 for a log table without bucket keys. These configs are supposed to be set by Fluss, not by users. If the user sets these properties, it causes problems with lakehouse synchronization.

For example:

    @Test
    void test() throws Exception {
        Map<String, String> customProperties = new HashMap<>();
        customProperties.put("k1", "v1");
        customProperties.put("paimon.file.format", "parquet");
        // User-supplied values for Paimon options that Fluss is expected to set itself.
        customProperties.put("paimon.bucket", "112");
        customProperties.put("paimon.bucket-key", "log_c1");

        TableDescriptor logTable =
                TableDescriptor.builder()
                        .schema(
                                Schema.newBuilder()
                                        .column("log_c1", DataTypes.INT())
                                        .column("log_c2", DataTypes.STRING())
                                        .build())
                        .property(ConfigOptions.TABLE_DATALAKE_ENABLED, true)
                        .customProperties(customProperties)
                        .distributedBy(3, "log_c1", "log_c2")
                        .build();
        TablePath logTablePath = TablePath.of(DATABASE, "log_table");
        admin.createTable(logTablePath, logTable, false).get();
        Table paimonLogTable =
                paimonCatalog.getTable(Identifier.create(DATABASE, logTablePath.getTableName()));

        System.out.println(paimonLogTable.options());
    }

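We expect a Paimon table with the same bucket count as the Fluss table (3, from distributedBy(3, "log_c1", "log_c2")). Instead, we get a Paimon table with 112 buckets and bucket-key=log_c1, taken from the user's custom properties: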

{bucket=112, path=..., fluss.table.datalake.enabled=true, fluss.table.datalake.format=paimon, partition.legacy-name=false, bucket-key=log_c1, file.format=parquet, fluss.k1=v1}

Solution

We need to prevent users from setting these configs.
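
A minimal sketch of what such a check could look like, to make the proposal concrete. Everything here is an assumption for illustration: the class `ReservedLakeOptions`, its methods, and the reserved set (`bucket` and `bucket-key`, the two options from the example above) are hypothetical and not existing Fluss code. The real list of Fluss-managed options, and whether `file.format` belongs on it, would need to be decided in the PR.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Fluss: rejects custom properties that
// would override Paimon options Fluss is supposed to manage itself.
class ReservedLakeOptions {

    // Prefix used in the example above for Paimon options passed as custom properties.
    static final String PAIMON_PREFIX = "paimon.";

    // Assumed reserved set; only the two options from the example are listed.
    static final Set<String> RESERVED = Set.of("bucket", "bucket-key");

    // Returns the offending keys, sorted for a stable error message.
    static Set<String> findReservedKeys(Map<String, String> customProperties) {
        Set<String> offending = new TreeSet<>();
        for (String key : customProperties.keySet()) {
            if (key.startsWith(PAIMON_PREFIX)
                    && RESERVED.contains(key.substring(PAIMON_PREFIX.length()))) {
                offending.add(key);
            }
        }
        return offending;
    }

    // Throws if the user tries to set any reserved option.
    static void validate(Map<String, String> customProperties) {
        Set<String> offending = findReservedKeys(customProperties);
        if (!offending.isEmpty()) {
            throw new IllegalArgumentException(
                    "These options are managed by Fluss and cannot be set by users: "
                            + offending);
        }
    }
}
```

With the custom properties from the test above, `validate` would reject `paimon.bucket` and `paimon.bucket-key` while letting `k1` and `paimon.file.format` through. Failing at table creation time seems preferable to silently dropping the user's values, since the user then learns why the setting is not applied.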

Are you willing to submit a PR?

  • I'm willing to submit a PR!
