CNDB-15485: Fix ResultRetriever key comparison to prevent dupes in result set #2024

michaeljmarshall · 2025-09-29T20:35:02Z

(cherry picked from commit ada025c)

Copy of #2023, but targeting main

What is the issue

https://github.com/riptano/cndb/issues/15485

What does this PR fix and why was it fixed

This PR fixes a bug introduced to this branch by #1884. The bug only affects SAI file format aa when the index file was produced by compaction, which is why the modified test simply compacts the table to hit the bug.

The bug happens when an iterator produces the same partition across two different batch fetches from storage. These keys were not collapsed in the key.equals(lastKey) logic because compacted indexes use a row id per row instead of per partition, and the logic in PrimaryKeyWithSource considers rows with different row ids to be distinct. However, when we went to materialize a batch from storage, we hit this code:

        ClusteringIndexFilter clusteringIndexFilter = command.clusteringIndexFilter(firstKey.partitionKey());
        if (cfs.metadata().comparator.size() == 0 || firstKey.hasEmptyClustering())
        {
            return clusteringIndexFilter;
        }
        else
        {
            nextClusterings.clear();
            for (PrimaryKey key : keys)
                nextClusterings.add(key.clustering());
            return new ClusteringIndexNamesFilter(nextClusterings, clusteringIndexFilter.isReversed());
        }

which returned clusteringIndexFilter for aa because those indexes do not have the clustering information. Therefore, each batch fetched the whole partition (which was subsequently filtered to the proper results), producing a multiplier effect in which the same rows were returned once per batch.
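
To make the multiplier effect concrete, here is a minimal, self-contained sketch. It is not the real SAI code: `SourceKey`, `fetchWholePartition`, and `retrieve` are hypothetical stand-ins, and the model assumes a key with a partition plus a per-row id and no clustering information, which is how the description above characterizes compacted aa indexes. Query filtering is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified model; none of these types are the real SAI classes.
public class DuplicateBatchSketch
{
    // Stand-in for a key read from a compacted aa index: it knows its partition
    // and a per-row id, but carries no clustering information.
    static final class SourceKey
    {
        final String partition;
        final long rowId;

        SourceKey(String partition, long rowId)
        {
            this.partition = partition;
            this.rowId = rowId;
        }

        // Assumption: mirrors the described behaviour, where keys with
        // different row ids are treated as distinct.
        @Override
        public boolean equals(Object o)
        {
            if (!(o instanceof SourceKey))
                return false;
            SourceKey other = (SourceKey) o;
            return partition.equals(other.partition) && rowId == other.rowId;
        }

        @Override
        public int hashCode()
        {
            return Objects.hash(partition, rowId);
        }
    }

    // With no clustering information, the read falls back to the whole partition.
    static List<String> fetchWholePartition(String partition, int rowsInPartition)
    {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < rowsInPartition; i++)
            rows.add(partition + ":row" + i);
        return rows;
    }

    // Each inner list is one batch of keys. A batch is skipped only when its
    // first key equals the last key of the previous batch.
    static List<String> retrieve(List<List<SourceKey>> batches, int rowsInPartition)
    {
        List<String> results = new ArrayList<>();
        SourceKey lastKey = null;
        for (List<SourceKey> batch : batches)
        {
            SourceKey firstKey = batch.get(0);
            if (firstKey.equals(lastKey))
                continue; // not taken in demo(): the row ids differ
            results.addAll(fetchWholePartition(firstKey.partition, rowsInPartition));
            lastKey = batch.get(batch.size() - 1);
        }
        return results;
    }

    // Two batches that both point into the same four-row partition.
    static List<String> demo()
    {
        List<List<SourceKey>> batches = Arrays.asList(
            Arrays.asList(new SourceKey("pk1", 0), new SourceKey("pk1", 1)),
            Arrays.asList(new SourceKey("pk1", 2), new SourceKey("pk1", 3)));
        return retrieve(batches, 4);
    }

    public static void main(String[] args)
    {
        // Prints 8 rows: the 4 rows of pk1, once per batch.
        System.out.println(demo());
    }
}
```

In this model the number of copies of each row grows with the number of batches that touch the partition, which matches the duplication pattern described above.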

This fix works by comparing partition keys and clustering keys directly, which is a return to the comparison logic used before #1884. This case was actually discussed in the PR to main (#1883 (comment)), but unfortunately we missed it.
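
As a rough illustration of the difference between the two comparisons, the sketch below contrasts a row-id-sensitive equality check with a direct comparison of partition key and clustering. The `Key` class and both helper methods are hypothetical and greatly simplified; the empty string models a key that has no clustering information, as is assumed here for aa indexes.

```java
// Hypothetical, simplified model; this is not the ResultRetriever code.
public class DirectKeyComparisonSketch
{
    static final class Key
    {
        final String partitionKey;
        final String clustering; // "" models "no clustering information"
        final long rowId;

        Key(String partitionKey, String clustering, long rowId)
        {
            this.partitionKey = partitionKey;
            this.clustering = clustering;
            this.rowId = rowId;
        }
    }

    // Assumed model of the row-id-sensitive comparison: keys with different
    // row ids are never considered equal.
    static boolean equalsWithSource(Key a, Key b)
    {
        return samePrimaryKey(a, b) && a.rowId == b.rowId;
    }

    // Direct comparison: only the partition key and the clustering matter.
    static boolean samePrimaryKey(Key a, Key b)
    {
        return a.partitionKey.equals(b.partitionKey) && a.clustering.equals(b.clustering);
    }

    public static void main(String[] args)
    {
        // Two rows of the same partition, as a compacted aa index would report them.
        Key a = new Key("pk1", "", 1);
        Key b = new Key("pk1", "", 2);
        System.out.println(equalsWithSource(a, b)); // prints false
        System.out.println(samePrimaryKey(a, b));   // prints true

        // Keys that do carry clustering information stay distinct per row.
        Key c = new Key("pk1", "c1", 3);
        Key d = new Key("pk1", "c2", 4);
        System.out.println(samePrimaryKey(c, d));   // prints false
    }
}
```

Under the direct comparison, consecutive keys from the same partition with no clustering information collapse into one, so the partition is fetched only once in this model.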

A more proper long-term fix might be to stop creating a PrimaryKeyWithSource for AA indexes. However, I preferred this approach because it is essentially a revert rather than a fix-forward solution.

github-actions · 2025-09-29T20:35:21Z

sonarqubecloud · 2025-09-29T21:10:43Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
89.5% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

cassci-bot · 2025-09-29T21:15:47Z

✔️ Build ds-cassandra-pr-gate/PR-2024 approved by Butler

Approved by Butler
See build details here

CNDB-15485: Fix ResultRetriever key comparison to prevent dupes in result set (cherry picked from commit ada025c)

a385205

michaeljmarshall requested a review from a team September 29, 2025 20:35

michaeljmarshall self-assigned this Sep 29, 2025

adelapena approved these changes Sep 30, 2025

michaeljmarshall merged commit 748018e into main Sep 30, 2025
489 of 495 checks passed

michaeljmarshall deleted the cndb-15485-main branch September 30, 2025 20:09
