CNDB-11666: Batch clusterings into single SAI partition post-filtering reads #1884
Conversation
Checklist before you submit for review
- I have confirmed that this patch solves the issue; we will try this docker image with Shadow Proxy
…g reads

Port of CASSANDRA-19497.
Co-authored-by: Caleb Rackliffe <[email protected]>
Co-authored-by: Michael Marshall <[email protected]>
Co-authored-by: Andrés de la Peña <[email protected]>

Force-pushed from f61de5b to 5e29022
LGTM
Code is the same as #1883 (+ Version#after)
❌ Build ds-cassandra-pr-gate/PR-1884 rejected by Butler
2 new test failure(s) in 1 build
Found 2 new test failures
No known test failures found
…g reads (#1884)

Port of CASSANDRA-19497.
Co-authored-by: Caleb Rackliffe <[email protected]>
Co-authored-by: Michael Marshall <[email protected]>
Co-authored-by: Andrés de la Peña <[email protected]>
…sult set (#2024) (cherry picked from commit ada025c)

Copy of #2023, but targeting `main`.

### What is the issue

riptano/cndb#15485

### What does this PR fix and why was it fixed

This PR fixes a bug introduced to this branch via #1884. The bug only impacts SAI file format `aa` when the index file was produced via compaction, which is why the modified test simply adds coverage to compact the table and hit the bug.

The bug happens when an iterator produces the same partition across two different batch fetches from storage. These keys were not collapsed by the `key.equals(lastKey)` logic because compacted indexes use a row id per row instead of per partition, and the logic in `PrimaryKeyWithSource` considers rows with different row ids to be distinct. However, when we went to materialize a batch from storage, we hit this code:

```java
ClusteringIndexFilter clusteringIndexFilter = command.clusteringIndexFilter(firstKey.partitionKey());
if (cfs.metadata().comparator.size() == 0 || firstKey.hasEmptyClustering())
{
    return clusteringIndexFilter;
}
else
{
    nextClusterings.clear();
    for (PrimaryKey key : keys)
        nextClusterings.add(key.clustering());
    return new ClusteringIndexNamesFilter(nextClusterings, clusteringIndexFilter.isReversed());
}
```

which returned `clusteringIndexFilter` for `aa` because those indexes do not have the clustering information. Therefore, each batch fetched the whole partition (which was subsequently filtered to the proper results), producing a multiplier effect where we saw `batch`-many duplicates.

This fix works by comparing partition keys and clustering keys directly, which is a return to the old comparison logic from before #1884. There was actually a discussion about this in the PR to `main`, but unfortunately we missed this case: #1883 (comment). A more proper long-term fix might be to remove the logic of creating a `PrimaryKeyWithSource` for `aa` indexes. However, I preferred this approach because it is essentially a `revert` rather than a fix-forward solution.
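The deduplication behaviour described above can be sketched in isolation. This is a hypothetical, simplified model, not the actual Cassandra API: `Key`, `sameByRowId`, `sameRow`, and `dedup` are illustrative names. It shows why row-id-based equality fails to collapse the same logical row seen with two different row ids (as happens with compacted `aa` indexes across batch fetches), while comparing partition and clustering keys directly collapses it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

public class DedupSketch {
    // A primary key modeled as (partition, clustering, rowId). In compacted
    // indexes the row id is per sstable row, so the same logical row can
    // appear with different row ids across sources/batches.
    record Key(String partition, String clustering, long rowId) {}

    // Buggy comparison: row-id equality treats the same logical row seen
    // with two different row ids as two distinct keys.
    static boolean sameByRowId(Key a, Key b) {
        return a.rowId() == b.rowId();
    }

    // Fixed comparison: compare partition and clustering keys directly,
    // ignoring the per-source row id.
    static boolean sameRow(Key a, Key b) {
        return a.partition().equals(b.partition())
            && a.clustering().equals(b.clustering());
    }

    // Collapse adjacent duplicates, mirroring the key.equals(lastKey) check.
    static List<Key> dedup(List<Key> keys, BiPredicate<Key, Key> same) {
        List<Key> out = new ArrayList<>();
        Key last = null;
        for (Key k : keys) {
            if (last == null || !same.test(last, k))
                out.add(k);
            last = k;
        }
        return out;
    }

    public static void main(String[] args) {
        // The same logical row (p1, c1) produced twice with different row
        // ids, e.g. across two batch fetches from storage.
        List<Key> keys = List.of(new Key("p1", "c1", 10),
                                 new Key("p1", "c1", 42));
        System.out.println(dedup(keys, DedupSketch::sameByRowId).size()); // 2: duplicate survives
        System.out.println(dedup(keys, DedupSketch::sameRow).size());     // 1: duplicate collapsed
    }
}
```

With row-id equality the duplicate survives and is later multiplied by the whole-partition fetch; with direct key comparison it is collapsed before materialization.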
…sult set (#2023)
…g reads (cherry picked from commit ae4f187)
What is the issue
...
What does this PR fix and why was it fixed
...