Skip to content

Conversation

adelapena
Copy link

When SAI's TopKProcessor created PK-sorted in-memory row iterators, it wasn't preserving the actual column information from the base row iterator. Instead, it was returning the columns of the base table, leading to incorrect column selection behavior.

The fix consists on copying the column selection of the original iterator into PartitionInfo, so the PK-sorted iterator can return that same column selection.

@adelapena adelapena self-assigned this Aug 14, 2025
@adelapena adelapena added the bug Something isn't working label Aug 14, 2025
Copy link

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes column selection behavior in SAI-ordered queries by ensuring that column information is properly preserved when creating PK-sorted in-memory row iterators. The TopKProcessor was incorrectly returning base table columns instead of the actual column selection from the original iterator.

  • Modified PartitionInfo to include and preserve column selection information
  • Updated InMemoryUnfilteredPartitionIterator to use preserved columns instead of base table columns
  • Added comprehensive tests for column selection functionality

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/java/org/apache/cassandra/index/sai/utils/PartitionInfo.java Enhanced to store and preserve column selection from original iterators
src/java/org/apache/cassandra/index/sai/utils/InMemoryUnfilteredPartitionIterator.java Fixed to use preserved columns instead of base table columns
test/unit/org/apache/cassandra/index/sai/utils/PartitionInfoTest.java Added unit tests for PartitionInfo functionality
test/unit/org/apache/cassandra/index/sai/cql/ColumnSelectionTest.java Added integration tests for column selection with SAI indexes
src/java/org/apache/cassandra/db/Columns.java Minor comment formatting change

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

public void serializeSubset(Collection<ColumnMetadata> columns, Columns superset, DataOutputPlus out) throws IOException
{
/**
/*
Copy link
Preview

Copilot AI Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change from '/**' to '/*' converts a Javadoc comment to a regular block comment, which removes it from API documentation. This appears unrelated to the column selection fix and may be unintentional.

Suggested change
/*
/**

Copilot uses AI. Check for mistakes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same comment in my draft review :P

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess Copilot didn't consider the issue description. The bug, albeit produced somewhere else, affects the logic described here, and the trace of the exception points to this explanation. So I think it's related.

Copy link

@ekaterinadimitrova2 ekaterinadimitrova2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Sonar did not finish on the last CI run. Happened to me before. Maybe when you rebase and push it will finish. Until then, even with my approval, the PR cannot be merged. testOverwriteWithTTL failure seems to me as unrelated to this patch. But we would want to open a ticket for it.

public void serializeSubset(Collection<ColumnMetadata> columns, Columns superset, DataOutputPlus out) throws IOException
{
/**
/*

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

public void serializeSubset(Collection<ColumnMetadata> columns, Columns superset, DataOutputPlus out) throws IOException
{
/**
/*

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same comment in my draft review :P

public class ColumnSelectionTest extends VectorTester
{
/**
* Tests that we can select any column in any table with SAI indexes. See CNDB-14997 for further details.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Tests that we can select any column in any table with SAI indexes. See CNDB-14997 for further details.
* Tests that we can select any type of column in any table with SAI indexes. See CNDB-14997 for further details.

@ekaterinadimitrova2
Copy link

You may want to fill also the PR checklist

@adelapena
Copy link
Author

adelapena commented Aug 18, 2025

I'm removing the equals and hashCode methods from PartitionInfo and its associated test since we never use them.

When SAI's TopKProcessor created PK-sorted in-memory row iterators,
it wasn't preserving the actual column information from the base row iterator.
Instead, it was returning the columns of the base table,
leading to incorrect column selection behavior.

The fix consists on copying the column selection of the original iterator
into PartitionInfo, so the PK-sorted iterator can return that same column
selection.
Copy link

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-1944 rejected by Butler


1 regressions found
See build details here


Found 1 new test failures

Test Explanation Runs Upstream
o.a.c.index.sai.cql.types.collections.sets.SetVarintTest.test[version=ed,dataset=set,wide=false,scenario=POST_BUILD_QUERY] (compression) REGRESSION 🔴🔵 0 / 18

Found 1 known test failures

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants