@ekaterinadimitrova2

What is the issue

...
Issue: GenericOrderByTest.testOrderBy occasionally fails in slow CI environments with INDEX_BUILD_IN_PROGRESS errors.

Root Cause: Tests were using waitForIndexQueryable() which only checks index status from node1's perspective. When tests query from multiple coordinators, gossip propagation delays mean other nodes may still see indexes as building even though node1 reports them as queryable.
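The gap described above can be illustrated with a small, self-contained model. This is only a sketch: the class `IndexViewSketch`, its methods, and the node names are hypothetical and are not taken from the PR or from the Cassandra test framework. It assumes each coordinator holds its own gossip-derived view of whether the index is queryable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model (not the PR's code): each node has its own view of
// whether the index is queryable, because that state arrives via gossip.
final class IndexViewSketch
{
    // Checks only the first node's view, similar in spirit to a node1-only wait.
    static boolean queryableOnFirstNode(Map<String, Boolean> viewByNode)
    {
        return !viewByNode.isEmpty() && viewByNode.values().iterator().next();
    }

    // Checks every node's view; any lagging node makes this false.
    static boolean queryableOnAllNodes(Map<String, Boolean> viewByNode)
    {
        return viewByNode.values().stream().allMatch(Boolean::booleanValue);
    }

    // A cluster where node1 already sees the index as queryable, but gossip
    // has not yet reached node2 and node3.
    static Map<String, Boolean> laggingCluster()
    {
        Map<String, Boolean> views = new LinkedHashMap<>();
        views.put("node1", true);
        views.put("node2", false);
        views.put("node3", false);
        return views;
    }

    public static void main(String[] args)
    {
        Map<String, Boolean> views = laggingCluster();
        System.out.println("first node only: " + queryableOnFirstNode(views));
        System.out.println("all nodes:       " + queryableOnAllNodes(views));
    }
}
```

In this model a test that waits only on the first node would proceed while node2 and node3 still consider the index to be building, which is the window in which a query coordinated by those nodes could fail.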

What does this PR fix and why was it fixed

...
Fix flaky SAI tests by ensuring index queryability on all coordinators

Tests querying from multiple nodes now wait for gossip propagation to all
coordinators, preventing INDEX_BUILD_IN_PROGRESS failures in slow CI.

Deprecate ambiguous waitForIndexQueryable() methods and add coordinator-aware
alternatives. Update affected tests to use waitForIndexQueryableOnAllNodes().
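As a rough illustration of what waiting on all coordinators involves, the sketch below polls one check per node until every check passes or a timeout expires. It is not the implementation of waitForIndexQueryableOnAllNodes(): the class `AllNodesWaitSketch`, the method `waitUntilAllTrue`, and its parameters are hypothetical names chosen for this example.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper (not the PR's code): waits until every
// per-node check reports true, or gives up when the timeout expires.
final class AllNodesWaitSketch
{
    static boolean waitUntilAllTrue(List<BooleanSupplier> nodeChecks,
                                    Duration timeout,
                                    Duration pollInterval)
    {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true)
        {
            // Succeed only when no node still reports the index as unavailable.
            if (nodeChecks.stream().allMatch(BooleanSupplier::getAsBoolean))
                return true;

            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0)
                return false;

            try
            {
                Thread.sleep(pollInterval.toMillis());
            }
            catch (InterruptedException e)
            {
                // Preserve the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would pass one check per coordinator it intends to query, so that a slow gossip round on any of them delays the test instead of failing it.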

I opted for deprecation because these methods are used in too many test classes. It is probably better to move to the new methods gradually, as we work in those parts of the codebase. I can also produce the noisy patch that simply switches everything to the new methods, but that should go in a separate commit, or perhaps a separate PR. Looking forward to the reviewer's feedback.

@github-actions

github-actions bot commented Nov 15, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@ekaterinadimitrova2
Author

I ran the affected tests locally and they passed, so I do not expect any surprises from CI.
No CNDB PR is needed, as we do not touch production code at all.

This is ready for review.

@adelapena adelapena left a comment

Looks good to me, and indeed it's a good catch. My only suggestion is about the deprecation of the old method using only the first node.

@ekaterinadimitrova2
Author

Removed the deprecation in favor of renaming the method across the board.

@adelapena adelapena left a comment

Looks great to me, +1.

@ekaterinadimitrova2 ekaterinadimitrova2 marked this pull request as draft November 18, 2025 22:25
@ekaterinadimitrova2 ekaterinadimitrova2 marked this pull request as ready for review November 18, 2025 22:25
@ekaterinadimitrova2
Author

There was a GitHub incident yesterday and the CI results were unreliable. I rebased the branch and restarted CI. I will commit once CI completes successfully.

@sonarqubecloud

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-2127 rejected by Butler


3 regressions found
See build details here


Found 3 new test failures

| Test | Explanation | Runs | Upstream |
|---|---|---|---|
| o.a.c.index.sai.cql.VectorCompaction100dTest.testZeroOrOneToManyCompaction[dc true] | NEW | 🔴 | 0 / 17 |
| o.a.c.index.sai.cql.VectorSiftSmallTest.testSiftSmall[dc false] | NEW | 🔴 | 0 / 17 |
| o.a.c.tools.TopPartitionsTest.testServiceTopPartitionsSingleTable (compression) | REGRESSION | 🔴🔵 | 0 / 17 |

No known test failures found

@ekaterinadimitrova2
Author

I opened a ticket for TopPartitionsTest and I linked it in Butler.
The other two failures are timeouts, which I hope won't recur now that the PR below was merged today: https://github.com/riptano/jenkins-pipeline-lib/pull/234

None of the failing tests is touched by this PR. Moving forward with the merge. Thanks for the review.

@ekaterinadimitrova2 ekaterinadimitrova2 merged commit 64dbb98 into main Nov 19, 2025
488 of 496 checks passed
@ekaterinadimitrova2 ekaterinadimitrova2 deleted the c15968-main branch November 19, 2025 19:53
@ekaterinadimitrova2 ekaterinadimitrova2 changed the title from "CNDB-15968: Fix flaky SAI tests by ensuring index queryability on all coordinators. Deprecate ambiguous waitForIndexqueryable() methods and add coordinator-aware alternatives" to "CNDB-15968: Fix flaky SAI tests by ensuring index queryability on all coordinators. Rename ambiguous waitForIndexqueryable() to waitForIndexQueryableOnFirstNode() and add coordinator-aware alternatives" Nov 19, 2025
4 participants