-
Notifications
You must be signed in to change notification settings - Fork 21
CNDB-15968: Fix flaky SAI tests by ensuring index queryability on all coordinators. Rename ambiguous waitForIndexqueryable() to waitForIndexQueryableOnFirstNode() and add coordinator-aware alternatives #2127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Checklist before you submit for review
|
test/distributed/org/apache/cassandra/distributed/test/sai/ANNOptionsDistributedTest.java
Outdated
Show resolved
Hide resolved
test/distributed/org/apache/cassandra/distributed/test/sai/IndexHintsDistributedTest.java
Outdated
Show resolved
Hide resolved
|
I ran locally successfully the tests that were affected, so I do not expect any surprise from CI. This is ready for review. |
adelapena
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, and indeed it's a good catch. My only suggestion is about the deprecation of the old method using only the first node.
test/distributed/org/apache/cassandra/distributed/test/sai/SAIUtil.java
Outdated
Show resolved
Hide resolved
test/distributed/org/apache/cassandra/distributed/test/sai/SAIUtil.java
Outdated
Show resolved
Hide resolved
|
Removed the deprecation in favor of renaming the method across the board |
adelapena
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me, +1.
… coordinators. Deprecate ambiguous waitForIndexqueryable() methods and add coordinator-aware alternatives
…eprecation for the old utility method
22bce25 to
c9f9778
Compare
|
There was some GH incident yesterday and all CI did not look good. I rebased the branch and restarted CI. Pending commit on CI successful completion. |
|
❌ Build ds-cassandra-pr-gate/PR-2127 rejected by Butler3 regressions found Found 3 new test failures
No known test failures found |
|
I opened a ticket for Neither test of those failing is one touched in this PR. Moving forward with merge. Thanks for the review. |



What is the issue
...
Issue:
GenericOrderByTest.testOrderByoccasionally fails in slow CI environments withINDEX_BUILD_IN_PROGRESSerrors.Root Cause: Tests were using
waitForIndexQueryable()which only checks index status from node1's perspective. When tests query from multiple coordinators, gossip propagation delays mean other nodes may still see indexes as building even though node1 reports them as queryable.What does this PR fix and why was it fixed
...
Fix flaky SAI tests by ensuring index queryability on all coordinators
Tests querying from multiple nodes now wait for gossip propagation to all
coordinators, preventing
INDEX_BUILD_IN_PROGRESSfailures in slow CI.Deprecate ambiguous
waitForIndexQueryable()methods and add coordinator-awarealternatives. Update affected tests to use
waitForIndexQueryableOnAllNodes().I opted in for the deprecation as we use those methods in too many test classes. Probably better in time to move gradually to the new methods when working in those parts of the codebase. I can also produce the noisy patch to just switch to the new methods, but that should be in a separate commit or even PR maybe. Looking forward to the reviewer's feedback.