smiklosovic

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should use the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>

The Cassandra Jira

@smiklosovic smiklosovic force-pushed the CASSANDRA-20499 branch 3 times, most recently from bbddb7a to 3c436c7 Compare March 31, 2025 14:39
bdeggleston and others added 27 commits April 17, 2025 11:59
Patch by Blake Eggleston; Reviewed by Ariel Weisberg for CASSANDRA-19926
Patch by Blake Eggleston; Reviewed by Ariel Weisberg for CASSANDRA-19834
…ld have been fixed in CASSANDRA-19847 as it was fixed on Cassandra trunk
…mputeDeadline like the rest of the code, and Accord timeout MUST be less than user timeout

Rebase fixup: when a local keyspace is being opened but isn't present, return null so an error msg can be provided
Rebase fixup: improved metrics error msg when the exception doesn't match what is expected
Rebase improvement: when we see a timeout or preempt, use the new vtable to show the status across the cluster
Rebase improvement: Cluster.checkForThreadLeaks now groups similar stack traces to make the output less dense
Patch by Blake Eggleston; Reviewed by David Capwell for CASSANDRA-19920
Patch by Blake Eggleston; Reviewed by David Capwell for CASSANDRA-19940

Changes:
Increase accord repair range splitting
Streamline table metadata fetching - removes some unnecessary abstraction from the table metadata lookup path
Remove unnecessary set building when building lists of overlapping keys
Add separate recover delay for repair and increase default recover delay
…e amount of conflict resolution necessary for future rebasing:

(Accord): C* stores table in Range which will cause ranges to be removed from Accord when DROP TABLE is performed
patch by David Capwell, Sam Tunnicliffe; reviewed by Sam Tunnicliffe for CASSANDRA-18675

CEP-15: (Accord) sequence EpochReady.coordinating to allow syncComplete to be learned from newer epochs
patch by David Capwell; reviewed by Alex Petrov, Blake Eggleston for CASSANDRA-19769
patch by Aleksey Yeschenko; reviewed by Benedict Elliott Smith for CASSANDRA-19952
  * reconstruct CFK, TFK, progressLog
  * migrate CommandStore collection state from Accord table to the log
  * make memtable writes non-durable; reconstruct memtable state from Writes

Patch by Alex Petrov and Benedict Elliott Smith; reviewed by Benedict Elliott Smith and Alex Petrov for CASSANDRA-19869
Patch by Alex Petrov; reviewed by Aleksey Yeschenko and Benedict Elliott Smith for CASSANDRA-19877
patch by David Capwell; reviewed by Alex Petrov for CASSANDRA-19969
patch by Alex Petrov, Ariel Weisberg, Benedict Elliott Smith, Blake Eggleston and David Capwell
ninja: fix NPE
disable ephemeral reads
don't load range commands that are redundant, and load least possible
use MISC verb handler for maintenance tasks
…hutdown until after MS.

Wake up segment prepared after shutting down allocator, as no new segments will ever be allocated.

Shut down flusher slightly differently: we do not signal from fsync complete, since all blocks should have been fsynced by then, but we will add an invariant check to notice runaway threads.

Wait for quiescence

Truncate blocking

Wait for scheduler shutdown before shutting down command store

Shut down accord after shutting down messaging

Truncate caches before replay
split JournalKey in journal table so we can index it
reorder journal fields so we can easily index on route (when present)
use Message.expiresAtNanos for callback expiration
do not notify slow for range barriers

Accord: Do not contact faulty replicas, and promptly report slow replies for preaccept/read. Do not wait for stale or left nodes for durability.
…edup RoutingKey tableId; avoid calculating rejectsFastPath in more cases; delay retry of fetchMajorityDeps; fix SetShardDurable marking shards durable
… to ReconfigureAccordFastPath so the TCM logs/table gives the debug info needed
bschoening and others added 29 commits May 23, 2025 17:49
…latforms

While we do not support Windows as such (at least on the server), reviewers agreed that this could be fixed,
since the gain (Windows users using CQLSH to connect to Cassandra running on supported platforms) justifies it.

patch by Brad Schoening; reviewed by Brandon Williams, Josh McKenzie for CASSANDRA-20478
…c pages

patch by Mick Semb Wever; reviewed by Štefan Miklošovič for CASSANDRA-20678
* cassandra-5.0:
  Remove auto-installation of golang when generating native protocol doc pages
…size tracking is per column

patch by Caleb Rackliffe; reviewed by David Capwell for CASSANDRA-20668
* cassandra-5.0:
  Avoid lambda usage in TrieMemoryIndex range queries and ensure queue size tracking is per column
patch by Pranav Shenoy; reviewed by Branimir Lambov, Claude Warren, David Capwell for CASSANDRA-20398
Updates SystemKeyspace.writePreparedStatement to accept a timestamp
associated with the Prepared creation time. Using this timestamp
ensures that the timestamp of an INSERT into system.prepared_statements
always precedes the timestamp of the DELETE issued for the same
Prepared in SystemKeyspace.removePreparedStatement.

This is needed because Caffeine 2.9.2 may evict an entry as soon
as it is inserted if the maximum weight of the cache is exceeded,
causing the DELETE to be executed before the INSERT.
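
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the patch itself: TimestampOrderingSketch, its methods, and the timestamp values are hypothetical, and the map stands in for last-write-wins reconciliation, where the higher write timestamp wins regardless of arrival order and a tombstone wins a tie.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy last-write-wins store; a stand-in for illustration, not Cassandra code. */
public class TimestampOrderingSketch {

    /** One row version: a write timestamp plus whether it is live or a tombstone. */
    private static final class Version {
        final long timestamp;
        final boolean live;

        Version(long timestamp, boolean live) {
            this.timestamp = timestamp;
            this.live = live;
        }
    }

    private final Map<String, Version> rows = new HashMap<>();

    public void insert(String id, long timestamp) {
        apply(id, new Version(timestamp, true));
    }

    public void delete(String id, long timestamp) {
        apply(id, new Version(timestamp, false));
    }

    // The version with the higher timestamp wins regardless of arrival order;
    // on a tie the tombstone wins.
    private void apply(String id, Version incoming) {
        Version current = rows.get(id);
        boolean newer = current == null
                        || incoming.timestamp > current.timestamp
                        || (incoming.timestamp == current.timestamp && !incoming.live);
        if (newer)
            rows.put(id, incoming);
    }

    public boolean isLive(String id) {
        Version v = rows.get(id);
        return v != null && v.live;
    }

    public static void main(String[] args) {
        long createdAt = 100L;    // hypothetical Prepared creation time
        long evictedAt = 101L;    // hypothetical time of the eviction DELETE
        long insertRunsAt = 102L; // hypothetical time the INSERT actually executes

        // INSERT stamped with its own execution time: it shadows the earlier DELETE.
        TimestampOrderingSketch before = new TimestampOrderingSketch();
        before.delete("stmt", evictedAt);
        before.insert("stmt", insertRunsAt);
        System.out.println("leaked when stamped at execution time: " + before.isLive("stmt"));

        // INSERT stamped with the creation time: the DELETE still wins.
        TimestampOrderingSketch after = new TimestampOrderingSketch();
        after.delete("stmt", evictedAt);
        after.insert("stmt", createdAt);
        System.out.println("leaked when stamped at creation time: " + after.isLive("stmt"));
    }
}
```

In this model the row survives only when the INSERT carries a timestamp later than the DELETE, which is the leak the creation-time timestamp is meant to rule out.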

Additionally, any clusters currently experiencing a leaky
system.prepared_statements table from this bug may struggle to
bounce into a version with this fix, as
SystemKeyspace.loadPreparedPreparedStatements currently does
not paginate the query to system.prepared_statements, causing heap
OOMs. To fix this, this patch adds pagination at 5000 rows and
aborts loading once the cache size is reached. This should allow
nodes to come up and delete older prepared statements that may no
longer be used as the cache fills up (which should happen immediately).
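
The paginate-and-abort idea above can be sketched as follows. This is a minimal illustration, not the patch's code: PagedLoadSketch, the pageFetcher callback, and the use of a row count as the cache limit are all assumptions (the real cache is bounded by weight), and only the 5000-row page size comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Sketch of paged loading that stops once a cache limit is reached. */
public class PagedLoadSketch {

    static final int PAGE_SIZE = 5000; // page size named in the commit message

    /**
     * Loads rows page by page and stops as soon as cacheCapacity rows are held.
     * pageFetcher is a hypothetical callback: (offset, limit) -> rows in that page.
     */
    public static List<String> load(BiFunction<Integer, Integer, List<String>> pageFetcher,
                                    int cacheCapacity) {
        List<String> loaded = new ArrayList<>();
        int offset = 0;
        while (loaded.size() < cacheCapacity) {
            List<String> page = pageFetcher.apply(offset, PAGE_SIZE);
            if (page.isEmpty())
                break; // table exhausted
            for (String row : page) {
                if (loaded.size() >= cacheCapacity)
                    return loaded; // cache is full: abort, later pages are never fetched
                loaded.add(row);
            }
            offset += page.size();
        }
        return loaded;
    }

    public static void main(String[] args) {
        int totalRows = 12_000; // hypothetical size of a leaky table
        BiFunction<Integer, Integer, List<String>> fetcher = (offset, limit) -> {
            List<String> page = new ArrayList<>();
            for (int i = offset; i < Math.min(offset + limit, totalRows); i++)
                page.add("stmt-" + i);
            return page;
        };
        System.out.println("rows loaded: " + load(fetcher, 7_000).size());
    }
}
```

Because only one page is materialized at a time and loading stops at the limit, memory use is bounded by the page size plus the cache capacity rather than by the size of the table.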

This patch does not address the issue of Caffeine immediately evicting
a prepared statement; however, it will prevent the
system.prepared_statements table from growing without bound. For most users
this should be adequate, as the cache should only fill up when clients
erroneously prepare many unique statements. In such a case we can
expect that clients will constantly prepare statements regardless
of whether or not the cache is evicting statements.

patch by Andy Tolbert; reviewed by Berenguer Blasi and Caleb Rackliffe for CASSANDRA-19703
* cassandra-4.0:
  Ensure prepared_statement INSERT timestamp precedes eviction DELETE
* cassandra-4.1:
  Ensure prepared_statement INSERT timestamp precedes eviction DELETE
* cassandra-5.0:
  Ensure prepared_statement INSERT timestamp precedes eviction DELETE
…t read protection reads

patch by Caleb Rackliffe; reviewed by Blake Eggleston and Zhao Yang for CASSANDRA-20639
* cassandra-5.0:
  Ensure replica filtering protection does not trigger unnecessary short read protection reads
Patch by Venkata Harikrishna Nukala; reviewed by Marcus Eriksson and Sam Tunnicliffe for CASSANDRA-18509
 - cfk pruning+prebootstrap=invalid future dependency
 - exclude retired ranges when filtering RX stillTouches
 - propagate uses incorrect lowEpoch when fetch finds additional owned/touched ranges
 - node.withEpoch should callback with TopologyRetiredException, not throw
 - Recovery can race with durable-applied pruning; must not send durable unless latest ballot on apply
 - removeRedundantDependencies was not slicing pre-bootstrap range calculation to participating ranges
 - NPE in TopologyManager.atLeast caused by referencing an epoch that has been GC'd
 - use journal durableBeforePersister in burn test, not NOOP_PERSISTER
 - ServerUtils.cleanupDirectory use tryDeleteRecursive
 - FsyncRunnable shutdown
 - fix NPE in AccordJournalBurnTest

patch by Benedict; reviewed by Alex Petrov for CASSANDRA-20688
…s) to 4.1 and 5.0

patch by Ariel Weisberg; reviewed by Benedict Elliott Smith for CASSANDRA-20585
Patch by Ariel Weisberg and Yuqi Yan; Reviewed by Marcus Eriksson for CASSANDRA-20513

Co-authored-by: Yuqi Yan <[email protected]>
patch by Dmitry Konstantinov; reviewed by Michael Semb Wever, Stefan Miklosovic for CASSANDRA-20681
patch by Ling Mao; reviewed by Stefan Miklosovic, Maxim Muzafarov for CASSANDRA-20499

Co-authored-by: Stefan Miklosovic <[email protected]>