Conversation


@dylanwong250 dylanwong250 commented Aug 27, 2025

What changes were proposed in this pull request?

This PR extends StateDataSource (https://spark.apache.org/docs/latest/streaming/structured-streaming-state-data-source.html) support for the state checkpoint v2 format to include the readChangeFeed functionality. It enables users to read change feeds from state stores that use the checkpoint v2 format by:

  • Implementing full lineage reconstruction across multiple changelog files using getFullLineage in RocksDBFileManager. This is needed because each changelog file only contains lineage for [snapShotVersion, version), and we may need the versions of all changelog files across snapshot boundaries.

  • Adding support for getStateStoreChangeDataReader to accept and use an endVersionStateStoreCkptId parameter. Since the full lineage back to the start version can be constructed from the last version and endVersionStateStoreCkptId, a startVersionStateStoreCkptId is not needed. However, when snapshotStartBatchId is implemented, both startVersionStateStoreCkptId and endVersionStateStoreCkptId will be needed to maintain the current behavior.

  • Adding an extra parameter to setStoreMetrics to control whether store.getStateStoreCheckpointInfo() is called. Calling it in the abort case in TransformWithStateExec or TransformWithStateInPySparkExec would throw an exception, which we want to avoid.

The key enhancement is the ability to read change feeds that span multiple snapshots by walking backwards through the lineage information embedded in changelog files to construct the complete version history.

NOTE: Reading checkpoint v2 state data sources requires setting "spark.sql.streaming.stateStore.checkpointFormatVersion" -> 2. It is possible to allow reading state data sources arbitrarily based on what is in the CommitLog by relaxing assertion checks, but this is left as a future change.
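For reference, a minimal usage sketch (the path and batch ids below are placeholders; the option names follow the State Data Source documentation linked above):

```scala
// The streaming query that produced the checkpoint must run with checkpoint v2 enabled:
spark.conf.set("spark.sql.streaming.stateStore.checkpointFormatVersion", "2")

// Afterwards, the change feed can be read back from the same checkpoint location:
val changes = spark.read
  .format("statestore")
  .option("readChangeFeed", true)
  .option("changeStartBatchId", 0)   // first batch to include
  .option("changeEndBatchId", 5)     // last batch to include (optional)
  .load("/path/to/checkpoint")

changes.show()
```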

Why are the changes needed?

State checkpoint v2 ("spark.sql.streaming.stateStore.checkpointFormatVersion") introduces a new format for storing state metadata that includes unique identifiers in the file path for each state store. The existing StateDataSource implementation only worked with the checkpoint v1 format, making it incompatible with streaming queries using the newer checkpoint format. Only the batchId option was implemented in #52047.

Does this PR introduce any user-facing change?

Yes.

State Data Source will work when checkpoint v2 is used and the readChangeFeed option is used.

How was this patch tested?

Adds a new test suite RocksDBWithCheckpointV2StateDataSourceChangeDataReaderSuite that reuses the unit tests in RocksDBWithChangelogCheckpointStateDataSourceChangeDataReaderSuite but with checkpoint v2 enabled, and adds tests for reading across snapshot boundaries.

```
testOnly *RocksDBWithCheckpointV2StateDataSourceChangeDataReaderSuite
[info] Total number of tests run: 10
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Adds a new test suite StateDataSourceTransformWithStateSuiteCheckpointV2 that reuses the unit tests in StateDataSourceTransformWithStateSuite but with checkpoint v2 enabled.

```
testOnly *StateDataSourceTransformWithStateSuiteCheckpointV2
[info] Total number of tests run: 44
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 44, failed 0, canceled 2, ignored 0, pending 0
[info] All tests passed.
```

Note that the canceled tests are those that use snapshotStartBatchId; they are intentionally skipped here.

Adds a new test suite TransformWithStateInitialStateSuiteCheckpointV2 that reuses the unit tests in TransformWithStateInitialStateSuite but with checkpoint v2 enabled.

```
testOnly *TransformWithStateInitialStateSuiteCheckpointV2
[info] Total number of tests run: 44
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 44, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Adds new test suites TransformWithStateInPandasWithCheckpointV2Tests and TransformWithStateInPySparkWithCheckpointV2Tests that reuse the Python unit tests that exercise the State Data Source.

Was this patch authored or co-authored using generative AI tooling?

No

```diff
  * This should be called in that task after the store has been updated.
  */
-  protected def setStoreMetrics(store: StateStore): Unit = {
+  protected def setStoreMetrics(store: StateStore, setCheckpointInfo: Boolean = true): Unit = {
```
Contributor

Hmm, why do we need this change?

Contributor Author

In setStoreMetrics we call store.getStateStoreCheckpointInfo(). If we call this in the store.abort() case in TransformWithStateExec or TransformWithStateInPySparkExec it will throw an exception, because the checkpoint info does not exist when we never committed. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/operators/stateful/transformwithstate/TransformWithStateExec.scala#L343
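A minimal, self-contained sketch of the guard described here, using stand-in types rather than the PR's exact code:

```scala
// Stand-ins for the real org.apache.spark.sql.execution.streaming.state types.
trait StateStoreCheckpointInfo
trait StateStore {
  // Throws if the store was aborted before committing, as in the linked abort path.
  def getStateStoreCheckpointInfo(): StateStoreCheckpointInfo
}

def setStoreMetrics(store: StateStore, setCheckpointInfo: Boolean = true): Unit = {
  // ... size / memory metric bookkeeping elided ...
  if (setCheckpointInfo) {
    // Only safe after a successful commit; the abort path passes
    // setCheckpointInfo = false so this call is skipped.
    val info = store.getStateStoreCheckpointInfo()
    // record `info` into the operator's checkpoint metadata (elided)
  }
}
```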

```scala
  .load(tempDir.getAbsolutePath)

val expectedDf = Seq(
  Row(0L, "update", Row(3), Row(1), 1),
```
Contributor

Can we also track other operations such as insert, update, delete, etc.?

Contributor Author

Added a test in this suite with a delete in the change feed. StateDataSourceTransformWithStateSuite also has a few tests with append and delete.
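To illustrate, a hedged sketch of what expected rows covering both operations could look like; the column layout (batch_id, change_type, key, value, partition_id) follows the snippet above, while the concrete values and the null value for the delete row are assumptions:

```scala
import org.apache.spark.sql.Row

val expectedDf = Seq(
  Row(0L, "update", Row(3), Row(1), 1),   // key 3 set to 1 in batch 0
  Row(1L, "update", Row(3), Row(2), 1),   // key 3 updated to 2 in batch 1
  Row(2L, "delete", Row(3), null, 1)      // key 3 removed in batch 2
)
```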

```scala
val prevSmallestVersion = buf.last.version
val lineage = getLineageFromChangelogFile(buf.last.version, Some(buf.last.checkpointUniqueId))
// lineage array is sorted in increasing order, we need to reverse it
val lineageSorted = lineage.filter(_.version >= startVersion).sortBy(_.version).reverse
```
Contributor

Can we just pass descending as the negative key or an Ordering param?

Contributor Author

Changed to .sortBy(-_.version).


```scala
// to prevent infinite loop if we make no progress, throw an exception
if (buf.last.version == prevSmallestVersion) {
  throw new IllegalStateException(s"Lineage is not complete")
```
Contributor

Can we create an error class for this?

Contributor Author

Done. Created INVALID_CHECKPOINT_LINEAGE.
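For reference, a plausible shape for the new entry in error-conditions.json, modeled on the STDS entry quoted later in this conversation; the exact message text is an assumption:

```json
"INVALID_CHECKPOINT_LINEAGE" : {
  "message" : [
    "Invalid checkpoint lineage: <lineage>. <message>"
  ]
}
```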

```scala
}

if (startOperatorStateUniqueIds.isDefined != endOperatorStateUniqueIds.isDefined) {
  throw StateDataSourceErrors.internalError(
```
Contributor

Just to confirm: this is backed by an error class, correct?

Contributor Author

Yes, it is. But I changed it to be backed by a new error class, STDS_MIXED_CHECKPOINT_FORMAT_VERSIONS_NOT_SUPPORTED.

```scala
assert(ret.last.version == endVersion,
  s"Expected last lineage version to be $endVersion, but got ${ret.last.version}")
// Assert that the lineage array is strictly increasing in version
assert(ret.sliding(2).forall {
```
Contributor

Maybe move this to an error class as well?

Contributor Author

Made these also use INVALID_CHECKPOINT_LINEAGE.

```scala
  Serialization.read[Array[LineageItem]](lineageStr)
}

// The array contains lineage information from [snapShotVersion, version]
```
Contributor

Both left and right inclusive, right?

Contributor Author

I made a mistake; it is actually [snapShotVersion, version). The version that was used to get this array is not included. I updated the comment to make this clearer.

```json
},
"STDS_MIXED_CHECKPOINT_FORMAT_VERSIONS_NOT_SUPPORTED" : {
  "message" : [
    "Reading state across different checkpoint format versions is not supported. startBatchId=<startBatchId>, endBatchId=<endBatchId>."
```
Contributor

Can we also add the different checkpoint format versions they used here? I know there are only 2 now but we will add more in the future.

Contributor Author

Changed the error message to:

"message" : [
      "Reading state across different checkpoint format versions is not supported.",
      "startBatchId=<startBatchId>, endBatchId=<endBatchId>.",
      "startFormatVersion=<startFormatVersion>, endFormatVersion=<endFormatVersion>."
    ],

```scala
    StateStoreChangeDataReader = {

  if (endVersionStateStoreCkptId.isDefined) {
    throw QueryExecutionErrors.cannotLoadStore(new SparkException(
```
Contributor

Can we make a new error condition for this (and change the other place where we do this)?

Contributor Author

Added error class STATE_STORE_CHECKPOINT_IDS_NOT_SUPPORTED and used it here and in the other places.

```scala
 * Construct the full lineage from startVersion to endVersion (inclusive) by
 * walking backwards using lineage information embedded in changelog files.
 */
def getFullLineage(
```
Contributor

Can we add some unit tests for this new function? The logic seems quite complicated; I want to make sure we can test all edge cases, particularly the error cases.

Contributor Author

Added RocksDBLineageSuite.scala that covers the main error cases.
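For readers following along, a self-contained sketch of the backward walk as pieced together from the snippets quoted in this review; LineageItem and the readLineage callback stand in for the real RocksDB types and getLineageFromChangelogFile, and the details are assumptions rather than the PR's exact code:

```scala
case class LineageItem(version: Long, checkpointUniqueId: String)

// readLineage(version, ckptId) plays the role of getLineageFromChangelogFile: it returns
// the lineage stored in that version's changelog file, covering [snapshotVersion, version).
def fullLineage(
    startVersion: Long,
    endVersion: Long,
    endVersionCkptId: String,
    readLineage: (Long, String) => Array[LineageItem]): Array[LineageItem] = {
  require(startVersion <= endVersion)
  // Kept in decreasing version order, so buf.last is the smallest version found so far.
  val buf = scala.collection.mutable.ArrayBuffer(LineageItem(endVersion, endVersionCkptId))
  while (buf.last.version > startVersion) {
    val prevSmallestVersion = buf.last.version
    // Hop one changelog file back; its lineage only reaches down to its own snapshot
    // version, so several hops may be needed to cross snapshot boundaries.
    val older = readLineage(buf.last.version, buf.last.checkpointUniqueId)
      .filter(_.version >= startVersion)
      .sortBy(-_.version)
    buf ++= older
    if (buf.last.version == prevSmallestVersion) {
      // No progress means lineage is missing; fail instead of looping forever
      // (the PR raises INVALID_CHECKPOINT_LINEAGE in this situation).
      throw new IllegalStateException("Lineage is not complete")
    }
  }
  val ret = buf.sortBy(_.version).toArray
  assert(ret.head.version == startVersion && ret.last.version == endVersion)
  ret
}
```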

```scala
/** [[StateStoreChangeDataReader]] implementation for [[RocksDBStateStoreProvider]] */
class RocksDBStateStoreChangeDataReader(
    fm: CheckpointFileManager,
    rocksDB: RocksDB,
```
Contributor

Hm, seems a little strange to me that we are passing in RocksDB here in its entirety just so we can use getFullLineage. Is there a way to abstract out the getFullLineage functionality so we can reuse it a different way?

Contributor Author

I initially did some refactoring to move all the lineage-related methods to RocksDBFileManager and only pass that in here. I left it out of this PR just to reduce the amount of changes.

At a glance, all the lineage-related methods (getChangelogReader, getLineageFromChangelogFile) live in either RocksDB or RocksDBFileManager. We should be able to abstract these methods out into something like ChangelogFileManager.scala, since the changelog lineage logic is not directly dependent on RocksDB-specific methods.

I am not sure if we want to do that refactoring in this PR.
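One possible shape for the extraction discussed here (illustrative only and not part of this PR; the signatures loosely mirror the existing methods named above and may differ from the real ones):

```scala
// Hypothetical abstraction so change data readers would not need a full RocksDB instance.
trait ChangelogFileManager {
  def getChangelogReader(
      version: Long,
      checkpointUniqueId: Option[String] = None): StateStoreChangelogReader
  def getLineageFromChangelogFile(
      version: Long,
      checkpointUniqueId: Option[String] = None): Array[LineageItem]
}
```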

Contributor

Sure that's fine. We don't have to do it in this PR

```scala
  throw QueryExecutionErrors.invalidCheckpointLineage(printLineageItems(ret),
    s"Lineage does not end with endVersion: $endVersion.")
}
val increasingByOne = ret.sliding(2).forall {
```
Contributor

Can we add some more comments explaining what this block is doing?

Contributor

@anishshri-db anishshri-db left a comment

+1 - pending green CI

Contributor

@liviazhu liviazhu left a comment

LGTM - just one comment RE testing

```scala
  }
}

test("getFullLineage: multi-hop across changelog files") {
```
Contributor

Can we also add a test case where there are multiple files (with different lineages) for a single version?

Contributor Author

Added "getFullLineage: multiple lineages exist for the same version"


anishshri-db pushed a commit that referenced this pull request Sep 10, 2025
…ly snapshotStartBatchId option)

### What changes were proposed in this pull request?

This PR enables StateDataSource support with state checkpoint v2 format for the `snapshotStartBatchId` and related options, completing the StateDataSource checkpoint v2 integration.

There are changes to the replayStateFromSnapshot method signature: it gains `snapshotVersionStateStoreCkptId` and `endVersionStateStoreCkptId` parameters. Both are needed, as `snapshotVersionStateStoreCkptId` is used when getting the snapshot and `endVersionStateStoreCkptId` is used for calculating the full lineage from the final version.

Before
```
def replayStateFromSnapshot(
      snapshotVersion: Long, endVersion: Long, readOnly: Boolean = false): StateStore
```

After
```
def replayStateFromSnapshot(
      snapshotVersion: Long,
      endVersion: Long,
      readOnly: Boolean = false,
      snapshotVersionStateStoreCkptId: Option[String] = None,
      endVersionStateStoreCkptId: Option[String] = None): StateStore
```

This is the final PR in the series following:
  - #52047: Enable StateDataSource with state checkpoint v2 (only batchId option)
  - #52148: Enable StateDataSource with state checkpoint v2 (only readChangeFeed)

NOTE: Reading checkpoint v2 state data sources requires setting `"spark.sql.streaming.stateStore.checkpointFormatVersion" -> 2`. It is possible to allow reading state data sources arbitrarily based on what is in the CommitLog by relaxing assertion checks, but this is left as a future change.

### Why are the changes needed?

State checkpoint v2 (`"spark.sql.streaming.stateStore.checkpointFormatVersion"`) introduces a new format for storing state metadata that includes unique identifiers in the file path for each state store. The existing StateDataSource implementation only worked with checkpoint v1 format, making it incompatible with streaming queries using the newer checkpoint format. Only `batchId` was implemented in #52047 and only `readChangeFeed` was implemented in #52148.

### Does this PR introduce _any_ user-facing change?

Yes.

State Data Source will work when checkpoint v2 is used and the `snapshotStartBatchId` and related options are used.
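For reference, a minimal usage sketch of these options (placeholder path and batch ids; the option names follow the State Data Source documentation):

```scala
val stateDf = spark.read
  .format("statestore")
  .option("snapshotStartBatchId", 5)   // reconstruct state starting from this snapshot
  .option("snapshotPartitionId", 0)    // used together with snapshotStartBatchId
  .option("batchId", 9)                // replay changes up to this batch (defaults to latest)
  .load("/path/to/checkpoint")
```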

### How was this patch tested?

In the previous PRs, test suites were added to parameterize the current tests with checkpoint v2. All of these tests are now added back. All tests that previously intentionally tested some feature of the State Data Source Reader with checkpoint v1 should now be parameterized with checkpoint v2 (including Python tests).

`RocksDBWithCheckpointV2StateDataSourceReaderSnapshotSuite` is added which uses the golden file approach similar to #46944 where `snapshotStartBatchId` is first added.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #52202 from dylanwong250/SPARK-53332.

Authored-by: Dylan Wong <[email protected]>
Signed-off-by: Anish Shrigondekar <[email protected]>