Skip to content

Conversation

@LiebingYu
Copy link
Contributor

Purpose

Linked issue: close #1716

Brief change log

Tests

API and Format

Documentation

@LiebingYu LiebingYu force-pushed the fix-corrupt-index-new branch 2 times, most recently from b9b5f2d to ceaf704 Compare September 24, 2025 07:50
@LiebingYu
Copy link
Contributor Author

Ready for CR @wuchong @swuferhong

Copy link
Contributor

@swuferhong swuferhong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LiebingYu Thanks for your work. I left some comments.

@LiebingYu LiebingYu force-pushed the fix-corrupt-index-new branch from ceaf704 to 1e8a582 Compare September 25, 2025 14:06
// can potentially skip over more segments.
LogTablet.rebuildWriterState(
writerStateManager, logSegments, 0, segment.getBaseOffset(), false);
int bytesTruncated = segment.recover();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to pass writerStateManager into segment.recover() to update writer state like how kafka does? cc @swuferhong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kafka use writerStateManager in segment.recover() mainly for txn operations. I think Fluss doen't need here. What's your opinion @swuferhong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing offline with @LiebingYu , we confirmed that it‘s necessary to rebuild the writerState1. Although it is not used during recovery, this code path ensures that when writerStateManager.takeSnapshot()` is called, a correct snapshot up to the current segment is written to disk. This way, during recovery, each segment can be restored based on a fully accurate writerState snapshot, without requiring a full global reconstruction.

segment.getFileLogRecords().file().getAbsoluteFile());
}
recoverSegment(segment);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we catch CorruptIndexException and recover segment for this case like how kafka does? And we should rethrow other exceptions in else branch.

Copy link
Contributor Author

@LiebingYu LiebingYu Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Kafka throw CorruptIndexException because Kafka will do sanity check for transaction file which may throw CorruptIndexException. But here in fluss we don't do sanity check of index files in segment.sanityCheck. So I think we don't need to catch CorruptIndexException here.

@wuchong
Copy link
Member

wuchong commented Nov 4, 2025

I added a commit to fix the IOException signature problem.

@LiebingYu LiebingYu requested a review from wuchong November 5, 2025 05:38
@wuchong wuchong merged commit d5cb521 into apache:main Nov 10, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ReplicaFetcherThread keeps throwing UnknownServerException because of corrupt index file

3 participants