Physical rewrite #17565
@amotin, thank you for this! On a first pass it looks good to me.
During regular block writes ZFS sets both the logical and physical birth times equal to the current TXG. During dedup and block cloning the logical birth time is still set to the current TXG, but the physical one may be copied from the original block that was used. This represents the fact that logically the user data has changed, but physically it is the same old block.

But block rewrite introduces a new situation, where a block is not changed logically, but is stored in a different place in the pool. From the ARC, scrub and some other perspectives this is a new block, but for user applications or incremental replication, for example, it is not. A somewhat similar thing happens during the remap phase of device removal, but in that case the blocks' space is still accounted as allocated at their logical birth times.

This patch introduces a new "rewrite" flag in the block pointer structure, making it possible to differentiate a physical rewrite (when the block is actually reallocated at the physical birth time) from the device removal case (when the logical birth time is used).

The new functionality is not used at this point, and the only expected change is that the error log is now kept in terms of physical birth times rather than logical ones, since if a block with a logged error was somehow rewritten, the previous error no longer matters.

This change also introduces a new TRAVERSE_LOGICAL flag to the traverse code, allowing zfs send, redact and diff to work in the context of logical birth times, ignoring physical-only rewrites. It also changes nothing at this point due to the lack of such writes, but they will come in a following patch.

Signed-off-by: Alexander Motin <[email protected]>
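To illustrate the three cases described in the commit message, here is a minimal sketch of how the two birth times could be assigned. The `toy_bp_t` type and the `toy_*` helpers are hypothetical and greatly simplified; they are not the real `blkptr_t` or any OpenZFS function, and the clone case assumes the physical birth time is always copied (the message only says it "may be").

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a block pointer; not blkptr_t. */
typedef struct toy_bp {
	uint64_t logical_birth;   /* TXG when the user-visible data changed */
	uint64_t physical_birth;  /* TXG when the on-disk copy was allocated */
	bool rewrite;             /* models the new "rewrite" flag */
} toy_bp_t;

/* Regular write: both birth times equal the current TXG. */
static toy_bp_t
toy_regular_write(uint64_t txg)
{
	toy_bp_t bp = { txg, txg, false };
	return (bp);
}

/* Dedup/clone (assumed): new logical birth, physical birth from the source. */
static toy_bp_t
toy_clone(const toy_bp_t *src, uint64_t txg)
{
	toy_bp_t bp = { txg, src->physical_birth, false };
	return (bp);
}

/* Physical rewrite: logical birth kept, block reallocated in this TXG. */
static toy_bp_t
toy_physical_rewrite(const toy_bp_t *old, uint64_t txg)
{
	toy_bp_t bp = { old->logical_birth, txg, true };
	return (bp);
}
```

In this model a clone ends up with a logical birth newer than its physical birth, while a physical rewrite ends up with the opposite ordering plus the flag set.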
3b724aa to c9382ac
Just a rebase and conflict resolution.
@@ -1235,11 +1235,9 @@ dbuf_verify(dmu_buf_impl_t *db)
DVA_IS_EMPTY(&bp->blk_dva[1]) &&
DVA_IS_EMPTY(&bp->blk_dva[2]));
ASSERT0(bp->blk_fill);
ASSERT0(bp->blk_pad[0]);
ASSERT0(bp->blk_pad[1]);
ASSERT0(bp->blk_pad[1]);
ASSERT0(bp->blk_pad);
Might as well keep what we can.
I am not sure it is a good idea to verify padding fields. One day they might get used, causing assertion failures. I prefer the "zero on write, ignore on read" approach.
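For readers unfamiliar with the "zero on write, ignore on read" convention, a minimal sketch follows. The `toy_record_t` structure and both helpers are hypothetical and are not part of OpenZFS; they only illustrate the idea.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record with reserved padding; not a ZFS structure. */
typedef struct toy_record {
	uint64_t value;
	uint64_t pad[2];  /* reserved for future use */
} toy_record_t;

/* Writer: always zero the padding so it can be given a meaning later. */
static void
toy_record_init(toy_record_t *rec, uint64_t value)
{
	memset(rec, 0, sizeof (*rec));
	rec->value = value;
}

/* Reader: use only the fields it understands; never inspect the padding. */
static uint64_t
toy_record_value(const toy_record_t *rec)
{
	return (rec->value);
}
```

With this approach, older code keeps working if a later version starts storing something in the padding, whereas an assertion on the padding would fire.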
tests/zfs-tests/tests/functional/cli_root/zfs_rewrite/zfs_rewrite_physical.ksh
Based on the previous commit, this implements the `zfs rewrite -P` flag, making ZFS keep the blocks' logical birth times while rewriting files. It should exclude the rewritten blocks from incremental sends, snapshot diffs, etc. At the same time, snapshot space usage will reflect the additional space used by the newly allocated blocks.

Since this begins to use the new "rewrite" flag in block pointers, this commit introduces a new read-compatible per-dataset feature, physical_rewrite. It must be enabled for the command not to fail; it is activated on first use and deactivated on deletion of the last affected dataset.

Signed-off-by: Alexander Motin <[email protected]>
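A minimal sketch of why a physically rewritten block would drop out of an incremental send while still being seen by a physical traversal such as scrub. Everything here is hypothetical: `TOY_TRAVERSE_LOGICAL`, `toy_births_t` and `toy_should_visit()` only model the idea of a logical traversal flag, and the simple "birth newer than the starting TXG" test is an assumption, not the actual traverse code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag modelling a "logical birth times only" traversal. */
#define	TOY_TRAVERSE_LOGICAL	(1 << 0)

/* Hypothetical pair of birth times; not the real block pointer layout. */
typedef struct toy_births {
	uint64_t logical_birth;
	uint64_t physical_birth;
} toy_births_t;

/*
 * Assumed rule: visit a block only if the relevant birth time is newer
 * than min_txg (for an incremental send, the TXG of the "from" snapshot).
 */
static bool
toy_should_visit(const toy_births_t *b, uint64_t min_txg, int flags)
{
	uint64_t birth = (flags & TOY_TRAVERSE_LOGICAL) ?
	    b->logical_birth : b->physical_birth;

	return (birth > min_txg);
}
```

For a block written at TXG 10 and physically rewritten at TXG 30, a logical traversal starting from TXG 20 skips it, while a physical traversal from the same point visits it.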
Motivation and Context
The earlier implemented `zfs rewrite` functionality, for simplicity, updated the logical birth times of all rewritten blocks. This makes them look modified from the perspective of replication, snapshot diffs, etc., even though the actual user data remains the same. While some people found it useful for recovering corrupted remote backups, for the majority, replicating large amounts of logically unchanged blocks can be a huge waste of time and resources.
Description
This PR implements a new variation of rewrite, called "physical rewrite", controlled by the new `-P` argument to the `zfs rewrite` subcommand. When possible, it tries to keep logical birth times unchanged. This makes it possible to distinguish blocks that were merely relocated within a pool from blocks that were actually modified by users. While the former may occupy additional disk space due to snapshots, block cloning, etc., which should be accounted as such, they should be ignored by replication, etc.

Previously we've had block pointers with physical birth times bigger than logical birth times only as a result of the device removal remap process. But in that case space usage accounting was still based on the block's logical birth time. Since physical rewrites require the reallocated space to be accounted based on physical birth times, this PR introduces a new "R"/"rewrite" flag in the block pointer structure to differentiate those two cases. When set, it means the block's space accounting should use the physical birth time instead of the traditional logical birth time. Since read-only pool imports do not really care about space accounting, the new per-dataset pool feature "physical_rewrite" gating this is declared as read-compatible. The feature is activated on first use and deactivated when the last of the affected datasets is deleted.
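The accounting rule described above can be sketched as a single selection on the flag. The `toy_acct_bp_t` type and `toy_accounting_birth()` are hypothetical names used only for illustration; they are not the real block pointer structure or accessor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified block pointer fields; not the real blkptr_t. */
typedef struct toy_acct_bp {
	uint64_t logical_birth;
	uint64_t physical_birth;
	bool rewrite;  /* models the "R"/"rewrite" flag */
} toy_acct_bp_t;

/*
 * Birth time used for space accounting in this model:
 *  - flag set (physical rewrite): physical birth time;
 *  - flag clear (including device removal remap): logical birth time.
 */
static uint64_t
toy_accounting_birth(const toy_acct_bp_t *bp)
{
	return (bp->rewrite ? bp->physical_birth : bp->logical_birth);
}
```

Two block pointers with identical birth times (logical 10, physical 30) are therefore accounted differently: a remapped block at TXG 10, a physically rewritten one at TXG 30.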
There are two exceptions when logical birth time might still be modified around physical rewrite:
Now that we have different birth times in block pointers, the traversal code gets a new `TRAVERSE_LOGICAL` flag, allowing a choice between traversing only logical changes (replication, diff, etc.) or physical changes (scrub/resilver, dataset destroy, etc.).
How Has This Been Tested?
Several successful CI runs. Manual testing with `zfs rewrite` and `zfs rewrite -P` vs `zfs send -i`.
Types of changes