
Conversation

@DieterDP-ng
Contributor

Note that this builds on top of my older PR: #6134

@ndimiduk
Member

ndimiduk commented Jan 8, 2025

Do you mind fixing this broken checkstyle while you're in there?

[WARNING] The requested profile "test-patch" could not be activated because it does not exist.
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:3.1.0:checkstyle (default-cli) on project hbase-backup: An error has occurred in Checkstyle report generation. Failed during checkstyle configuration: Exception was thrown while processing /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6506/yetus-general-check/src/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BulkLoad.java: IllegalStateException occurred while parsing file /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6506/yetus-general-check/src/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BulkLoad.java. /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6506/yetus-general-check/src/hbase-backup/src/main/java/org/apache/hadoop/hbase/backup/impl/BulkLoad.java:72:33: expecting RPAREN, found 'that' -> [Help 1]

@DieterDP-ng force-pushed the HBASE-29003 branch 2 times, most recently from 15fdfe7 to 01b6b87 on February 7, 2025 12:40
@DieterDP-ng
Contributor Author

Fixed the checkstyle error.

@ndimiduk @rmdmattingly the discussion on Jira about the best approach has stalled. Should I wait for further feedback?

Personally, I think the best solution would be to track in the backup system table whether a full backup is required or not. This tracking would have to occur through an Observer, as is done in this ticket. Given that this PR solves a real issue, I'd suggest including it and leaving the switch to the tracking system to a later commit.

Note that users should update their config to include the new observer. I'm not sure if this counts as a migration, since the backup feature is still considered experimental. It should be mentioned somewhere, though I'm unsure where.
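For illustration, a minimal sketch of the kind of configuration change operators might need, assuming the new observer is loaded through the standard coprocessor keys the same way the existing BackupObserver is (the class name below is a placeholder, pending the final shape of the patch):

// Sketch only: append the new observer to the master coprocessor key, mirroring how
// BackupObserver is registered on the region coprocessor key. The class name below
// is a placeholder for whatever master-side observer this PR ends up adding.
Configuration conf = HBaseConfiguration.create();
String masterCoprocs = conf.get(CoprocessorHost.MASTER_COPROCESSOR_CONF_KEY);
conf.set(CoprocessorHost.MASTER_COPROCESSOR_CONF_KEY,
  (masterCoprocs == null ? "" : masterCoprocs + ",")
    + "org.apache.hadoop.hbase.backup.BackupMasterObserver");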


@ndimiduk
Member

FYI @hgromer

@ndimiduk
Member

Note that users should update their config to include the new observer. I'm not sure if this counts as a migration, since the backup feature is still considered experimental. It should be mentioned somewhere, though I'm unsure where.

This kind of change does demand a mention in the release notes. We have such a field on our Jira ticket schema, and it automatically gets populated at release time.

@hgromer
Contributor

hgromer commented Feb 26, 2025

I had actually spun up a Jira to this effect; I didn't realize there was already an open PR.

@DieterDP-ng
Contributor Author

Any advice on how to progress this ticket?


@DieterDP-ng
Contributor Author

The test failure looks unrelated to this PR; a flaky test?

@rmdmattingly
Contributor

Hey @DieterDP-ng, thanks for the work here, and sorry that I've been slow to respond. I will take a look here today.

@rmdmattingly left a comment


This looks good to me. @ndimiduk or @hgromer, any thoughts?

Predicate<BulkLoad> filter) throws IOException {
try (Connection connection = ConnectionFactory.createConnection(config);
BackupSystemTable tbl = new BackupSystemTable(connection)) {
List<BulkLoad> bulkLoads = tbl.readBulkloadRows(List.of(tableName));
Contributor

This is a concern for another time, but it's only a matter of time until the lack of pagination in BackupSystemTable#readBulkloadRows bites us
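Not part of this PR, but as a rough sketch of what a paginated read could look like (pageSize, lastSeenRowKey, and the cursor handling are purely illustrative; no such method exists in BackupSystemTable today):

// Sketch: scan the bulk-load system table one page at a time instead of materializing
// every row, resuming each page after the last row key seen in the previous one.
Scan scan = new Scan().setLimit(pageSize);
if (lastSeenRowKey != null) {
  scan.withStartRow(lastSeenRowKey, false); // exclusive: continue after the previous page
}
try (Table table = connection.getTable(bulkLoadTableName);
  ResultScanner scanner = table.getScanner(scan)) {
  for (Result result : scanner) {
    // convert each Result into a BulkLoad entry and process it
  }
}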

backupManager.writeBackupStartCode(newStartCode);

backupManager
  .deleteBulkLoadedRows(bulkLoadsToDelete.stream().map(BulkLoad::getRowKey).toList());
@hgromer
Contributor

hgromer commented Mar 18, 2025

At my company, we have a similar patch applied to our fork, and we've run into issues with batch sizes that cause backup failures. This seems to happen when there are too many rows to delete; you end up with something like:

Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: Rejecting large batch operation for current batch with firstRegionName: backup:system_bulk,,1739970553683.c3828af81a4b3847aa0f1612bf638713. , Requested Number of Rows: 2048 , Size Threshold: 1500
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.checkBatchSizeAndLogLargeSize(RSRpcServices.java:2721)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2757)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:43520)
	at org.apache.ha

It might be worth splitting up this delete call if the batches are exceptionally large.
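For illustration, splitting the delete into chunks could look roughly like this (the chunk size is arbitrary; bulkLoadsToDelete and backupManager are the names from the diff above):

// Sketch: delete the bulk-load records in fixed-size chunks so that no single
// batch exceeds the region server's batch-size rejection threshold.
List<byte[]> rowKeys = bulkLoadsToDelete.stream().map(BulkLoad::getRowKey).toList();
int chunkSize = 1000; // arbitrary; ideally derived from the cluster's configured threshold
for (int i = 0; i < rowKeys.size(); i += chunkSize) {
  backupManager
    .deleteBulkLoadedRows(rowKeys.subList(i, Math.min(i + chunkSize, rowKeys.size())));
}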

Member

@hgromer do you feel that splitting up the delete is required for this patch?

Member

@DieterDP-ng any concerns about making this change?

Contributor

It might be worth splitting this up. Introducing this means we can get into a situation where backups are unable to succeed.

Contributor Author

No objection to splitting it. Updating the PR...

Contributor Author

Rather than doing the batching here, I did it in BackupSystemTable.deleteBulkLoadedRows, since that method is used in multiple locations and they would all benefit from batching. See the added commit (it can be squashed).

Contributor Author

(For future reference: it turned out batching was already built in, see #6506 (comment).)


  public void deleteBulkLoadedRows(List<byte[]> rows) throws IOException {
-   try (BufferedMutator bufferedMutator = connection.getBufferedMutator(bulkLoadTableName)) {
+   try (BufferedMutator bufferedMutator = connection
+     .getBufferedMutator(new BufferedMutatorParams(bulkLoadTableName).setMaxMutations(1000))) {
Contributor

Would it be possible to set this to the config value HConstants.BATCH_ROWS_THRESHOLD_NAME, seeing as that's the cluster limit for batches?

The configuration value must then be supplied by the operator launching the backup.
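For illustration, that suggestion might look roughly like the following inside deleteBulkLoadedRows (the fallback value is arbitrary, and whether the cluster-side threshold is visible in the client configuration depends on the deployment):

// Sketch: size the BufferedMutator batches from the cluster's batch-row threshold,
// falling back to an arbitrary value when the key is absent from the client config.
int maxMutations =
  connection.getConfiguration().getInt(HConstants.BATCH_ROWS_THRESHOLD_NAME, 1000);
try (BufferedMutator bufferedMutator = connection.getBufferedMutator(
  new BufferedMutatorParams(bulkLoadTableName).setMaxMutations(maxMutations))) {
  // mutate as before
}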

Contributor Author

This pointer led me to conclude that there's no need to specify batching at all, because it's already present.

Our method uses Connection#getBufferedMutator, for which the only non-delegating implementation is ConnectionOverAsyncConnection#getBufferedMutator. That one calls AsyncConnectionImpl#getBufferedMutatorBuilder(TableName). In that method we can see it uses an AsyncConnectionConfiguration object that has already extracted a batching value:

AsyncConnectionConfiguration(Configuration conf) {
  ...
  this.bufferedMutatorMaxMutations = conf.getInt(BUFFERED_MUTATOR_MAX_MUTATIONS_KEY,
      conf.getInt(HConstants.BATCH_ROWS_THRESHOLD_NAME, BUFFERED_MUTATOR_MAX_MUTATIONS_DEFAULT));
}

So I'll remove my change that introduced the batch size.

@hgromer
Contributor

hgromer commented Apr 3, 2025

Ah, good to know. So the only thing needed is for the operator to specify the configuration in the Connection object that is passed to the BackupAdminImpl.

👍
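A minimal sketch of that operator-side setup, under the assumptions above (the constructor call mirrors what the comments imply about passing a Connection into BackupAdminImpl; the threshold value is illustrative and nothing here is verified against the final patch):

// Sketch: lower the buffered-mutator batch size via the Configuration backing the
// Connection that the backup admin is built on, as discussed above.
Configuration conf = HBaseConfiguration.create();
conf.setInt(HConstants.BATCH_ROWS_THRESHOLD_NAME, 1000); // value is illustrative
try (Connection connection = ConnectionFactory.createConnection(conf)) {
  BackupAdmin backupAdmin = new BackupAdminImpl(connection);
  // run backup / delete operations through backupAdmin as usual
}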


@rmdmattingly
Contributor

Retrying the build


@DieterDP-ng
Contributor Author

The failed test runs OK for me. I guess another flaky test.

}
(coproc == null ? "" : coproc + ",") + observerClass);

String masterCoProc = conf.get(CoprocessorHost.MASTER_COPROCESSOR_CONF_KEY);
Contributor Author

This probably belongs in the decorateMasterConfiguration method instead.
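Roughly what that move could look like, as a sketch only (decorateMasterConfiguration is the existing hook mentioned above; the observer class name is assumed, and the append-if-absent pattern is taken from the diff above):

// Sketch: register the master-side observer from decorateMasterConfiguration rather
// than from the region-server decoration path shown in the diff above.
public static void decorateMasterConfiguration(Configuration conf) {
  // ... existing master decoration ...
  String masterCoProc = conf.get(CoprocessorHost.MASTER_COPROCESSOR_CONF_KEY);
  String observerClass = "org.apache.hadoop.hbase.backup.BackupMasterObserver"; // assumed name
  if (masterCoProc == null || !masterCoProc.contains(observerClass)) {
    conf.set(CoprocessorHost.MASTER_COPROCESSOR_CONF_KEY,
      (masterCoProc == null ? "" : masterCoProc + ",") + observerClass);
  }
}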

Contributor

Do you want to make this change? If not, I'm happy to ship this PR

Contributor Author

Updated!

I also extracted all master-related logic from BackupObserver into BackupMasterObserver. I did this in light of HBASE-29240, where more master-specific logic is being added.

The HBase backup mechanism keeps track of which HFiles
were bulk loaded, so they can be included in incremental
backups.

Before this ticket, these bulk load records were only
deleted when an incremental backup is created. This
commit adds 2 more locations:

1) after a full backup. Since a full backup already
captures all data, this meant that unnecessary HFiles
were being included in the next incremental backup.

2) after a table delete/truncate/CF-deletion. Previously,
if an HFile was loaded before a table was cleared, the next
incremental backup would effectively still include the
HFile. This led to incorrect data being restored.

This commit also completely refactors & simplifies the
test for this functionality.
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 28s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for branch
+1 💚 mvninstall 3m 10s master passed
+1 💚 compile 7m 53s master passed
+1 💚 checkstyle 1m 10s master passed
+1 💚 spotbugs 9m 17s master passed
+0 🆗 refguide 2m 18s branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect.
+1 💚 spotless 0m 44s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for patch
+1 💚 mvninstall 3m 1s the patch passed
+1 💚 compile 7m 45s the patch passed
+1 💚 javac 7m 45s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 15s the patch passed
+1 💚 spotbugs 9m 41s the patch passed
+0 🆗 refguide 2m 4s patch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect.
+1 💚 hadoopcheck 11m 50s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 47s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 29s The patch does not generate ASF License warnings.
70m 5s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6506/10/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6506
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless refguide
uname Linux 7f13fbdef710 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9cf3008
Default Java Eclipse Adoptium-17.0.11+9
refguide https://nightlies.apache.org/hbase/HBase-PreCommit-GitHub-PR/PR-6506/10/yetus-general-check/output/branch-site/book.html
refguide https://nightlies.apache.org/hbase/HBase-PreCommit-GitHub-PR/PR-6506/10/yetus-general-check/output/patch-site/book.html
Max. process+thread count 187 (vs. ulimit of 30000)
modules C: hbase-server hbase-backup . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6506/10/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 27s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 36s Maven dependency ordering for branch
+1 💚 mvninstall 3m 25s master passed
+1 💚 compile 2m 7s master passed
+1 💚 javadoc 2m 44s master passed
+1 💚 shadedjars 5m 57s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for patch
+1 💚 mvninstall 3m 1s the patch passed
+1 💚 compile 2m 11s the patch passed
+1 💚 javac 2m 11s the patch passed
+1 💚 javadoc 2m 34s the patch passed
+1 💚 shadedjars 5m 58s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 317m 34s /patch-unit-root.txt root in the patch failed.
353m 33s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6506/10/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6506
Optional Tests javac javadoc unit compile shadedjars
uname Linux d5a0bbc52f6e 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 9cf3008
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6506/10/testReport/
Max. process+thread count 5112 (vs. ulimit of 30000)
modules C: hbase-server hbase-backup . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6506/10/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@rmdmattingly merged commit 65a6d8a into apache:master Apr 21, 2025
1 check failed
rmdmattingly pushed a commit that referenced this pull request Apr 21, 2025
rmdmattingly pushed a commit that referenced this pull request Apr 21, 2025
rmdmattingly pushed a commit that referenced this pull request Apr 22, 2025
rmdmattingly added a commit that referenced this pull request Apr 22, 2025
rmdmattingly pushed a commit that referenced this pull request Apr 23, 2025
rmdmattingly added a commit that referenced this pull request Apr 23, 2025
rmdmattingly added a commit that referenced this pull request Apr 23, 2025
rmdmattingly added a commit that referenced this pull request Apr 25, 2025
rmdmattingly added a commit that referenced this pull request Apr 27, 2025
mokai87 pushed a commit to mokai87/hbase that referenced this pull request Aug 7, 2025