@gottagogottagoGxj

…onServer crash

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 24s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 54s master passed
+1 💚 compile 2m 34s master passed
+1 💚 checkstyle 0m 36s master passed
+1 💚 spotless 0m 43s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 30s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 35s the patch passed
+1 💚 compile 2m 31s the patch passed
+1 💚 javac 2m 31s the patch passed
-0 ⚠️ checkstyle 0m 34s hbase-server: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 13m 22s Patch does not cause any errors with Hadoop 3.2.4 3.3.4.
-1 ❌ spotless 0m 36s patch has 53 errors when running spotless:check, run spotless:apply to fix.
-1 ❌ spotbugs 1m 41s hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
_ Other Tests _
+1 💚 asflicense 0m 10s The patch does not generate ASF License warnings.
40m 13s
Reason Tests
FindBugs module:hbase-server
Sequence of calls to java.util.concurrent.ConcurrentHashMap may not be atomic in org.apache.hadoop.hbase.replication.regionserver.RecoveredReplicationSource.startShipperWorks() At RecoveredReplicationSource.java:[line 180]
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5177
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux d9c601b0220d 5.4.0-1093-aws #102~18.04.2-Ubuntu SMP Wed Dec 7 00:31:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a711059
Default Java Eclipse Adoptium-11.0.17+8
checkstyle https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/patch-spotless.txt
spotbugs https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/new-spotbugs-hbase-server.html
Max. process+thread count 82 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.
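For context on the spotbugs finding above: the warning is about a check followed by an update on a ConcurrentHashMap, where another thread can slip in between the two calls. The sketch below is only an illustration of that pattern and one common way to avoid it. `Worker`, `registerRacy`, `registerAtomic` and the `walGroupId` key are hypothetical names, not the code at RecoveredReplicationSource.java line 180.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicRegistration {
  /** Hypothetical stand-in for a per-WAL-group worker; not an HBase class. */
  public static final class Worker {
    public final String walGroupId;

    public Worker(String walGroupId) {
      this.walGroupId = walGroupId;
    }
  }

  /** Counts how many workers the atomic path has created. */
  public static final AtomicInteger created = new AtomicInteger();

  // The shape spotbugs flags: each call is thread-safe on its own, but two
  // threads can both pass the containsKey check before either of them puts.
  public static Worker registerRacy(ConcurrentHashMap<String, Worker> workers,
      String walGroupId) {
    if (!workers.containsKey(walGroupId)) {
      workers.put(walGroupId, new Worker(walGroupId));
    }
    return workers.get(walGroupId);
  }

  // One atomic call: ConcurrentHashMap.computeIfAbsent applies the mapping
  // function at most once per absent key.
  public static Worker registerAtomic(ConcurrentHashMap<String, Worker> workers,
      String walGroupId) {
    return workers.computeIfAbsent(walGroupId, id -> {
      created.incrementAndGet();
      return new Worker(id);
    });
  }

  public static void main(String[] args) {
    ConcurrentHashMap<String, Worker> workers = new ConcurrentHashMap<>();
    Worker first = registerAtomic(workers, "group-1");
    Worker second = registerAtomic(workers, "group-1");
    System.out.println("same worker returned: " + (first == second));
  }
}
```

Whether `computeIfAbsent`, `putIfAbsent` or `compute` fits best depends on what the real method does with an already-registered worker, which this report does not show.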

@Apache9
Contributor

Apache9 commented Apr 13, 2023

Mind explaining in more detail how this fixes the no node exception?

@thangTang
Contributor

thangTang commented Mar 21, 2024

Hi @gottagogottagoGxj, I'd appreciate it if you could explain this ticket in more detail and share your HBase version.

It seems I hit this issue too, on HBase 2.4.11.

Here is my log:

2024-03-21 16:19:43,379 WARN  [ReplicationExecutor-0.replicationSource,xxxxx,1705567104078.replicationSource.shipper000.000.000.000%2C16020%2C1705567104078.000.000.000.000%2C16020%2C1705567104078.regiongroup-1,xxxxx,1705567104078] regionserver.ReplicationSourceShipper: com.shopee.di.foundation.hbase.KafkaInterClusterReplicationEndpoint threw unknown exception:
java.util.ConcurrentModificationException
        at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1221)
        at org.apache.hadoop.hbase.replication.regionserver.MetricsSource.updateTableLevelMetrics(MetricsSource.java:112)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.shipEdits(ReplicationSourceShipper.java:215)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.run(ReplicationSourceShipper.java:117)
2024-03-21 16:19:43,405 ERROR [ReplicationExecutor-0.replicationSource,xxxxx,1705567104078.replicationSource.shipper000.000.000.000%2C16020%2C1705567104078.000.000.000.000%2C16020%2C1705567104078.regiongroup-1,xxxxx,1705567104078] regionserver.HRegionServer: ***** ABORTING region server ip-10-80-163-145.idata-server.shopee.io,16020,1704705566934: Failed to operate on replication queue *****
org.apache.hadoop.hbase.replication.ReplicationException: Failed to set log position (serverName=xxxxx,1704705566934, queueId=xxxxx,1705567104078, fileName=000.000.000.000%2C16020%2C1705567104078.000.000.000.000%2C16020%2C1705567104078.regiongroup-1.1711008927746, position=130724689)
        at org.apache.hadoop.hbase.replication.ZKReplicationQueueStorage.setWALPosition(ZKReplicationQueueStorage.java:255)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.lambda$logPositionAndCleanOldLogs$8(ReplicationSourceManager.java:552)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.interruptOrAbortWhenFail(ReplicationSourceManager.java:500)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:551)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceInterface.logPositionAndCleanOldLogs(ReplicationSourceInterface.java:206)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.updateLogPosition(ReplicationSourceShipper.java:264)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.shipEdits(ReplicationSourceShipper.java:203)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.run(ReplicationSourceShipper.java:117)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1925)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1830)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:658)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1534)
        at org.apache.hadoop.hbase.replication.ZKReplicationQueueStorage.setWALPosition(ZKReplicationQueueStorage.java:245)
        ... 7 more

*Sensitive information such as server names and IPs has been redacted.

Thank you.
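The first exception in the log above is thrown from `HashMap.computeIfAbsent`, which on Java 9 and later raises ConcurrentModificationException when the map changes while the mapping function is running. A minimal sketch of that check follows, assuming (the log does not confirm this) that the map behind `updateTableLevelMetrics` is a plain HashMap touched by more than one shipper thread. `TableMetricsSketch` and its methods are hypothetical, not HBase code.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class TableMetricsSketch {
  // Deterministic way to trip the same check on Java 9+: the mapping function
  // changes the HashMap while computeIfAbsent is still in progress. A second
  // thread modifying the map at that moment trips it the same way.
  public static boolean hashMapDetectsInterleavedModification() {
    Map<String, Long> plain = new HashMap<>();
    try {
      plain.computeIfAbsent("t1", k -> {
        plain.put("t2", 0L); // structural change during the computation
        return 1L;
      });
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Assumed alternative: a ConcurrentHashMap of LongAdder counters can be
  // updated from several threads without external locking.
  public static long countConcurrently(int threads, int perThread) {
    ConcurrentHashMap<String, LongAdder> perTable = new ConcurrentHashMap<>();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        for (int i = 0; i < perThread; i++) {
          perTable.computeIfAbsent("table-" + (i % 4), k -> new LongAdder()).increment();
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    // Total across all tables; equals threads * perThread when no update is lost.
    return perTable.values().stream().mapToLong(LongAdder::sum).sum();
  }

  public static void main(String[] args) {
    System.out.println("CME detected: " + hashMapDetectsInterleavedModification());
    System.out.println("total increments: " + countConcurrently(4, 1000));
  }
}
```

This only covers the ConcurrentModificationException. The NoNodeException that follows it in the log comes from the ZooKeeper multi call in `setWALPosition` and would need a separate explanation.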
