HDFS-17818. Fix serial fsimage transfer during checkpoint with multiple namenodes #7862
base: trunk
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@Hexiaoqiao @ayushtkn @tomscut Do you think that uploading the fsimage during a checkpoint with observer NameNodes should be changed from serial to parallel?
In our cluster, each namespace has four NameNodes: one active, one standby, and two observers. When the standby NameNode performs a checkpoint, it transfers the fsimage to the other three NameNodes. However, we found that these transfers are performed serially.
The reason is that the corePoolSize of the ThreadPoolExecutor is 0, and the transfer tasks never fill the LinkedBlockingQueue. A ThreadPoolExecutor starts threads beyond corePoolSize only when the queue rejects a task, so only one thread transfers the fsimage at a time. This greatly increases the checkpoint time.
ExecutorService executor = new ThreadPoolExecutor(0, activeNNAddresses.size(), 100, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(activeNNAddresses.size()), uploadThreadFactory);
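The queuing behavior described above can be reproduced outside HDFS. The following is a minimal sketch, not code from this PR: `UploadPoolDemo` and `peakConcurrency` are hypothetical names, and each "upload" is simulated by a task that waits briefly on a latch. It builds the pool with the same shape as the constructor call above (maximum pool size and queue capacity both equal to the number of targets) and reports how many tasks ever ran at the same time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class UploadPoolDemo {

  /** Runs `tasks` simulated uploads and returns the peak number running at once. */
  public static int peakConcurrency(int corePoolSize, int tasks)
      throws InterruptedException {
    // Same shape as the executor above: max size and queue capacity both equal `tasks`.
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
        corePoolSize, tasks, 100, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(tasks));
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    CountDownLatch allStarted = new CountDownLatch(tasks);

    for (int i = 0; i < tasks; i++) {
      executor.execute(() -> {
        int now = running.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        allStarted.countDown();
        try {
          // Hold this thread until every task has started, or give up after 300 ms.
          allStarted.await(300, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          running.decrementAndGet();
        }
      });
    }
    executor.shutdown();
    executor.awaitTermination(10, TimeUnit.SECONDS);
    return peak.get();
  }

  public static void main(String[] args) throws InterruptedException {
    // corePoolSize 0: every task fits in the queue, so one worker drains it serially.
    System.out.println("corePoolSize=0: peak " + peakConcurrency(0, 3));
    // corePoolSize 3: each submission starts its own core thread.
    System.out.println("corePoolSize=3: peak " + peakConcurrency(3, 3));
  }
}
```

With three tasks, the first configuration should report a peak of 1 and the second a peak of 3. Raising corePoolSize is only one way to get parallel transfers and is shown here for contrast; it is not necessarily the change this PR makes.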