[SPARK-36719][CORE] Supporting Netty Logging at the network layer #33962
Conversation
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #143151 has finished for PR 33962 at commit
dongjoon-hyun left a comment:
+1, LGTM from my side. Thank you for adding this, @attilapiros .
+CC @otterc
mridulm left a comment:
Nice work @attilapiros, this is very useful.
    protected String format(ChannelHandlerContext ctx, String eventName, Object arg) {
      if (arg instanceof ByteBuf) {
        return format(ctx, eventName) + " " + ((ByteBuf) arg).readableBytes() + "B";
super nit: Add a space between value and "B" (here and below) ?
This (no space) is consistent with Netty's own logging.
You can see it in the TRACE output in the PR description.
But I'll copy it here for you (the last line is the proof):
+-------------------------------------------------+
| 0 1 2 3 4 5 6 7 8 9 a b c d e f |
+--------+-------------------------------------------------+----------------+
|00000000| 00 00 00 00 00 00 00 42 06 00 00 00 35 2f 6a 61 |.......B....5/ja|
|00000010| 72 73 2f 6f 72 69 67 69 6e 61 6c 2d 73 70 61 72 |rs/original-spar|
|00000020| 6b 2d 65 78 61 6d 70 6c 65 73 5f 32 2e 31 32 2d |k-examples_2.12-|
|00000030| 33 2e 33 2e 30 2d 53 4e 41 50 53 48 4f 54 2e 6a |3.3.0-SNAPSHOT.j|
|00000040| 61 72 |ar |
+--------+-------------------------------------------------+----------------+
21/09/10 15:29:14 TRACE NettyLogger: [id: 0xf1d25786, L:/172.30.64.219:61045 - R:/172.30.64.219:61044] FLUSH
21/09/10 15:29:14 TRACE NettyLogger: [id: 0x362fc693, L:/172.30.64.219:61044 - R:/172.30.64.219:61045] READ: 66B
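The no-space convention settled on above (byte count immediately followed by "B", matching Netty's own `LoggingHandler` output) can be sketched with a tiny dependency-free program. `ByteCountFormatDemo` and its `format` helper are simplified, hypothetical stand-ins for the NettyLogger code, not the actual implementation:

```java
public class ByteCountFormatDemo {
    // Simplified stand-in for NettyLogger's format(ctx, eventName, arg):
    // the readable byte count is appended directly followed by "B", with
    // no space in between, just like Netty's own LoggingHandler does.
    static String format(String channelId, String eventName, int readableBytes) {
        return "[id: " + channelId + "] " + eventName + ": " + readableBytes + "B";
    }

    public static void main(String[] args) {
        // Mirrors the READ event from the trace output quoted above.
        System.out.println(format("0x362fc693", "READ", 66));
    }
}
```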
viirya left a comment:
Looks useful. Thanks for this work!
Is there any document where we should add this `log4j.logger.org.apache.spark.network.util.NettyLogger` setting?
@viirya Thanks for the review!
So far I haven't seen any specific logger being documented, probably because loggers are more for the developers who are doing the debugging than for the users. But if you can point me to a document, sure, I will extend it.
I have done a quick search and no markdown contains any relevant mention of DEBUG/TRACE:
Hm, I only see a logging related section "Configuring Logging" in "Spark Configuration" https://spark.apache.org/docs/latest/configuration.html#configuring-logging. Okay to skip it.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #143162 has finished for PR 33962 at commit
retest this please
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #143169 has finished for PR 33962 at commit
Thank you, @attilapiros, @mridulm, @viirya.
### What changes were proposed in this pull request?

Support Netty-level logging at the network layer for Celeborn. To configure Netty-level logging, a LogHandler must be added to the channel pipeline. `NettyLogger` is introduced as a new class which is able to construct a log handler depending on the log level:

- In case of `<Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="DEBUG" additivity="false">`: a custom log handler is created which does not dump the message contents. This way the log is a bit more compact. Moreover, when network-level encryption is switched on, this level might be sufficient.
- In case of `<Logger name="org.apache.celeborn.common.network.util.NettyLogger" level="TRACE" additivity="false">`: Netty's own log handler is used, which dumps the message contents.
- Otherwise (when the logger is at neither `TRACE` nor `DEBUG`) the pipeline does not contain a log handler (there is no runtime penalty for the default setting, but a long-running service must be restarted with the new log level for it to take effect).

Backport:
- [[SPARK-36719][CORE] Supporting Netty Logging at the network layer](apache/spark#33962)
- [[SPARK-45377][CORE] Handle InputStream in NettyLogger](apache/spark#43165)

### Why are the changes needed?

This level of logging proved to be sufficient while debugging some external shuffle related problems. Compared with tcpdump, these log lines can be more easily correlated with the Celeborn-internal calls. Moreover, the log layout can be configured to contain the thread names; that way, for a timeout, a busy thread can be identified.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested manually, locally.

Closes #2423 from SteNicholas/CELEBORN-1359.

Authored-by: SteNicholas <[email protected]>
Signed-off-by: Shuang <[email protected]>
What changes were proposed in this pull request?
Support Netty-level logging at the network layer.
To configure Netty-level logging, a LogHandler must be added to the channel pipeline. In this PR I have introduced a new class, NettyLogger, which is able to construct a log handler depending on the log level:
- log4j.logger.org.apache.spark.network.util.NettyLogger=DEBUG: a custom log handler is created which does not dump the message contents. This way the log is a bit more compact. Moreover, when network-level encryption is switched on, this level might be sufficient.
- log4j.logger.org.apache.spark.network.util.NettyLogger=TRACE: Netty's own log handler is used, which dumps the message contents.
Why are the changes needed?
This level of logging proved to be sufficient while debugging some external shuffle related problems.
Compared with tcpdump, these log lines can be more easily correlated with the Spark-internal calls.
Moreover, the log layout can be configured to contain the thread names; that way, for a timeout, a busy thread can be identified.
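The level-dependent behaviour described above can be sketched as follows. The real NettyLogger consults its logger's effective level; here plain booleans are hypothetical stand-ins for isTraceEnabled()/isDebugEnabled(), and the returned strings merely label which kind of handler would be installed:

```java
public class LogHandlerSelectionDemo {
    // Sketch of the level-dependent handler selection: TRACE installs
    // Netty's own content-dumping handler, DEBUG installs a compact
    // custom handler, and anything else installs no handler at all
    // (hence no runtime cost for the default configuration).
    static String chooseHandler(boolean traceEnabled, boolean debugEnabled) {
        if (traceEnabled) {
            return "netty-logging-handler";   // dumps message contents
        } else if (debugEnabled) {
            return "compact-logging-handler"; // events only, no content dump
        } else {
            return "none";                    // pipeline gets no log handler
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseHandler(true, true));
        System.out.println(chooseHandler(false, true));
        System.out.println(chooseHandler(false, false));
    }
}
```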
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manually.
DEBUG level
TRACE level
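For reference, enabling either mode comes down to a single logger entry; a minimal sketch in log4j 1.x properties syntax (the configuration format Spark used at the time — treat the exact file placement as an assumption):

```properties
# Compact Netty event logging without message contents:
log4j.logger.org.apache.spark.network.util.NettyLogger=DEBUG

# Or, for full hex dumps of message contents via Netty's own handler,
# use TRACE instead:
# log4j.logger.org.apache.spark.network.util.NettyLogger=TRACE
```

Note that because the handler is only added to the pipeline at channel setup, a running service must be restarted after changing this level.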