-
Notifications
You must be signed in to change notification settings - Fork 134
feat: add tracing spans for queries #323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
bstr = { version = "1.11.0", default-features = false } | ||
quanta = { version = "0.12", optional = true } | ||
replace_with = { version = "0.1.7" } | ||
tracing = "0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least, this should go under a feature flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that, but didn't in this PR since it adds a lot of noise to #[cfg]
out all the functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current state, I do not think that it is mergeable, as this requires thorough testing from our side, given the summary header gotchas, possible performance impact, etc.
We have plans to explore a proper tracing implementation in the client soon, but at this point, we have a few other areas to focus on.
Self { | ||
client: client.clone(), | ||
sql: SqlBuilder::new(template), | ||
wait_end_of_query: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be set via Query::with_option("wait_end_of_query", "1")
, and there is no need to add an explicit constructor for that. It simply does not scale for every case and uses different terminology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moreover, it's not really relevant to the purpose of this PR. Was this an API you were experimenting with and didn't intend to commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without wait_end_of_query
, the metrics that come out of the X-Clickhouse-Summary header are not meaningful
} | ||
|
||
/// Starts a new SELECT/DDL query, with the `wait_end_of_query` setting enabled | ||
/// to buffer the full query results on the server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also basically denies the possibility of streaming the data as soon as the first block is processed, cause everything will be written as the temp files first, and it can actively hinder overall performance...
result_bytes = summary_header.result_bytes, | ||
elapsed_ns = summary_header.elapsed_ns, | ||
"finished processing query" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about relying on the summary header, as it will be off in quite a few situations. To get it right, you will have to use wait_end_of_query
indeed, but it is too big a trade-off IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obviously it depends on the use-case; many of our queries are aggregates that scan a lot of rows but return relatively few, so it's fine to turn of for those
pub(crate) struct Chunks { | ||
stream: Option<Box<DetectDbException<Decompress<IncomingStream>>>>, | ||
span: Option<Span>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what the impact on performance is here, cause the overall width of the structure on the hottest path in the client increases significantly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Span
is not exactly tiny, either. I estimate it's at least 4 words wide: https://docs.rs/tracing/latest/src/tracing/span.rs.html#348
Id
is a wrapper aroundNonZeroU64
, 1 word on 64-bit or 2 words on 32-bit: https://docs.rs/tracing-core/latest/src/tracing_core/span.rs.html#16Dispatch
is a wrapper around theKind
enum, 2 words: https://docs.rs/tracing-core/latest/src/tracing_core/dispatcher.rs.html#152- And then
&'static Metadata
is of course also 1 word.
That's 32 extra bytes added to this structure on 64-bit, not including the discriminator for Option<Span>
which is going to add another 8 bytes since Span
provides nothing internally for null pointer optimization.
In fact, Span
itself has a no-op constructor that's recommended to be used in place of Option<Span>
for this reason: https://docs.rs/tracing/latest/tracing/struct.Span.html#method.none
Summary
This adds support for
tracing
spans in clickhouse-rs. We use them for opentelemetry, but they're just generally useful. We've been running this (well, a version of this against 0.13.3) for a while with good results.I haven't written any tests yet, but I wanted to get a first pass on whether this sounds like something upstream would be interested in accepting before I did so.
Checklist
Delete items not relevant to your PR: