Skip to content

Conversation

svix-jbrown
Copy link

@svix-jbrown svix-jbrown commented Oct 8, 2025

Summary

This adds support for tracing spans in clickhouse-rs. We use them for opentelemetry, but they're just generally useful. We've been running this (well, a version of this against 0.13.3) for a while with good results.

I haven't written any tests yet, but I wanted to get a first pass on whether this sounds like something upstream would be interested in accepting before I did so.

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided so that we can include it in CHANGELOG later
  • For significant changes, documentation in README and https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

@svix-jbrown svix-jbrown marked this pull request as ready for review October 8, 2025 23:14
bstr = { version = "1.11.0", default-features = false }
quanta = { version = "0.12", optional = true }
replace_with = { version = "0.1.7" }
tracing = "0.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least, this should go under a feature flag.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that, but didn't in this PR since it adds a lot of noise to #[cfg] out all the functionality

@slvrtrn slvrtrn requested a review from abonander October 9, 2025 19:10
Copy link
Contributor

@slvrtrn slvrtrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state, I do not think that it is mergeable, as this requires thorough testing from our side, given the summary header gotchas, possible performance impact, etc.

We have plans to explore a proper tracing implementation in the client soon, but at this point, we have a few other areas to focus on.

Self {
client: client.clone(),
sql: SqlBuilder::new(template),
wait_end_of_query: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be set via Query::with_option("wait_end_of_query", "1"), and there is no need to add an explicit constructor for that. It simply does not scale for every case and uses different terminology.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, it's not really relevant to the purpose of this PR. Was this an API you were experimenting with and didn't intend to commit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without wait_end_of_query, the metrics that come out of the X-Clickhouse-Summary header are not meaningful

}

/// Starts a new SELECT/DDL query, with the `wait_end_of_query` setting enabled
/// to buffer the full query results on the server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also basically denies the possibility of streaming the data as soon as the first block is processed, cause everything will be written as the temp files first, and it can actively hinder overall performance...

result_bytes = summary_header.result_bytes,
elapsed_ns = summary_header.elapsed_ns,
"finished processing query"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about relying on the summary header, as it will be off in quite a few situations. To get it right, you will have to use wait_end_of_query indeed, but it is too big a trade-off IMO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obviously it depends on the use-case; many of our queries are aggregates that scan a lot of rows but return relatively few, so it's fine to turn of for those

pub(crate) struct Chunks {
stream: Option<Box<DetectDbException<Decompress<IncomingStream>>>>,
span: Option<Span>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what the impact on performance is here, cause the overall width of the structure on the hottest path in the client increases significantly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Span is not exactly tiny, either. I estimate it's at least 4 words wide: https://docs.rs/tracing/latest/src/tracing/span.rs.html#348

That's 32 extra bytes added to this structure on 64-bit, not including the discriminator for Option<Span> which is going to add another 8 bytes since Span provides nothing internally for null pointer optimization.

In fact, Span itself has a no-op constructor that's recommended to be used in place of Option<Span> for this reason: https://docs.rs/tracing/latest/tracing/struct.Span.html#method.none

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants