Skip to content

Conversation

vepadulano
Copy link
Member

The default Snapshot compression setting has always been 101. Historically, this was done for simplicity reasons and following the principle of least surprise. TTree was the only output format available with Snapshot, so the operation was defaulting to the same value used by TTree. Now that Snapshot supports more than one output format, this reason is less strong than before. It has been established that 505 is a better default compression setting overall, so RDataFrame should follow that. The main disadvantage is that this change is hard to communicate. This commit proposes to introduce an information message being shown once per program execution, and only if the program is calling Snapshot. This message is supposed to help the users detect changes in their output file size with the next development cycle (6.38.x) and should be removed afterwards.

As discussed during the ROOT I/O meeting

Copy link
Member

@pcanal pcanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link

github-actions bot commented Oct 3, 2025

Test Results

    22 files      22 suites   3d 22h 59m 32s ⏱️
 3 683 tests  3 683 ✅ 0 💤 0 ❌
79 136 runs  79 136 ✅ 0 💤 0 ❌

Results for commit 7c5c04f.

♻️ This comment has been updated with latest results.

@vepadulano vepadulano force-pushed the rdf-change-snapshot-comp-settings branch from 81c2ac0 to 7c5c04f Compare October 3, 2025 23:08
@vepadulano
Copy link
Member Author

@pcanal I am now slightly in doubt about this current approach. For the information message to be actually visible, a user would have to explicitly activate the "info" logging in their application by writing

#include <ROOT/RLogger.hxx>

// this increases RDF's verbosity level as long as the `verbosity` variable is in scope
auto verbosity = ROOT::RLogScopedVerbosity(ROOT::Detail::RDF::RDFLogChannel(), ROOT::ELogLevel::kInfo);

Whereas I believe our goal here is to show the message once per application anytime the user is writing at least one Snapshot call in their program. By turning the "info" to a "warning", we would get that behaviour.

@silverweed
Copy link
Contributor

I think a Warning is fine, considering that you are warning the user about the change in behavior

@dpiparo
Copy link
Member

dpiparo commented Oct 6, 2025

Thanks for adding it to the RN, which do not reach all users, but at least make the change officially documented in writing.

Copy link
Contributor

@enirolf enirolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I agree with the suggestion to turn this into a warning. I added one suggestion to the RN to add some additional context, but it's also okay for me to leave them as-is.

@vepadulano vepadulano requested a review from hageboeck October 6, 2025 12:46
Copy link
Member

@hageboeck hageboeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favour of changing the default, and also think that it's good to notify users, but I think the way how the message gets out could be improved:

  • An entry in the release notes is warranted
  • The Info message should show as info, so we might have to move to TError.h's Info. (A warning would be too high in my opinion).
  • And I think the message should be silenceable using an entry in TEnv.

How about:

The default compression settings of Snapshot have been changed from 101 (ZLIB with compression level 1) to 505 (ZSTD with compression level 5).
See https://root.cern/
Silence this message adding RDF_SNAPSHOT_INFO=0 to the process environment or .rootrc

Copy link
Contributor

@martamaja10 martamaja10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Vincenzo!

The default Snapshot compression setting has always been 101. Historically, this was done for simplicity reasons and following the principle of least surprise. TTree was the only output format available with Snapshot, so the operation was defaulting to the same value used by TTree. Now that Snapshot supports more than one output format, this reason is less strong than before. It has been established that 505 is a better default compression setting overall, so RDataFrame should follow that. The main disadvantage is that this change is hard to communicate. This commit proposes to introduce an information message being shown once per program execution, and only if the program is calling Snapshot. This message is supposed to help the users detect changes in their output file size with the next development cycle (6.38.x) and should be removed afterwards.
@vepadulano vepadulano force-pushed the rdf-change-snapshot-comp-settings branch from 7c5c04f to 09afe35 Compare October 6, 2025 17:31
@vepadulano
Copy link
Member Author

I have found a way to show kInfo level messages with RLogger, which I prefer over using the legacy TError.h Info function. The environment variable ROOT_RDF_SILENCE_SNAPSHOT_INFO=1 can be used to suppress the message on the user side. I would avoid adding another knob to .rootrc.

The remaining discussion point is about how long this info message should stay around. I believe it should only stay in 6.38 as leaving it in an LTS release has the concrete risk of sticking around in user code for many years.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants