-
Notifications
You must be signed in to change notification settings - Fork 133
feat: add support for RNTuples #1311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for RNTuples #1311
Conversation
|
Oh, cool! Thanks for starting this! |
|
Are we in a spot where we can start to wrap up this functionality @ariostas? |
|
I think so. I was told that there was some major-ish refactoring underway and that I should wait. If that's done, I'm happy to get back to looking into this. Although I'll probably wait a couple weeks to see what happens on the CMSSW side. |
|
Quick update on this. After cms-sw/cmssw#48071 we can now produce RNTuple NanoAODs with CMSSW. The format is not final, but we can probably start playing around with it. |
|
@ariostas do you have anything further you'd like to do with this PR, shall we mark it ready for review? It needs a rebase as well, there are some conflicts. |
|
I'll spend some time on this next week. It probably mostly works, but I might need to make some changes on the Uproot side. |
|
@ariostas OK - great, thanks! Please keep me informed! |
6ba606c to
ffcea34
Compare
|
Quick update on this. Now that coffea switched to virtual arrays by default, it seems like we don't need any changes on the Uproot side. I got it to read the form correctly, but it is failing to materialize the arrays. I'm guessing that it's because it's assuming that all fields are at the top level, but I'll have to look into it. |
|
Awesome - thanks for coming back to this. |
|
Another quick update on this. The ROOT team agreed to add the RNTuple extension that I proposed, so we will be able to have neatly organized NanoAOD RNTuples. It will take a while until we see them, but we will eventually. As for this PR, next week I'll focus on finishing it so that at least the NanoAODs produced with the RNTuple converter work. |
|
Hi @ariostas - it has been two weeks, any word? Thanks! |
|
Not yet, sorry. I find the Coffea code a little confusing, so I'll try to have a chat with Iason to figure out what's needed. In the meantime, I've been working on adding the option of reading virtual arrays directly in Uproot, so hopefully that will simplify things. |
|
@lgray We found with @ariostas that hopefully the issue is that |
|
Thanks so much, @ikrommyd! Surprisingly, even |
|
Now it works for PHYSLITE too. So I think now it's just a matter of doing a bit of cleanup and maybe adding some more tests? |
|
@ariostas Not quite.. events = NanoEventsFactory.from_root({"tests/samples/nano_dimuon_rntuple.root": "Events"}, schemaclass=BaseSchema, mode="eager").events()is giving me |
|
|
|
@ariostas I added this print and I'm getting this out in the rntuple case diff --git a/src/coffea/nanoevents/mapping/uproot.py b/src/coffea/nanoevents/mapping/uproot.py
index 58963d6e..f833bb9f 100644
--- a/src/coffea/nanoevents/mapping/uproot.py
+++ b/src/coffea/nanoevents/mapping/uproot.py
@@ -258,6 +258,7 @@ class UprootSourceMapping(BaseSourceMapping):
decompression_executor=self.decompression_executor,
interpretation_executor=self.interpretation_executor,
)
+ print(the_array)
if isinstance(the_array.layout, awkward.contents.ListOffsetArray):
the_array = awkward.Array(the_array.layout.to_ListOffsetArray64(True))How is |
|
Ah okay, yeah. I'll have to look into the other modes. Dask probably doesn't work either. |
|
Thanks @ariostas. Seems good at least for base schema and nanoaod schema In [22]: ttree = NanoEventsFactory.from_root({"tests/samples/nano_dimuon.root": "Events"}, mode="eager", schemaclass=BaseSchema).events()
In [23]: rntuple = NanoEventsFactory.from_root({"tests/samples/nano_dimuon_rntuple.root": "Events"}, mode="eager", schemaclass=BaseSchema).events()
In [24]: ak.array_equal(rntuple, ttree, dtype_exact=False, check_parameters=False)
Out[24]: True
In [25]: rntuple = NanoEventsFactory.from_root({"tests/samples/nano_dimuon_rntuple.root": "Events"}, mode="eager", schemaclass=NanoAODSchema).events()
/Users/iason/Dropbox/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for LowPtElectron_electronIdx => Electron
warnings.warn(
/Users/iason/Dropbox/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for LowPtElectron_photonIdx => Photon
warnings.warn(
In [27]: ttree = NanoEventsFactory.from_root({"tests/samples/nano_dimuon.root": "Events"}, mode="eager", schemaclass=NanoAODSchema).events()
/Users/iason/Dropbox/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for LowPtElectron_electronIdx => Electron
warnings.warn(
/Users/iason/Dropbox/work/coffea_dev/coffea/src/coffea/nanoevents/schemas/nanoaod.py:264: RuntimeWarning: Missing cross-reference index for LowPtElectron_photonIdx => Photon
warnings.warn(
In [28]: ak.array_equal(rntuple, ttree, dtype_exact=False, check_parameters=False)
Out[28]: TrueDo you happen to know where the dtype mismatch comes from? Is it from the file conversion? Also, is it possible to make the parameters match? |
|
Both the dtype and parameter mismatches are due to the RNTuple Importer. Seems like it's always converting the offsets (e.g. Also, for some reason it's not keeping the TTree title as the RNTuple description. But anyway, since these are just artifacts of the files being converted, they won't be an issue once we have "real" generated RNTuples. |
Yeah that's totally fine. I was just wondering if it's from the conversion or the reading. If it's from the conversion, I'm happy. |
|
@ariostas Since you have the setup ready, is there any chance you could push rntuple versions of the other root files we use in tests? I would like to the test the different schemas. |
|
I couldn't add samples for other schemas because the RNTupleImporter couldn't convert them since some classes didn't have dictionaries. This can be solved by including the right headers. But for example for PHYSLITE we should just use real samples instead of converted ones. I'll look into trimming the one mentioned in #1436. |
|
Sure. What makes our lives easier if we can use the existing files (but converted) is that we have an equality comparison right then and there and also some hard-coded values in the tests I believe. If you add new files, It's generally good to have a TTree version of the new files too to compare them to if that's possible. |
|
@ariostas I have added some testing which works fine for the nano samples but fails for treemaker. The files appear different Do you know why that is? Should I just test for the common fields one by one instead of asserting that the whole arrays are equal? |
|
Seems like it's just that subbranches/subfields are handled a bit different. For TTrees, you can access directly |
Is it an issue in how the arrays are being structured in uproot or is it inherent to the file? Just so I can know what to test for. |
|
It's on the Uproot side. For RNTuples, I'm mirroring how Awkward does things since the file format itself is pretty much equivalent to Awkward layouts. So it makes a lot more sense to have |
Yeah I agree. So the answer is it will stay like that for RNTuple and not change for TTree. Okay in that case I can assert equality between a few fields that are common. |
|
I think I got the treemaker with baseschema test passing but because of this different structure of the treemaker rntuple, the treemaker schema cannot be tested. It just gives incorrect structure. That's out of scope of this PR though. It should be updated separately to comply to the RNTuple structure. |
|
I wouldn't like to wait longer on this PR. I think it currently shows that RNTuple reading worsks fine for baseschema and when the structure of the fields does not change when converting from TTree->RNTuple. If it does, schema updates are needed which is out of the scope of this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see anything harmful here. We can merge this to have some preliminary support and we can test more as more rntuple test files come in. LGTM!
|
Looks ok, sorry for being late! |
This is still a work in progress, but I'm opening this PR so I can ask questions about my changes, since I'm not familiar with how Coffea works.
This PR depends on scikit-hep/uproot5#1419