@pfackeldey
Collaborator

This PR adds the recently merged preload feature (PR #1387) to the Runner API. A new Runner.trace_needed_columns method traces each WorkItem using the given executor. This may add noticeable overhead, but it is the most correct approach, and it is definitely less work than streaming/running over the actual data. The output is JSON-serializable (and therefore reusable) and can be passed to the __call__/run method of a Runner through preload_columns:

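```python
from coffea import processor

# Executor and other Runner settings are elided here.
run = processor.Runner(...)

# Optional step: trace which columns each work item needs.
# `filelist` and `MyAnalysisProcessor` are defined elsewhere in the analysis.
preload_columns = run.trace_needed_columns(
    filelist,
    processor_instance=MyAnalysisProcessor(),
    treename="Events",
)

out = run(
    filelist,
    processor_instance=MyAnalysisProcessor(),
    treename="Events",
    preload_columns=preload_columns,  # optional
)
```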
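The point of tracing is to learn which columns a processor would read without reading any event data. A toy sketch of that idea in plain Python, assuming a processor that only does attribute/item access on its events object. `ColumnRecorder`, `trace_columns`, and `toy_process` are hypothetical names for illustration; this is not how coffea's own tracing is implemented.

```python
from __future__ import annotations


class ColumnRecorder:
    """Hypothetical toy stand-in for an events object.

    It reads no data; it only records which dotted paths
    (e.g. "Muon.pt") the processor touched.
    """

    def __init__(self, path: str = "", log: set[str] | None = None) -> None:
        self._path = path
        self._log = log if log is not None else set()

    def _child(self, name: str) -> ColumnRecorder:
        # Extend the dotted path and remember that it was accessed.
        path = f"{self._path}.{name}" if self._path else name
        self._log.add(path)
        return ColumnRecorder(path, self._log)

    def __getattr__(self, name: str) -> ColumnRecorder:
        # Only called for attributes not found normally; private names
        # are not treated as columns.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._child(name)

    def __getitem__(self, name: str) -> ColumnRecorder:
        return self._child(name)


def trace_columns(process) -> list[str]:
    """Run `process` on a recorder and return the leaf paths it accessed."""
    log: set[str] = set()
    process(ColumnRecorder(log=log))
    # Keep only leaves: drop "Muon" when "Muon.pt" was also accessed.
    return sorted(p for p in log if not any(q.startswith(p + ".") for q in log))


def toy_process(events):
    # Attribute/item access only; this toy recorder supports no arithmetic.
    return {"muon_pt": events.Muon.pt, "jet_eta": events["Jet"]["eta"]}


print(trace_columns(toy_process))  # ['Jet.eta', 'Muon.pt']
```

A real processor also does arithmetic and masking on its inputs, which this toy recorder does not support.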

Because run.trace_needed_columns can be a moderately heavy workload, I decided to make it an explicit, opt-in method (similar to .preprocess, which can be called explicitly before run). I'm not sure this is the best interface, but it's at least a starting point. Let me know what you think.
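Since the tracing step is opt-in and can be moderately heavy, while its output is JSON-serializable, one natural pattern is to trace once, cache the result in a JSON file, and reuse it on later runs. A minimal sketch, assuming the traced output is a plain JSON-compatible object. `load_or_trace` and `fake_trace` are hypothetical helpers, and the output shape used in the demo is invented.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_trace(cache_file: str | Path, trace: Callable[[], Any]) -> Any:
    """Hypothetical helper: reuse cached preload columns, else trace once and cache.

    Assumes the traced output is JSON-serializable (as the PR states).
    """
    cache_file = Path(cache_file)
    if cache_file.exists():
        # Cache hit: skip the (possibly expensive) tracing step entirely.
        return json.loads(cache_file.read_text())
    result = trace()
    cache_file.write_text(json.dumps(result))
    return result


# --- Demo with a stand-in tracer (hypothetical output shape) ---
calls: list[int] = []


def fake_trace() -> dict[str, list[str]]:
    # Stand-in for run.trace_needed_columns(...); counts how often it runs.
    calls.append(1)
    return {"file1.root": ["Muon_pt", "Jet_eta"]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "preload_columns.json"
    first = load_or_trace(cache, fake_trace)   # traces and writes the cache
    second = load_or_trace(cache, fake_trace)  # reads the cache, no tracing
    print(first == second, len(calls))  # True 1
```

In the workflow described in this PR, `trace` would wrap the tracing call, e.g. `lambda: run.trace_needed_columns(filelist, processor_instance=MyAnalysisProcessor(), treename="Events")`. One caveat: JSON turns tuples into lists, so a cached result can differ in type from a freshly traced one.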

@lgray
Collaborator

lgray commented Oct 10, 2025

@pfackeldey could you fix the conflicts here and we can probably get this merged.

@ikrommyd
Collaborator

ikrommyd commented Oct 10, 2025

We've discussed this with Peter over the past few days, and the API is not quite there yet. It works, but it doesn't feel like the API we want. We think (also in line with the retreat discussions) that now is probably the right time to overhaul the preprocessing part: unify the interfaces into a single preprocessing step and remove this concept from the runners. This feature would then become part of the new preprocessing.

@ikrommyd ikrommyd marked this pull request as draft October 14, 2025 13:00
@pfackeldey
Collaborator Author

@ikrommyd do you already have any plans/ideas for what the updated preprocessing should look like?

@lgray
Copy link
Collaborator

lgray commented Oct 22, 2025

Yes, we discussed this yesterday, so I think he'll have something here shortly.

@pfackeldey
Collaborator Author

Awesome! I'm happy to update this PR to follow the new preprocessing :)
