Skip to content

Conversation

antiguru
Copy link
Member

@antiguru antiguru commented Sep 6, 2024

This PR aims at removing temporary allocations by:

  • Exposing a better addr function, i.e, addr_for that allocates the correct size.
  • Passing Vec<usize> instead of &[usize] whenever the receiver needs to take ownership. In this case, the caller can determine whether they can pass ownership or need to clone, instead of unconditionally forcing the receiver to allocating a new vector to take ownership.

P: ParallelizationContract<G::Timestamp, C> {

let connection = vec![Antichain::from_elem(Default::default()); self.builder.shape().outputs()];
let connection = (0..self.builder.shape().outputs()).map(|_| Antichain::from_elem(Default::default())).collect();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for future selves, this is a common enough pattern (antichain with default summary) that it probably wants to be represented more compactly, perhaps as an enum variant. It seems that such enums around vectors are surprisingly cheap now (more than just the one additional variant allowed by e.g. Option<Vec<_>>).

fn name(&self) -> String { self.subgraph.borrow().name.clone() }
fn addr(&self) -> Vec<usize> { self.subgraph.borrow().path.clone() }

fn addr_for(&self, index: usize) -> Vec<usize> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should eventually add a comment here, or enrich the name to addr_for_child to be clear. Or addr_plus or .. idk something to be clear what is going on here. It's not a great mystery once you read the code, ofc. Don't worry about this if you don't want, and I can tidy in post.

Comment on lines 36 to 38
/// A sequence of scope identifiers describing the path from the worker root to the index in
/// this scope.
fn addr_for(&self, index: usize) -> Vec<usize>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh boy. You did have the docs and I read the files out of order. I'm sorry! But maybe "to the index" -> "to the child indicated by index"?

Copy link
Member

@frankmcsherry frankmcsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I left some notes, but they are all minor.

Another thing we discussed offline, and I wanted to record here, is that perhaps we should be using Rc<[usize]> for the addresses, rather than Vec<usize>. We don't need to mutate anything, and there are moments that we have multiple copies of the same thing. Another potentially useful idea is to have the worker support a per-dataflow allocation for storing addresses, as they all have the same shared lifetime (the dataflow), and they never change. So, e.g. a map from operator id (integer) back to the path (integer sequence).

Anyhow, this seems great independently, and these ideas are potentially good follow-ons!

Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru antiguru force-pushed the temporary_allocations branch from 5874de0 to c11e2e3 Compare September 6, 2024 18:12
Signed-off-by: Moritz Hoffmann <[email protected]>
@antiguru antiguru merged commit adbaf2d into TimelyDataflow:master Sep 6, 2024
1 check passed
@antiguru antiguru deleted the temporary_allocations branch September 6, 2024 18:15
@github-actions github-actions bot mentioned this pull request Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants