implement par_iter_many and par_iter_many_unique #17815

Victoronz · 2025-02-12T05:06:57Z

Objective

Continuation of #16547.

We do not yet have parallel versions of par_iter_many and par_iter_many_unique. It is currently very painful to try and use parallel iteration over entity lists. Even if a list is not long, each operation might still be very expensive, and worth parallelizing.
Plus, it has been requested several times!

Solution

Once again, we implement what we lack!

These parallel iterators collect their input entity list into a Vec/UniqueEntityVec, then chunk that over the available threads, inspired by the original par_iter.

Since no order guarantee is given to the caller, we could sort the input list according to EntityLocation, but that would likely only be worth it for very large entity lists.

There is some duplication which could likely be improved, but I'd like to leave that for a follow-up.

Testing

The doc tests on for_each_init of QueryParManyIter and QueryParManyUniqueIter.

chescock · 2025-02-12T14:23:58Z

crates/bevy_ecs/src/query/par_iter.rs

+pub struct QueryParManyIter<'w, 's, D: QueryData, F: QueryFilter, E: EntityBorrow> {
+    pub(crate) world: UnsafeWorldCell<'w>,
+    pub(crate) state: &'s QueryState<D, F>,
+    pub(crate) entity_list: Vec<E>,


Could this be generalized to EntityList: AsRef<[E]>? (And similarly with AsRef<UniqueEntitySlice<E>> for QueryParManyUniqueIter.) That would let you use it for borrowed slices without needing to pass ownership of the Vec, as well as allowing some other collections like SmallVec.

This isn't part of the public API, this is a Vec that input iterators are collected into because storing an iterator or some other non-owned type here would make the lifetimes and bounds very complicated.
The reason we specifically use Vec here is to take advantage of an optimization that exists in Rust; When Vec::into_iter is collected into a Vec again, it becomes practically a no-op.

Right, I think I'm asking why it does a collect() at all and doesn't just keep the original entities: EntityList like the non-Par versions.

I guess that's because the implementation requires a slice, but you want to accept an EntitySet that isn't a slice?

Are we worried that the collect() is a performance risk? Even if you have a Vec, you might want to only pass a reference to it, so that you can use it later. And going from a &Vec to a Vec isn't a no-op. Parallel iteration is mainly useful when you have a large list of entities, so that might be an expensive Vec to copy!

It is an expense we'd rather not have, however not doing it would make these methods very complex, which shouldn't need to be done in the first PR imo.
By collecting the input iterator into a Vec we ensure that we know the exact length the list has so we can chunk most effectively.
By owning the type we also do not need to worry about splitting up the same iterator across several threads, which would require several Send/Sync bounds on the iterator and its items.
I did try, but couldn't get it to work, so I left it for later.

Oh, I wasn't proposing trying to split up an arbitrary Iterator! I agree that would be too complex.

I'm proposing pub fn par_iter_many<E: EntityBorrow, EntityList: AsRef<[E]>>, and then storing the entities: EntityList directly. So, you could only pass an owned or borrowed slice, and not an arbitrary Iterator.

If you had some other Iterator then you'd have to collect() it yourself. That's less convenient, but it makes the cost more visible.

But in return, you could pass &Vec<Entity> without consuming or copying the Vec! And you could still pass an owned Vec if you needed a QueryParManyIter that didn't borrow from anything.

crates/bevy_ecs/src/query/state.rs

chescock · 2025-02-12T14:33:40Z

crates/bevy_ecs/src/query/state.rs

+                    #[cfg(feature = "trace")]
+                    let _span = self.par_iter_span.enter();
+                    let accum = init_accum();
+                    self.iter_many_unique_unchecked_manual(batch, world, last_run, this_run)


I think we want SAFETY comments here (and in the non-unique version) for when we turn on unsafe_op_in_unsafe_fn.

Hmm, we can't actually write proper safety comments here right now I just noticed, because the original par_iter is itself unsound: #10752
Can we punt on this, and fix all par_iter methods at once?

crates/bevy_ecs/src/query/par_iter.rs

crates/bevy_ecs/src/system/query.rs

cBournhonesque · 2025-02-13T00:28:26Z

crates/bevy_ecs/src/query/par_iter.rs

+///
+/// [`Entity`]: crate::entity::Entity
+/// [`Query::par_iter_many`]: crate::system::Query::par_iter_many
+pub struct QueryParManyIter<'w, 's, D: QueryData, F: QueryFilter, E: EntityBorrow> {


Out of curiosity, why not put a ReadOnlyQueryData bound here?
Because it's good practice to only put bounds on impl instead of on the struct itself?

crates/bevy_ecs/src/query/par_iter.rs

crates/bevy_ecs/src/query/state.rs

cBournhonesque

Amazing to see the results of the EntitySet work!

Carter0 · 2025-02-13T01:57:28Z

crates/bevy_ecs/src/query/par_iter.rs

+    ///
+    /// `init` may be called multiple times per thread, and the values returned may be discarded between tasks on any given thread.
+    /// Callers should avoid using this function as if it were a parallel version
+    /// of [`Iterator::fold`].


What do you mean by this? Can you elaborate?

This is has not changed from QueryParIter; for_each_init has similar inputs to Iterator::fold, but must not be mistaken for fold, it doesn't accumulate one overall value. In short, it is still a for_each, but with an additional INIT parameter.

Carter0

Yeah this is a little above me lol. If you added some tests that might help tho

Victoronz · 2025-02-13T02:52:25Z

oops, that was too early!

chescock · 2025-02-13T03:15:30Z

oops, that was too early!

Only because I had forgotten to actually hit the approve button :)

Victoronz added 2 commits February 12, 2025 05:52

implement par_iter_many and par_iter_many_unique

48d82da

use normal par_iter_spans

24c0f0a

Victoronz added 3 commits February 12, 2025 06:13

consistent semicolons

3580d28

missing doc links

d61bc84

the actual missing doc links

bed5652

chescock reviewed Feb 12, 2025

View reviewed changes

use ChunksExact::remainder

2e9b015