Skip to content

Conversation

@amogkam
Copy link
Contributor

@amogkam amogkam commented Apr 4, 2023

ActorPoolMapOperator takes in a Callable class which initializes some state to be reused for every batch.

In the current implementation, this state is initialized on the first batch, rather than during actor init.

In this PR, we separate the state initialization and actually call it during Actor init. This allows state to be initialized for fixed size actor pools, even when tasks are not ready to be dispatched for better pipelining. It also supports using multithreaded actors, so state gets initialized once per actor instead of once per thread.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

amogkam added 2 commits April 3, 2023 20:06
Signed-off-by: amogkam <[email protected]>
Signed-off-by: amogkam <[email protected]>
Signed-off-by: amogkam <[email protected]>
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Apr 4, 2023
@jianoaix
Copy link
Contributor

jianoaix commented Apr 5, 2023

Is the existing approach really once per thread? It looks _cached_fn is per process (hence actor).

@amogkam
Copy link
Contributor Author

amogkam commented Apr 5, 2023

Right, but race conditions can lead to each thread initializing the callable class. We can either add synchronization (move the class initialization behind a lock) or explicitly do the class initialization in a separate step.

amogkam added 3 commits April 5, 2023 15:38
Signed-off-by: amogkam <[email protected]>
Signed-off-by: amogkam <[email protected]>
Signed-off-by: amogkam <[email protected]>
amogkam added 2 commits April 6, 2023 20:50
Signed-off-by: amogkam <[email protected]>
Signed-off-by: amogkam <[email protected]>
@amogkam amogkam merged commit aaac9cd into ray-project:master Apr 10, 2023
@amogkam amogkam deleted the dataset-fix-state-init branch April 10, 2023 20:23
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…roject#34037)

ActorPoolMapOperator takes in a Callable class which initializes some state to be reused for every batch.

In the current implementation, this state is initialized on the first batch, rather than during actor init.

In this PR, we separate the state initialization and actually call it during Actor init. This allows state to be initialized for fixed size actor pools, even when tasks are not ready to be dispatched for better pipelining. It also supports using multithreaded actors, so state gets initialized once per actor instead of once per thread.

---------

Signed-off-by: amogkam <[email protected]>
Signed-off-by: elliottower <[email protected]>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…roject#34037)

ActorPoolMapOperator takes in a Callable class which initializes some state to be reused for every batch.

In the current implementation, this state is initialized on the first batch, rather than during actor init.

In this PR, we separate the state initialization and actually call it during Actor init. This allows state to be initialized for fixed size actor pools, even when tasks are not ready to be dispatched for better pipelining. It also supports using multithreaded actors, so state gets initialized once per actor instead of once per thread.

---------

Signed-off-by: amogkam <[email protected]>
Signed-off-by: Jack He <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants