Skip to content

Conversation

youkaichao
Copy link
Member

This PR adds coordinator, the last piece in #3587 . The main benefit is that all distributed stuff (e.g. complicated selection logic in communication ops, prev/next rank logic, initialization&destruction logic) can be shared for all tp/pp code.

After this PR, we explicitly require the code writer specify which group it wants to operate on:

  • world group, by get_world()
  • tp group, by get_tp()
  • pp group, by get_pp()

After that, they can just call all_reduce, prev_rank ... without any further changes.

@youkaichao youkaichao requested a review from zhuohan123 June 5, 2024 21:05
Copy link
Member

@zhuohan123 zhuohan123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM! Left some style comments.

_WORLD: Optional[GroupCoordinator] = None


def get_world() -> GroupCoordinator:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the following changes? Make the function name easier to understand:

  • get_world -> get_world_group?
  • get_tp -> get_tp_group?
  • get_pp -> get_pp_group?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in ef78f38ef78f38

@youkaichao youkaichao merged commit ea3890a into vllm-project:main Jun 13, 2024
@youkaichao youkaichao deleted the coordinator_impl branch June 13, 2024 00:27
@wooyeonlee0
Copy link
Contributor

@youkaichao Great! I love this refactoring PR 👍
My code in #5414 can be simpler now :)

robertgshaw2-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jun 16, 2024
…-project#5293)

[Core][Distributed] add coordinator to reduce code duplication in tp and pp (vllm-project#5293)
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
…-project#5293)

[Core][Distributed] add coordinator to reduce code duplication in tp and pp (vllm-project#5293)
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 27, 2024
…-project#5293)

[Core][Distributed] add coordinator to reduce code duplication in tp and pp (vllm-project#5293)
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 8, 2024
…-project#5293)

[Core][Distributed] add coordinator to reduce code duplication in tp and pp (vllm-project#5293)
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jul 24, 2024
…-project#5293)

[Core][Distributed] add coordinator to reduce code duplication in tp and pp (vllm-project#5293)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants