[Optimization] use a pool to reuse LogicalTokenBlock.token_ids #5584
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,12 +1,43 @@ | ||
| """Token blocks.""" | ||
| from typing import List | ||
| import weakref | ||
| from collections import defaultdict | ||
| from typing import Dict, List | ||
|
|
||
| from vllm.utils import Device | ||
|
|
||
| _BLANK_TOKEN_ID = -1 | ||
|
|
||
| DEFAULT_LAST_ACCESSED_TIME = -1 | ||
|
|
||
| TokensBlock = List[int] | ||
|
|
||
|
|
||
| class BlockPool: | ||
| """A pool of token-ID lists used by logical blocks. | ||
|
||
| When requests come, we create a lot of logical blocks; | ||
| when requests are done, we destroy a lot of logical blocks. | ||
| It turns out that creating and destroying logical blocks can be expensive, | ||
| especially for the `token_ids` field, which is a list of integers. | ||
| To avoid this overhead, we use a pool to manage the logical blocks. | ||
| When an old request is done and a new request comes, we can reuse the | ||
| logical blocks from the old request to feed the new request. | ||
| """ | ||
|
|
||
| def __init__(self) -> None: | ||
| # block size to list of token blocks | ||
| self.pool: Dict[int, List[TokensBlock]] = defaultdict(list) | ||
|
|
||
| def alloc_block(self, block_size: int) -> TokensBlock: | ||
| if block_size in self.pool and self.pool[block_size]: | ||
| return self.pool[block_size].pop() | ||
| return [_BLANK_TOKEN_ID] * block_size | ||
|
|
||
| def del_block(self, block: TokensBlock) -> None: | ||
| self.pool[len(block)].append(block) | ||
|
|
||
|
|
||
| _BLOCK_POOL = BlockPool() | ||
|
|
||
|
|
||
| class LogicalTokenBlock: | ||
| """A block that stores a contiguous chunk of tokens from left to right. | ||
|
|
| @@ -23,7 +54,13 @@ def __init__( | ||
| self.block_number = block_number | ||
| self.block_size = block_size | ||
|
|
||
| self.token_ids = [_BLANK_TOKEN_ID] * block_size | ||
| self.token_ids = _BLOCK_POOL.alloc_block(block_size) | ||
| # this finalizer is used to return the block to the pool when the object is deleted # noqa | ||
| # NOTE: don't use __del__ because it cannot guarantee the order of finalization, # noqa | ||
| # i.e. `self.token_ids` may be deleted before `self`, and we lose | ||
| # the opportunity to return the block to the pool | ||
| self._finalizer = weakref.finalize(self, _BLOCK_POOL.del_block, | ||
|
Although not as "automatic", it may be more efficient to call a free method explicitly when a [...]
adding a [...]
I haven't checked, but maybe there aren't that many places? In any case, it doesn't matter if some place is missed. Either it can be dropped in that case, or we keep the existing logic and let the finalizer get it...
I don't have the bandwidth to check it yet. Feel free to add it later if you figure it out. My intuition is that this would need control over the gc system (which is difficult in Python). |
||
| self.token_ids) | ||
| self.num_tokens = 0 | ||
|
|
||
| def is_empty(self) -> bool: | ||
|
|
||
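The review thread discusses calling a free method explicitly while keeping the finalizer as a fallback. One possible shape for that idea is sketched below; it is not code from the PR. `Block`, `free`, and the single-size `free_buffers` list are hypothetical names introduced for illustration. The sketch relies on a documented property of `weakref.finalize`: calling the finalizer object runs its callback at most once, so an explicit release and a later garbage collection cannot return the same buffer twice.

```python
import weakref
from typing import List

# Hypothetical single-size pool, kept minimal for the illustration.
free_buffers: List[List[int]] = []


class Block:
    def __init__(self, block_size: int) -> None:
        if free_buffers:
            self.token_ids = free_buffers.pop()
        else:
            self.token_ids = [-1] * block_size
        # Fallback path: runs when the object is collected, unless the
        # finalizer has already been called explicitly.
        self._finalizer = weakref.finalize(self, free_buffers.append,
                                           self.token_ids)

    def free(self) -> None:
        """Hypothetical explicit release; safe to call more than once."""
        self._finalizer()    # runs the callback only if it is still alive
        self.token_ids = []  # drop our alias to the pooled buffer


block = Block(4)
block.free()
returned_after_free = len(free_buffers)
block.free()  # second call is a no-op
del block     # the finalizer is already dead, so nothing is appended
returned_total = len(free_buffers)
```

With this arrangement, call sites that are missed simply fall back to the finalizer, which matches the reviewer's remark that it does not matter if some place is missed.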