Refactoring AssistedCandidateGenerator for Improved Modularity and Reusability
#35009
Conversation
@zucchini-nlp feel free to review it :-)
zucchini-nlp left a comment
Thanks a lot for making the code more composable! LGTM as nothing changed in terms of functionality. Just left one comment as we had several PRs in parallel modifying the assisted generation code :)
Force-pushed from a120dbf to d4ff091
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
zucchini-nlp left a comment
Thanks @keyboardAnt! LGTM as nothing was changed except for composing the code into smaller functions.
The only question is the use of self.prev_target_ids, which, as I mentioned, was removed in a previous PR on the grounds that it is not needed. This is what I got from reviewing that PR:
Hmm, so to make sure: that means the previous implementation, where we checked prev_target_ids, was not really correct, and we should instead check the length of the already accepted input_ids?
yes
So do we need to save the previous token ids from the target model, or can we re-use the current token ids? The current token ids will in any case have the previous token ids as a prefix, with the newly accepted tokens appended at the end.
For UAG, we need just the number of tokens in
Ah, that makes sense: if we don't care about the actual token ids used previously, the tokens are available without storing them. Then we can indeed store only the previous length.
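The idea settled on above can be sketched in plain Python. The class and method names here are hypothetical, not the PR's actual code; only the principle is taken from the discussion: because the current input_ids always contain the previous target ids as a prefix, storing their length is enough to recover the newly accepted suffix.

```python
class LengthTrackingCandidateState:
    """Hypothetical sketch: track only the length of previously seen
    target ids instead of the ids themselves (self.prev_target_ids)."""

    def __init__(self):
        self.prev_target_length = 0  # replaces storing the prev token ids

    def newly_accepted(self, input_ids):
        # Tokens accepted since the last call are the suffix of input_ids
        # past the stored length; the prefix is guaranteed unchanged.
        fresh = input_ids[self.prev_target_length:]
        self.prev_target_length = len(input_ids)
        return fresh
```

A usage example: after a first call with `[1, 2, 3]`, a second call with `[1, 2, 3, 9, 10]` returns only the two newly accepted tokens.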
Force-pushed from 4745dee to fcd129f
Thanks, @zucchini-nlp. I've replaced
@keyboardAnt please apply the same fix as in keyboardAnt#4; otherwise SD will not work when the target and assistant models are not on the same device.
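The device issue mentioned here can be illustrated with a minimal sketch. The function name is hypothetical and this is not the PR's actual code; it only shows the general fix: tensors produced on the target model's device must be moved to the assistant model's device before the assistant's forward pass.

```python
import torch

def move_to_assistant_device(input_ids: torch.Tensor, assistant_device) -> torch.Tensor:
    """Hypothetical sketch of the device fix: relocate ids from the target
    model's device to the assistant model's device before generation."""
    # Tensor.to() is a no-op when the tensor already lives on
    # assistant_device, so this is also safe in the single-device case.
    return input_ids.to(assistant_device)
```

In a real setup the destination would be something like `assistant_model.device`; without such a move, indexing the target's ids with assistant-side tensors raises a cross-device error.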
@keyboardAnt
You can now request review from the core maintainer to merge this :)
ArthurZucker left a comment
Looks great! Thanks for the detailed explanations 🤗
Merging!
Ah can you resolve conflicts? 🤗
@ArthurZucker done!
@ArthurZucker @zucchini-nlp
thanks, merging |

What does this PR do?
This PR refactors the `AssistedCandidateGenerator` and `AssistedCandidateGeneratorDifferentTokenizers` classes to improve code readability, maintainability, and reusability. By breaking down large methods into smaller helper functions and reusing common logic, we enhance the modularity of these classes. This refactoring lays a cleaner foundation for upcoming features like `UniversalSpeculativeDecodingGenerator` without introducing any new functionality.
Background
While working on Universal Speculative Decoding (#34760), which introduces the `UniversalSpeculativeDecodingGenerator`, we identified opportunities to refactor existing code. The goal is to reuse core logic across different candidate generators and simplify the integration of new features that enable speculative decoding across models with different tokenizers. By submitting this refactoring as a separate PR, we aim to:
This refactor is a collaboration with @jmamou, who has already reviewed it (keyboardAnt#1).
Key Changes
1. Code Restructuring
- Broke down the `get_candidates` methods in both classes into smaller, focused helper functions:
  - `_calculate_new_tokens`
  - `_update_past_and_masks`
  - `_prepare_generation_args`
  - `_generate_candidates`
- Simplified the `__init__` methods to remove redundancy and enhance clarity.
2. Improved Reusability
- In `AssistedCandidateGeneratorDifferentTokenizers`, methods like `_prepare_assistant_input_ids` and `_process_assistant_outputs` handle tokenizer-specific logic.
3. Enhanced Readability
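The restructured control flow can be illustrated with a minimal sketch. The helper names match those listed in the PR, but the bodies below are toy stand-ins under assumed parameters (`num_assistant_tokens`, `max_length`), not the actual transformers implementation:

```python
class AssistedCandidateGeneratorSketch:
    """Structural sketch only: get_candidates delegates to four focused
    helpers, mirroring the decomposition described in this PR."""

    def __init__(self, num_assistant_tokens=5, max_length=20):
        self.num_assistant_tokens = num_assistant_tokens
        self.max_length = max_length

    def _calculate_new_tokens(self, input_ids):
        # How many candidate tokens may still be generated.
        remaining = self.max_length - len(input_ids)
        return min(self.num_assistant_tokens, max(remaining, 0))

    def _update_past_and_masks(self, input_ids):
        # Placeholder for KV-cache / attention-mask bookkeeping.
        pass

    def _prepare_generation_args(self, input_ids, max_new):
        return {"input_ids": input_ids, "max_new_tokens": max_new}

    def _generate_candidates(self, generation_args):
        # Placeholder "assistant model": append dummy candidate tokens.
        n = generation_args["max_new_tokens"]
        return generation_args["input_ids"] + [0] * n

    def get_candidates(self, input_ids):
        max_new = self._calculate_new_tokens(input_ids)
        if max_new == 0:
            return input_ids  # nothing left to speculate
        self._update_past_and_masks(input_ids)
        args = self._prepare_generation_args(input_ids, max_new)
        return self._generate_candidates(args)
```

The benefit shown here is that each step can be overridden independently by subclasses such as `AssistedCandidateGeneratorDifferentTokenizers` without duplicating the overall loop.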
Motivation
This refactoring is motivated by the need to:
By isolating these changes, we enable reviewers to focus solely on the structural improvements without the added complexity of new features. This approach helps maintain a high code quality standard and simplifies the review and merging process.
Dependencies
Before Submitting
Linked issue: #34760 (`CandidateGenerator`).
Who Can Review?
The following reviewers are well-suited to review this PR: @gante, @ArthurZucker
This PR aims to strengthen the foundation for speculative decoding and other future enhancements by improving the existing code's structure and maintainability. We appreciate your time and look forward to your feedback.