
Conversation

@jmamou (Collaborator) commented Jan 9, 2025

The token added by the target during the validation step is decoded as a string, which the draft always interprets as the beginning of a new word.
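To make the failure mode concrete, here is a minimal sketch (not taken from the PR; the model names and the sample word are only illustrative) of how a token decoded in isolation loses its word-boundary context when it is re-tokenized by the draft:

```python
from transformers import AutoTokenizer

# Illustrative target/draft pair with different tokenizers (different space-sign
# conventions), mirroring the benchmark pairs mentioned in this thread.
target_tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-medium-128k-instruct")
draft_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

text = "internationalization"
target_ids = target_tok.encode(text, add_special_tokens=False)

# Pretend the last id is the token the target appended during validation.
new_target_id = target_ids[-1]
fragment = target_tok.decode([new_target_id])  # a sub-word piece, e.g. "ization"

# Re-encoding the decoded fragment on its own drops the information that it was
# a continuation of the previous word, so the draft tokenizes it as if a new
# word had started rather than as a suffix of the preceding tokens.
print(repr(fragment))
print(draft_tok.tokenize(fragment))  # the fragment tokenized in isolation
print(draft_tok.tokenize(text))      # the same piece tokenized with full context
```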

@jmamou jmamou force-pushed the fix_prepare_assistant_input_ids branch from 3c5f39c to d20c488 on January 12, 2025 13:16
@keyboardAnt (Owner) left a comment

I added a few comments. My primary concern is merging untested changes, as they will be automatically added to our open PR to upstream. Has this branch been tested? What are the expected speedups we will achieve after merging it into the existing code?

@jmamou (Collaborator, Author) commented Jan 12, 2025

> I added a few comments. My primary concern is merging untested changes, as they will be automatically added to our open PR to upstream. Has this branch been tested? What are the expected speedups we will achieve after merging it into the existing code?

It has been tested; we still need to add the relevant automatic tests.

Here is the impact on model pairs that use different space signs, compared with the previous benchmark:

| Target model | Assistant (draft) model | Speedup (before → after) |
| --- | --- | --- |
| microsoft/Phi-3-medium-128k-instruct | Qwen/Qwen2-0.5B-Instruct | 0.63x → 0.78x |
| codellama/CodeLlama-13b-Instruct-hf | bigcode/tiny_starcoder_py | 0.74x → 0.82x |

My primary question was about the implementation. WDYT about it?

@keyboardAnt (Owner) left a comment

.

@keyboardAnt keyboardAnt self-requested a review January 12, 2025 15:16
@keyboardAnt (Owner) left a comment

LGTM.

@keyboardAnt keyboardAnt merged commit a556947 into usd Jan 12, 2025
@keyboardAnt keyboardAnt deleted the fix_prepare_assistant_input_ids branch January 12, 2025 15:25
@keyboardAnt (Owner) commented

@jmamou, it seems like the CI tests fail:
[screenshot of the failing CI tests]

@jmamou (Collaborator, Author) commented Jan 12, 2025

It seems that the merge occurred before the last commit, which renamed `target_to_assistant_input_id` to `target_to_assistant_input_ids`.

keyboardAnt added a commit that referenced this pull request Feb 28, 2025
* move `TestAssistedCandidateGeneratorDifferentTokenizers` into a new testing file

* refactor

* NOTHING. add space to rerun github actions tests

* remove it...

* `UniversalSpeculativeDecodingGenerator`

* Use `UniversalSpeculativeDecodingGenerator` when `generation_config.do_sample=True`

* assistant tokenizes only the target's new suffix

* formatting

* fix code

* fix code

* formatting

* add `TestGenerateWithDifferentModels`

* `TestGenerateWithDifferentModels` parameterize on `do_sample`

* `AssistantVocabMapping` & `AssistantVocabMappingCache`

* formatting

* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`

* improve `_get_assistant_to_target_input_ids` & formatting

* renaming

* WIP: debugging `min_new_tokens`

* fix get_target_ids

* `UniversalSpeculativeDecodingGenerator`

* assistant tokenizes only the target's new suffix

* formatting

* fix code

* fix code

* formatting

* `TestGenerateWithDifferentModels` parameterize on `do_sample`

* `AssistantVocabMapping` & `AssistantVocabMappingCache`

* formatting

* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`

* improve `_get_assistant_to_target_input_ids` & formatting

* renaming

* WIP: debugging `min_new_tokens`

* fix get_target_ids

* fix device issue

* fix get_assistant_input_ids

* add `TestAssistedCandidateGeneratorDifferentTokenizers`

* formatting

* `AssistantVocabTranslatorCache` refactor & tests

* revert changes in `src/transformers/generation/logits_process.py`

* refactor `AssistedCandidateGenerator`

* refactor `AssistedCandidateGeneratorDifferentTokenizers`

* formatting

* refactor `UniversalSpeculativeDecodingGenerator`

* fix negative value for max_new_tokens

* fix generation length: target + attention_mask vs. assistant + attention_mask

* fix device

* fix negative max_new_tokens bug

* fix UAG

* minor

* formatting

* `AssistedCandidateGeneratorDifferentTokenizers` `lookbehind`s init

* resolve conflict & formatting

* rerun CI tests

* remove space...

* remove old code

* fix candidate_input_ids device

* minor

* formatting

* Fix prepare + apply (#7)

* fix prepare + apply

* move to cpu

* simplify suppress_tokens

* fix bugs and refactoring

* device move

* handle self.config.vocab_size > len(target_tokenizer.get_vocab())

* no need to normalize in candidate_generator

* address Nadav's comments + minor

* optimize device move + SuppressTokensLogitsProcessor

* AssistantToTargetTranslator, SuppressTokensLogitsProcessor and tokenizers mapping improvements

* padding size

* padding improvement

* fix and simplify get_target_logits

* renaming in get_target_logits

* minor

* add filter_value and suppress_tokens_id

* style + rename

* remove TODO

* restore original SelectTokensLogitsProcessor with modification

* fix style

* fix _update_past_and_masks and optimize code

* remove assistant_vocab_size arg

* fix attention_mask

* call _prepare_attention_mask also if not has_past_key_values

* handling attention mask for first generation

* comment

* restore test

* remove SelectTokensLogitsProcessor

* _update_past_and_masks implementation for USD

* Add unittests for Universal Assisted generation

* fix style

* update tests

* Remove unused import and fix `test_speculation_depth` test

* exclude special and reserved tokens from tokenizer for UAG

* mv `test_universal_assisted_generation.py` to `generation/test_candidate_generator.py`

* Remove unused imports and fix style using `make style` (#9)

* formatting

* Swap gated `meta-llama/llama-3.2` with `allenai/llama` (#10)

* Fix space sign disagreement (#12)

* default values for AssistantToTargetTranslator fields

* fix space sign

* minor

* fix test + style

* Default values for some fields of assistant to target translator (#11)

* default values for AssistantToTargetTranslator fields

* fix

* add support to empty logit_processors

* Update candidate_generator.py (#15)

fix typo

* BUG fix in _prepare_assistant_input_ids (#14)

* fix _prepare_assistant_input_ids

* target_to_assistant_input_ids

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Nadav Timor <[email protected]>

---------

Co-authored-by: Nadav Timor <[email protected]>

* typo (`target_to_assistant_input_ids`)

* formatting

* merge upstream/main

* Fix minor review comments (#16)

* Fix: `token_ids.to(torch.int64)` (#18)

* tok ids to `torch.int64` (reference: https://huggingface.co/docs/transformers.js/en/api/tokenizers)

* `LongTensor`

* fix dtype

* `assistant_input_ids.to(dtype=torch.long)`

* Remove unused import from test_candidate_generator.py

* Remove unused import from test_candidate_generator.py

* Remove `numpy` import

* resolve pr comments (#19)

* `AssistantToTargetTranslator` docstring

* (per gante's comment) `filter_value` and `suppress_tokens_id` to class constants

* update `AssistantToTargetTranslator` docstring

* (gante's comment) replace `match-case`

* formatting

* Fix Joao's comments (#21)

* remove threading

* fix logits_processor

* fix test device

* fix style (#23)

* Move atm (#24)

* move AssistantToTargetTranslator

* fixup

* fix logit_processor

* add atm_translator test

* refactor test

* remove threading from test

* add require_torch in tests

* move AssistantVocabTranslatorCache + add tests

* ruff fix

---------

Co-authored-by: jmamou <[email protected]>
Co-authored-by: Gaurav <[email protected]>
Co-authored-by: Gaurav Jain <[email protected]>
Co-authored-by: gauravjain14 <[email protected]>