Simplifying getting the number of embedding tokens #26945
Closed

AnastasiyaKukharska wants to merge 12 commits into huggingface:main from AnastasiyaKukharska:num-embedding-tokens-simplify
            Conversation
  
    
Commits:

* fix
  Co-authored-by: ydshieh <[email protected]>

* fix
  * fix
  * fix
  Co-authored-by: ydshieh <[email protected]>

* Fix backward compatibility of Conversation
  I ran into a case where an external library was depending on the `new_user_input` field of Conversation: https://github.com/SeldonIO/MLServer/blob/release/1.4.x/runtimes/huggingface/mlserver_huggingface/codecs/utils.py#L37. This field was deprecated as part of the refactor, but if `transformers` wants to maintain backwards compatibility for now (which is mentioned in a few comments), then there's a good argument for supporting it. Some comments referred to it as an "internal" property, but it didn't start with `_` as is Python convention, so I think it's reasonable that other libraries were referencing it directly. It's not difficult to add it to the other supported backwards-compatible properties. In addition, the implementation of `past_user_inputs` didn't actually match the past behavior (it would contain the most recent message as well), so I updated that too.
  * make style
  Co-authored-by: Matt <[email protected]>

* llm prompting guide
  * updated code examples
  * an attempt to fix the code example tests
  * set seed in examples
  * added a doctest comment
  * added einops to the doc_test_job
  * string formatting
  * string formatting, again
  * added the toc to slow_documentation_tests.txt
  * minor list fix
  * string formatting + pipe renamed
  * Apply suggestions from code review (Co-authored-by: Patrick von Platen <[email protected]>)
  * replaced max_length with max_new_tokens and updated the outputs to match
  * minor formatting fix
  * removed einops from circleci config
  * Apply suggestions from code review (Co-authored-by: Lysandre Debut <[email protected]>)
  * removed einops and trust_remote_code parameter
  Co-authored-by: Patrick von Platen <[email protected]>
  Co-authored-by: Lysandre Debut <[email protected]>

* Remove UniSpeechConfig
  * Remove `,` at the end, otherwise check_docstring changes the order
  * Auto add new docstring
  * Update docstring for UniSpeechConfig
  * Remove from check_docstrings
  * Remove UniSpeechSatConfig and UniSpeechSatForCTC from check_docstrings
  * Remove `,` at the end
  * Fix docstring
  * Update docstring for Wav2Vec2ForCTC
  * Update Wav2Vec2ForCTC docstring (Co-authored-by: Yih-Dar <[email protected]>)
  * fix style
  Co-authored-by: Yih-Dar <[email protected]>

* [DOCS] Update docstrings for and tokenizer
  * [DOCS] add pad_token argument to whisper tokenizer docstring
  * [FIX] Reword pad_token description
  * [CHORE] Apply style formatting
  Co-authored-by: jmcdonnell <[email protected]>

* [docstring] Remove 'BertGenerationConfig' from OBJECTS_TO_IGNORE
  * [docstring] Fix docstring for 'BertGenerationConfig' (#26638)

* fix
  Co-authored-by: ydshieh <[email protected]>

* Skip `TrainerIntegrationFSDP::test_basic_run_with_cpu_offload` if `torch < 2.1` (#26764)
  * fix
  * fix
  Co-authored-by: ydshieh <[email protected]>
  
      
    
Here I simplify getting the number of embedding tokens, following the discussions in PR 26024.
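For context, a minimal sketch of what "getting the number of embedding tokens" refers to in `transformers`. This is an illustration of the underlying API, not this PR's actual diff, and the checkpoint name is only an example: the module returned by `get_input_embeddings()` is typically a `torch.nn.Embedding`, which already exposes the count as `num_embeddings`, so there is no need to read it off the weight tensor's shape.

```python
# Illustrative sketch (not this PR's diff): two ways to read the number
# of embedding tokens from a transformers model.
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")  # example checkpoint

embeddings = model.get_input_embeddings()  # usually a torch.nn.Embedding

# Indirect: go through the weight tensor's shape ...
num_tokens_via_shape = embeddings.weight.shape[0]

# ... or direct: nn.Embedding stores the count as an attribute.
num_tokens = embeddings.num_embeddings

assert num_tokens == num_tokens_via_shape
print(num_tokens)  # 30522 for bert-base-uncased
```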