Simplifying getting the number of embedding tokens #26945
Closed

AnastasiyaKukharska wants to merge 12 commits into huggingface:main from AnastasiyaKukharska:num-embedding-tokens-simplify
            Conversation
  
    
Commits:

* fix
  Co-authored-by: ydshieh <[email protected]>

* fix
  * fix
  * fix
  Co-authored-by: ydshieh <[email protected]>

* Fix backward compatibility of Conversation
  I ran into a case where an external library was depending on the `new_user_input` field of Conversation: https://github.com/SeldonIO/MLServer/blob/release/1.4.x/runtimes/huggingface/mlserver_huggingface/codecs/utils.py#L37. This field was deprecated as part of the refactor, but if `transformers` wants to maintain backwards compatibility for now (which is mentioned in a few comments), then there's a good argument for supporting it. Some comments referred to it as an "internal" property, but it didn't start with `_` as is Python convention, so I think it's reasonable that other libraries were referencing it directly. It's not difficult to add it to the other supported backwards-compatible properties. In addition, the implementation of `past_user_inputs` didn't actually match the past behavior (it would contain the most recent message as well), so I updated that too.
  * make style
  Co-authored-by: Matt <[email protected]>

* llm prompting guide
  * updated code examples
  * an attempt to fix the code example tests
  * set seed in examples
  * added a doctest comment
  * added einops to the doc_test_job
  * string formatting
  * string formatting, again
  * added the toc to slow_documentation_tests.txt
  * minor list fix
  * string formatting + pipe renamed
  * Apply suggestions from code review (Co-authored-by: Patrick von Platen <[email protected]>)
  * replaced max_length with max_new_tokens and updated the outputs to match
  * minor formatting fix
  * removed einops from circleci config
  * Apply suggestions from code review (Co-authored-by: Lysandre Debut <[email protected]>)
  * removed einops and trust_remote_code parameter
  Co-authored-by: Patrick von Platen <[email protected]>
  Co-authored-by: Lysandre Debut <[email protected]>

* Remove UniSpeechConfig
  * Remove `,` at the end, otherwise check_docstring changes the order
  * Auto add new docstring
  * Update docstring for UniSpeechConfig
  * Remove from check_docstrings
  * Remove UniSpeechSatConfig and UniSpeechSatForCTC from check_docstrings
  * Remove `,` at the end
  * Fix docstring
  * Update docstring for Wav2Vec2ForCTC
  * Update Wav2Vec2ForCTC docstring (Co-authored-by: Yih-Dar <[email protected]>)
  * fix style
  Co-authored-by: Yih-Dar <[email protected]>

* [DOCS] Update docstrings for and tokenizer
  * [DOCS] add pad_token argument to whisper tokenizer docstring
  * [FIX] Reword pad_token description
  * [CHORE] Apply style formatting
  Co-authored-by: jmcdonnell <[email protected]>

* [docstring] Remove 'BertGenerationConfig' from OBJECTS_TO_IGNORE
  * [docstring] Fix docstring for 'BertGenerationConfig' (#26638)

* fix
  Co-authored-by: ydshieh <[email protected]>

* Skip `TrainerIntegrationFSDP::test_basic_run_with_cpu_offload` if `torch < 2.1` (#26764)
  * fix
  * fix
  Co-authored-by: ydshieh <[email protected]>
  
      
    
Here I simplify getting the number of embedding tokens, following the discussions in PR 26024.
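For context, a minimal sketch of what "getting the number of embedding tokens" refers to in `transformers`. This is an illustration of the underlying API, not this PR's actual diff, and the checkpoint name is only an example: the module returned by `get_input_embeddings()` is typically a `torch.nn.Embedding`, which already exposes the count as `num_embeddings`, so there is no need to read it off the weight tensor's shape.

```python
# Illustrative sketch (not this PR's diff): two ways to read the number
# of embedding tokens from a transformers model.
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")  # example checkpoint

embeddings = model.get_input_embeddings()  # usually a torch.nn.Embedding

# Indirect: go through the weight tensor's shape ...
num_tokens_via_shape = embeddings.weight.shape[0]

# ... or direct: nn.Embedding stores the count as an attribute.
num_tokens = embeddings.num_embeddings

assert num_tokens == num_tokens_via_shape
print(num_tokens)  # 30522 for bert-base-uncased
```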