
Conversation

VictorSanh
Contributor

No description provided.

@VictorSanh merged commit a6efe12 into master Nov 3, 2018
@thomwolf deleted the multi-gpu-support branch November 4, 2018 00:35
bearpelican pushed a commit to bearpelican/pytorch-pretrained-BERT that referenced this pull request Jan 7, 2019
Adds an example for loading a pre-trained BERT model and fine-tuning it as a language model (masked tokens & nextSentence) on your target corpus.
qwang70 pushed a commit to DRL36/pytorch-pretrained-BERT that referenced this pull request Mar 2, 2019
Create a DataParallel model if several GPUs are available
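The referenced commit is the core of this PR; a minimal sketch of that standard multi-GPU wrap (the `nn.Linear` stand-in is illustrative, not the PR's actual code):

```python
import torch
from torch import nn

# Stand-in for the BERT model being fine-tuned; the wrapping logic is the point.
model = nn.Linear(768, 2)

if torch.cuda.device_count() > 1:
    # Replicate the model on every visible GPU: inputs are scattered along the
    # batch dimension and per-GPU outputs are gathered on the default device.
    model = nn.DataParallel(model)
```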
qwang70 pushed a commit to DRL36/pytorch-pretrained-BERT that referenced this pull request Mar 2, 2019
Adds an example for loading a pre-trained BERT model and fine-tuning it as a language model (masked tokens & nextSentence) on your target corpus.
thomwolf pushed a commit that referenced this pull request Apr 23, 2019
Pulling commits from main repo
thomwolf pushed a commit that referenced this pull request Jun 22, 2019
Correct a broken link and its context.
thomwolf pushed a commit that referenced this pull request Jul 25, 2019
thomwolf pushed a commit that referenced this pull request Sep 10, 2019
changes in return statement of evaluate function
thomwolf pushed a commit that referenced this pull request Sep 11, 2019
merge from original repo
thomwolf pushed a commit that referenced this pull request Sep 18, 2019
roberta, xlnet for multiple choice
@HongyanJiao mentioned this pull request Sep 19, 2019
thomwolf pushed a commit that referenced this pull request Oct 22, 2019
@devroy73 mentioned this pull request Nov 10, 2019
@volker42maru mentioned this pull request Mar 18, 2020
stevezheng23 added a commit to stevezheng23/transformers that referenced this pull request Mar 24, 2020
Merge changes from huggingface/transformers to stevezheng23/transformers
patrickvonplaten added a commit to patrickvonplaten/transformers that referenced this pull request Jun 7, 2020
SunMarc added a commit that referenced this pull request Jan 15, 2025
* gptqmodel

Signed-off-by: jiqing-feng <[email protected]>

* fix format

Signed-off-by: jiqing-feng <[email protected]>

* update readme

Signed-off-by: jiqing-feng <[email protected]>

* gptqmodel needs to use checkpoint_format (#1)

* gptqmodel needs to use checkpoint_format

* fix quantize

* Update quantization_config.py

* Update quantization_config.py

* Update quantization_config.py

---------

Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>

* Revert quantizer_gptq.py (#2)

* revert quantizer_gptq.py change

* pass **kwargs

* limit gptqmodel and optimum version

Signed-off-by: jiqing-feng <[email protected]>

* fix format

Signed-off-by: jiqing-feng <[email protected]>

* fix warning

Signed-off-by: jiqing-feng <[email protected]>

* fix version check

Signed-off-by: jiqing-feng <[email protected]>

* revert unrelated changes

Signed-off-by: jiqing-feng <[email protected]>

* enable gptqmodel tests

Signed-off-by: jiqing-feng <[email protected]>

* fix requires gptq

Signed-off-by: jiqing-feng <[email protected]>

* Fix Transformer compat (#3)

* revert quantizer_gptq.py change

* pass **kwargs

* add meta info

* cleanup

* cleanup

* Update quantization_config.py

* hf_select_quant_linear pass checkpoint_format and meta

* fix GPTQTestCUDA

* Update test_gptq.py

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* cleanup

* add backend

* cleanup

* cleanup

* no need to check exllama version

* Update quantization_config.py

* lower checkpoint_format and backend

* check none

* cleanup

* Update quantization_config.py

* fix self.use_exllama == False

* spell

* fix unittest

* fix unittest

---------

Co-authored-by: LRL <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>

* fix format

Signed-off-by: jiqing-feng <[email protected]>

* fix format again

Signed-off-by: jiqing-feng <[email protected]>

* update gptqmodel version (#6)

* update gptqmodel version

* update gptqmodel version

* fix unit test (#5)

* update gptqmodel version

* update gptqmodel version

* "not self.use_exllama" is not equivalent to "self.use_exllama==False"

* fix unittest

* update gptqmodel version

* backend is loading_attributes (#7)

* fix format and tests

Signed-off-by: jiqing-feng <[email protected]>

* fix memory check

Signed-off-by: jiqing-feng <[email protected]>

* fix device mismatch

Signed-off-by: jiqing-feng <[email protected]>

* fix result check

Signed-off-by: jiqing-feng <[email protected]>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <[email protected]>

* update tests

Signed-off-by: jiqing-feng <[email protected]>

* review: update docs (#10)

* review: update docs (#12)

* review: update docs

* fix typo

* update tests for gptqmodel

Signed-off-by: jiqing-feng <[email protected]>

* update document (#9)

* update overview.md

* cleanup

* Update overview.md

* Update overview.md

* Update overview.md

* update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

---------

Co-authored-by: Qubitium-ModelCloud <[email protected]>

* typo

* doc note for asymmetric quant

* typo with apple silicon(e)

* typo for marlin

* column name revert: review

* doc rocm support

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>
Co-authored-by: LRL-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: LRL <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
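The `"not self.use_exllama" is not equivalent to "self.use_exllama==False"` commit above points at a classic Python pitfall; a minimal illustration, assuming the attribute may be `None` when unset:

```python
use_exllama = None           # e.g. the config value was never set

print(not use_exllama)       # True  -- `not x` is True for every falsy value (None, 0, "", ...)
print(use_exllama == False)  # False -- `x == False` only matches False (or values equal to it, like 0)
```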
elvircrn pushed a commit to elvircrn/transformers that referenced this pull request Feb 7, 2025
MekkCyber added a commit that referenced this pull request Feb 13, 2025
* Resolve vptq conflict

* Rename spqr package to spqr_quant

* Get rid of aqlm mention

* Start working on tests

* Resolve ruff code checks

* Ruff format

* Isort

* Test updates

* Add gpu tag

* Rename to modules_to_not_convert

* Config update

* Docs and config update

* Docs and config update

* Update to update_torch_dtype

* spqr config parameter validation

* Ruff update

* Apply ruff fixes

* Test fixes

* Ruff update

* Mark tests as @slow again; Ruff; Docstring update

* Ruff

* Remove absolute path

* Resolve typo

* Remove redundant log

* Check accelerate/spqr availability

* Ruff fix

* Check if the config contains proper shapes

* Ruff test

* Documentation update

* overview update

* Ruff checks

* Ruff code quality

* Make style

* Update docs/source/en/quantization/spqr.md

Co-authored-by: Steven Liu <[email protected]>

* Update spqr.md

* Enable gptqmodel (#35012)

* Fix: Nemotron Processor in GGUF conversion (#35708)

* fixing nemotron processor

* make style

* Update docs/source/en/quantization/spqr.md

Co-authored-by: Arthur <[email protected]>

* Add missing TOC to doc

---------

Signed-off-by: jiqing-feng <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: jiqing-feng <[email protected]>
Co-authored-by: LRL-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: LRL <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Arthur <[email protected]>
capemox added a commit to capemox/transformers that referenced this pull request Feb 27, 2025
Rocketknight1 pushed a commit to capemox/transformers that referenced this pull request Feb 28, 2025
@Rocketknight1 mentioned this pull request Mar 6, 2025
Rocketknight1 pushed a commit to capemox/transformers that referenced this pull request Mar 7, 2025
Rocketknight1 pushed a commit that referenced this pull request Mar 7, 2025
…36457)

Fixed 2 issues regarding `tests/trainer/test_data_collator.py::TFDataCollatorIntegrationTest::test_all_mask_replacement`:
1. I got the error `RuntimeError: "bernoulli_tensor_cpu_p_" not implemented for 'Long'`. This is because `mask_replacement_prob=1` is an integer, and `torch.bernoulli` doesn't accept this type (the argument would be a `torch.long` dtype instead of a float). I fixed this by manually casting the probability arguments in the `__post_init__` function of `DataCollatorForLanguageModeling`.
2. I also got the error `tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute Equal as input #1(zero-based) was expected to be a int64 tensor but is a int32 tensor [Op:Equal]` due to the line `tf.reduce_all((batch["input_ids"] == inputs) | (batch["input_ids"] == tokenizer.mask_token_id))` in `test_data_collator.py`. This occurs because the `inputs` variable is `tf.int32`. I solved this by manually casting it to `tf.int64` in the test, since the expected return type of `batch["input_ids"]` is `tf.int64`.
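A minimal reproduction of the dtype issue in item 1 (the tensor shape is illustrative; the TF fix in item 2 is the analogous cast to `tf.int64`):

```python
import torch

probs = torch.full((4,), 1)  # integer fill value -> a torch.long tensor of "probabilities"
# torch.bernoulli(probs) raises:
#   RuntimeError: "bernoulli_tensor_cpu_p_" not implemented for 'Long'
probs = probs.float()        # the fix: cast the probability argument to a float dtype
print(torch.bernoulli(probs))  # tensor([1., 1., 1., 1.]) since every p is 1
```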
blzheng pushed a commit to blzheng/transformers that referenced this pull request Jul 15, 2025
Fix llama acc issue on gsm8k: update block_mask
stevhliu added a commit that referenced this pull request Jul 22, 2025
* updated mistral3 model card (#1)

* updated mistral3 model card

* applying suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* made all changes to mistral3.md

* adding space between paragraphs in docs/source/en/model_doc/mistral3.md

Co-authored-by: Steven Liu <[email protected]>

* removing duplicate in mistral3.md

---------

Co-authored-by: Steven Liu <[email protected]>

* adding 4 backticks to preserve formatting

---------

Co-authored-by: Steven Liu <[email protected]>
Guo-Chenxu referenced this pull request in Guo-Chenxu/transformers Jul 28, 2025
yonigozlan pushed a commit to yonigozlan/transformers that referenced this pull request Jul 29, 2025
yuchenxie4645 added a commit to yuchenxie4645/transformers that referenced this pull request Sep 8, 2025
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
ydshieh added a commit that referenced this pull request Sep 29, 2025
* Fix EXAONE-4.0 dummy id

* Fix exaone4 dummy (#1)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

---------

Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: ydshieh <[email protected]>
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
ArthurZucker pushed a commit that referenced this pull request Oct 3, 2025
LysandreJik added a commit that referenced this pull request Oct 3, 2025
* Update expected values for one more `test_speculative_generation` after #40949 (#40967)

fix

Co-authored-by: ydshieh <[email protected]>

* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)

* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <[email protected]>

* Add new model LFM2-VL (#40624)

* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <[email protected]>
Co-authored-by: Anna Banaszak <[email protected]>

* Fix outdated version checks of accelerator (#40969)

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

use skip_predictor in vjepa2 `get_vision_features`

* [Trainer] Fix DP loss (#40799)

* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <[email protected]>

* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)

* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling

* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)

* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <[email protected]>

* [tests] Really use small models in all fast tests (#40945)

* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency

* Add captured actual outputs to CI artifacts (#40965)

* fix

* fix

* Remove `# TODO: ???` as it makes me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Revert change in `compile_friendly_resize` (#40645)

fix

* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Remove `set_model_tester_for_less_flaky_tests` (#40982)

remove

* Benchmarking v2 GH workflows (#40716)

* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from worfkflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description

* ENH: Enable readline support for transformers chat (#40911)

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, MacOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).
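A sketch of the guarded import described above (the prompt string is illustrative):

```python
try:
    import readline  # noqa: F401 -- side-effect import: enables editing and history for input()
except ImportError:
    pass  # e.g. Windows without pyreadline3 installed

line = input("> ")  # now supports ctrl+a/ctrl+e editing, arrow-key history, ctrl+r search
print(line)
```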

* [testing] test `num_hidden_layers` being small in model tester (#40992)

fix

Co-authored-by: ydshieh <[email protected]>

* blt wip (#38579)

* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoing with form_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* seperate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamTextMLP

* clean up some args'

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, seperated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_casual check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attnetion_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>

* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happening lol

* vaultgemma is new i forgot

* remove init check
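To make "models that center around 1" concrete: a hedged sketch of a Gemma-style norm whose learned scale is applied as `(1 + weight)`, so the correct init is zeros rather than ones (the class name is illustrative):

```python
import torch
from torch import nn

class CenteredRMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # Zeros, not ones: the effective scale (1 + weight) then starts at exactly 1.
        self.weight = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(-1, keepdim=True)
        x = x * torch.rsqrt(variance + self.eps)
        return x * (1.0 + self.weight)
```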

* Make `EfficientLoFTRModelTest` faster (#41000)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix typoes in src and tests (#40845)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)

* Fix model cards and modalities in toctree

* fix new models

* RUFF fix on CI scripts (#40805)

Signed-off-by: Yuanyuan Chen <[email protected]>

* fix dict like init for ModelOutput (#41002)

* fix dict like init

* style

* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)

* update test (and overwrites)

* better test comment

* 0 as a default for

* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

fix

Co-authored-by: ydshieh <[email protected]>

* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)

* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <[email protected]>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <[email protected]>

* Fix `PhimoeIntegrationTest` (#41007)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix Glm4v test (#41011)

fix

* Update after #41007 (#41014)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix benchmark runner argument name (#41012)

* Adding support for Qwen3Omni (#41025)

* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <[email protected]>
Co-authored-by: Arthur <[email protected]>

* Making compute_loss_func always take priority in Trainer (#40632)

* logger warn, if-else logic improved

* redundant if condition fix

* Modify Qwen3Omni parameter name since VL changed it (#41045)

Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <[email protected]>

* Fix Qwen video tests (#41049)

fix test

* [testing] Fix `qwen2_audio` (#41018)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix typing of tuples (#41028)

* Fix tuple typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove optax (#41030)

Remove optax dep

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typos in English/Chinese documentation (#41031)

* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use torch.autocast (#40975)

* Use torch.autocast

Signed-off-by: Yuanyuan Chen <[email protected]>

* Format code

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
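For reference, a generic usage sketch of the device-agnostic API the commit above migrates to (not the exact call sites changed):

```python
import torch

device_type = "cuda" if torch.cuda.is_available() else "cpu"

# torch.autocast supersedes the older torch.cuda.amp.autocast / torch.cpu.amp.autocast
# entry points with a single device-agnostic context manager.
with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
    x = torch.randn(8, 8)
    y = x @ x  # eligible ops run in reduced precision inside the context
```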

* docs: improved RoPE function Docstrings (#41004)

* docs: improved RoPE functuon docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <[email protected]>

---------

Co-authored-by: Joao Gante <[email protected]>

* Fix condition for emitting warning when generation exceeds max model length (#40775)

correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <[email protected]>

* Fix outdated torch version check (#40925)

Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)

* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking

* [testing] Fix `seed_oss` (#41052)

* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <[email protected]>

* fix

---------

Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>

* Remove repeated import (#40937)

* Remove repeated import

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix conflict

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Simplify unnecessary Optional typing (#40839)

Remove Optional

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add write token for uploading benchmark results to the Hub (#41047)

* Separate write token for Hub upload

* Address review comments

* Address review comments

* Ci utils (#40978)

* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License

* Fix CI jobs being all red 🔴 (false positive) (#41059)

fix

Co-authored-by: ydshieh <[email protected]>

* Update quantization CI (#41068)

* fix

* new everything

* fix

* [i18n-bn] Add Bengali language README file (#40935)

* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions

* Improve documentation and errors in Mamba2-based models (#41063)

* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files

* Update team member list for some CI workflows (#41094)

* update list

* update list

---------

Co-authored-by: ydshieh <[email protected]>

* fix crash when using chat to send 2+ request to gptoss (#40536)

Signed-off-by: Wang, Yi <[email protected]>

* Minor addition, no split modules for VideoMAEE (#41051)

* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <[email protected]>

* Switch to `python:3.10-slim` for CircleCI docker images (#41067)

fix

Co-authored-by: ydshieh <[email protected]>

* Fix argument name in benchmarking script (#41086)

* Fix argument name in benchmarking script

* Adjust vars

* Fix typos in documentation (#41087)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typing (#40788)

* Fix optional typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Format code

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <[email protected]>

* Improve typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove unused arguments (#40916)

* Fix unused arguments

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* fix wrong height and width when reading video with torchvision (#41091)

* docs: Fix Tool Use links and remove dead RAG links (#41104)

docs: Fix tool use links. Remove dead RAG links. Fix style

* [tests] gpt2 + `CausalLMModelTester` (#41003)

* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder

* Fix `_get_test_info` for inherited tests (#41106)

* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <[email protected]>

* Remove bad test skips (#41109)

* remove bad skips

* remove more

* fix inits

* Format empty lines and white space in markdown files. (#41100)

* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes

Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>

* Support loading LFM2 GGUF (#41111)

* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <[email protected]>

* [torchao safetensors] integrate torchao safetensors support with transformers  (#40735)

* enable torchao safetensors

* enable torchao safetensors support

* add more version checking

* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)

* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <[email protected]>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <[email protected]>

* Fix the error where a keyword argument appearing before *args (#41099)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix broken `` expressions in markdown files (#41113)

Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove self-assignment (#41062)

* Remove self-assignment

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <[email protected]>

* Clear pass

Signed-off-by: Yuanyuan Chen <[email protected]>

* Clear pass

Signed-off-by: Yuanyuan Chen <[email protected]>

* Clear pass

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Matt <[email protected]>

* Fixed MXFP4 model storage issue (#41118)

* Fixed loading LongT5 from legacy checkpoints (#40724)

* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head

* dummy commit (#41133)

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <[email protected]>

* Fix loading logic flaw with regards to unexpected and missing keys (#40850)

* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <[email protected]>

* Fix: align Qwen2.5-VL inference rope index with training by passing s… (#41153)

Fix: align Qwen2.5-VL inference rope index with training by passing second_per_grid_ts

* Fix single quotes in markdown (#41154)

Fix typos

Signed-off-by: Yuanyuan Chen <[email protected]>

* extend gemma3n integration ut cases on XPU (#41071)

Signed-off-by: Yao, Matrix <[email protected]>

* Add Parakeet (#39062)

* first commit

Signed-off-by: nithinraok <[email protected]>

* update to handle masking for bs>1

Signed-off-by: nithinraok <[email protected]>

* Add tests and docs

Signed-off-by: nithinraok <[email protected]>

* update model ids

Signed-off-by: nithinraok <[email protected]>

* update docs and improve style

Signed-off-by: nithinraok <[email protected]>

* update librosa location

Signed-off-by: nithinraok <[email protected]>

* import guard torch too

Signed-off-by: nithinraok <[email protected]>

* ruff code checks fix

Signed-off-by: nithinraok <[email protected]>

* ruff format check

Signed-off-by: nithinraok <[email protected]>

* updated to parakeet names

Signed-off-by: nithinraok <[email protected]>

* update script

Signed-off-by: nithinraok <[email protected]>

* Add tokenizer decoding

Signed-off-by: nithinraok <[email protected]>

* Remove other model dependency

Signed-off-by: nithinraok <[email protected]>

* clean tests

Signed-off-by: nithinraok <[email protected]>

* fix tests

Signed-off-by: nithinraok <[email protected]>

* linting

Signed-off-by: nithinraok <[email protected]>

* fix ruff lint warnings

Signed-off-by: nithinraok <[email protected]>

* move to seperate folders

Signed-off-by: nithinraok <[email protected]>

* add parakeet ctc model code

Signed-off-by: nithinraok <[email protected]>

* simplify encoder structure

Signed-off-by: nithinraok <[email protected]>

* update documentation

Signed-off-by: nithinraok <[email protected]>

* add parakeet to toctree

Signed-off-by: nithinraok <[email protected]>

* fix tests

Signed-off-by: nithinraok <[email protected]>

* add parakeet doc

Signed-off-by: nithinraok <[email protected]>

* Address comments

Signed-off-by: nithinraok <[email protected]>

* Update featurizer to compute lens directly

Signed-off-by: nithinraok <[email protected]>

* fix ruff tests

Signed-off-by: nithinraok <[email protected]>

* fix encoding format

Signed-off-by: nithinraok <[email protected]>

* fix minor ctc decoding

Signed-off-by: nithinraok <[email protected]>

* revert modular_model_converter.py changes

* revert check_config_attributes.py changes

* refactor: fastconformer & parakeet_ctc -> parakeet

* modeling update

* test update

* propagate feature extractor updates

* propagate doc changes

* propagate doc changes

* propagate tokenization changes

* propagate conversion changes

* remove fastconformer tests

* remove modular

* update processor

* update processor

* test update

* diverse fixes

* 100% macthing greedy batched

* Update conversion script.

* Refactor docs.

* Reafactor auto loading.

* Refactor and fix tokenization and processing.

* Update integration test.

* Modeling fixes:
- ensure correct attention mask shape
- ensure layer drop returns valid output
- correct blank token ID when computing CTC loss

* Format and repo consistency.

* Update model doc.

* Fix feature extraction tests.

* Fix (most) tokenizer tests.

* Add pipeline example.

* Fixes

* Use eager_attention_forward from Llama.

* Small tweaks.

* Replace Sequential with ModuleList

* Add check if not all layers copied

* Clean tokenizer.

* Standardize FastSpeech2ConformerConvolutionModule for Parakeet.

* Switch to modular for modeling and processing.

* Add processor tests.

* Fix modeling tests.

* Formating and docstrings.

* Add `return_attention_mask` like other feature extractors.

* clean up after merging main.

* nits on modeling

* configuration update

* nit

* simplification: use PretrainedTokenizerFast, simplify processor

* add dtype arg to mel_filter_bank

* feature extraction: simplify!

* modeling update

* change to ParakeetTokenizerFast

* correct attention mask handling

* auto update

* proc update

* test update

* feature extraction fixes

* modeling update

* conversion script update

* update feature integration tests

* update tokenization and tests

* processor tests

* revert audio_utils

* config docstring update

* blank_token -> pad_token

* modeling update

* doc update

* fix tests

* fix test

* fix tests

* address review comments

* add comment

* add comment

* explicitly not support flash

* atttention straightforward masking

* fix

* tokenizer update: skipping blank tokens by default

* doc update

* fix max_positions_embeddings handling

* nits

* change atol in feature extraction integration tests

* doc update + fix loss

* doc update

* nit

* update integration test for A10

* repo id name

* nit

---------

Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Eustache Le Bihan <[email protected]>
Co-authored-by: eustlb <[email protected]>
Co-authored-by: Eric B <[email protected]>

* Fix format of compressed_tensors.md (#41155)

* Fix table format

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix format

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Simplify and improve model loading logic (#41103)

* remove unexpected keys from inputs (they have nothing to do there)

* remove input

* simplify a lot init

* fix

* fix check for non-persistent buffer

* revert because too many old and bad models...

* remove comment

* type hint

* make it a real test

* remove model_to_load -> always use the same model

* typo

* remove legacy offload_folder (we never waste that memory anymore)

* do not change prefix anymore

* change very bad function name

* create adjust method

* remove useless method

* restrict

* BC

* remove unused method

* CI

* remove unused args

* small fix

* fix

* CI

* CI

* avoid too many loops

* fix regex

* cleaner

* typo

* fix

* fix

* Force new vision models addition to include a fast image processor (#40802)

* add test

* fix test and change cutoff date

* Add documentation to test

* Add language specifiers to code blocks of markdown files (#41114)

* Add language specifiers to code blocks of markdown files

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update docs/source/en/model_doc/qwen3_omni_moe.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update nemotron.md

Co-authored-by: Steven Liu <[email protected]>

* Update phimoe.md

Co-authored-by: Steven Liu <[email protected]>

* Update README.md

Co-authored-by: Steven Liu <[email protected]>

* Fix syntax error

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Steven Liu <[email protected]>

* Improve `add_dates` script (#41167)

* utils/add_dates.py

* put lfm2-vl in correct category

* Fix flash-attn for paged_attention when no kernels (#41078)

* Fix non-kernels flash attention paged implementation

* Cover all cases

* Style

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Mohamed Mekkouri <[email protected]>

* Apply style fixes

---------

Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Remove data from examples (#41168)

Remove telemetry

* Enable fa in amd docker (#41069)

* Add FA to docker

* Use caching mechanism for qwen2_5

* Fix a typo in important models list

* Partial fixes for gemma3

* Added a commit ID for FA repo

* Detailed the expectation storage format

* Rebase fix

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* handle flash slow tests (#41072)

* handle flash slow tests

* update patch mask to 1/0 for flash

* don't skip flash

* flash

* raise tols

* rm flash support :(

* nits

---------

Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>

* Modernbert fix (#41056)

* Add FA to docker

* Fixed padding for ModernBert

* Fixed logits and hidden states extraction in ModernBertForMultipleChoice

* Added a test for ModernBertForMultipleChoice

* fixes

* More fixes and GREEN CI

* consistency

* moar consistency

* CI Runners - move amd runners mi355 and 325 to runner group (#41193)

* Update CI workflows to use devmi355 branch

* Add workflow trigger for AMD scheduled CI caller

* Remove unnecessary blank line in workflow YAML

* Add trigger for workflow_run on main branch

* Update workflow references from devmi355 to main

* Change runner_scale_set to runner_group in CI config

* [XPU] Add MXFP4 support for XPU (#41117)

* XPU supports gpt-oss MXFP4

* Complete MXFP4 UT file and comment information

* Complete MXFP4 UT file and comment information

* Fix code style

* Fix code style

---------

Co-authored-by: Marc Sun <[email protected]>

* [tests] `CausalLMTester` automatically infers other test classes from `base_model_class` 🐛 🔫  (#41066)

* halfway through the models

* update test checks

* refactor all

* another one

* use tuples

* more deletions

* solve bad inheritance patterns

* type

* PR ready?

* automatic model class inference from the base class

* vaultgemma

* make fixup

* make fixup

* rebase with gpt2

* make fixup :'(

* gpt2 is special

* More typing fixes (#41102)

* Fix noqa

Signed-off-by: Yuanyuan Chen <[email protected]>

* fix typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

* remove noqa

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix chars

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* enable flex attention ut cases on XPU (#40989)

* enable flex attention ut cases on XPU

Signed-off-by: Yao, Matrix <[email protected]>

* fix style

Signed-off-by: Yao, Matrix <[email protected]>

---------

Signed-off-by: Yao, Matrix <[email protected]>
Co-authored-by: Marc Sun <[email protected]>

* fix(trainer): Avoid moving model with device_map (#41032)

* fix(trainer): Avoid moving model with device_map

When a model is loaded with `device_map="auto"` and is too large to fit on a single GPU, `accelerate` will offload some layers to the CPU or disk. The `Trainer` would previously attempt to move the entire model to the specified device, causing a `RuntimeError` because a model dispatched with `accelerate` hooks cannot be moved.

This commit fixes the issue by adding a check in `_move_model_to_device` to see if the model has an `hf_device_map` attribute. If it does, the device placement is assumed to be handled by `accelerate`, and the `model.to(device)` call is skipped.

A regression test is added to ensure the `Trainer` can be initialized with a model that has a `hf_device_map` that simulates offloading without raising an error.

* Added the logger warning for the move model
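
For reference, a minimal sketch of the guard this commit adds; the function and logger wiring are illustrative, not the exact Trainer code:

```python
import logging

logger = logging.getLogger(__name__)

def move_model_to_device(model, device):
    # Models dispatched by accelerate (device_map="auto") expose `hf_device_map`;
    # calling `.to()` on a hooked model raises a RuntimeError, so skip the move
    # and let accelerate keep handling placement.
    if getattr(model, "hf_device_map", None) is not None:
        logger.warning("Model has a device_map; skipping the explicit device move.")
        return model
    return model.to(device)
```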

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

* Fix attention sink implementation in flex attention (#41083)

* Fix attention sink implementation in flex attention

* fix dim

* fix

* Remove print

* raise error when return_lse is False yet s_aux is provided
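
As a sketch, the guard that bullet describes; the argument names are assumed from the description:

```python
def validate_sink_args(return_lse: bool, s_aux) -> None:
    # attention sinks (s_aux) are folded in via the log-sum-exp, so they
    # cannot be honored unless the LSE is actually returned
    if s_aux is not None and not return_lse:
        raise ValueError("`s_aux` was provided but `return_lse` is False.")
```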

* Clean test files for merge

* Update src/transformers/integrations/flex_attention.py

Co-authored-by: Arthur <[email protected]>

* force return lse

* Add to doc

---------

Co-authored-by: Arthur <[email protected]>

* Separate docker images for Nvidia and AMD in benchmarking (#41119)

Separate docker images for Nvidia and AMD

* Make quantizers good citizens loading-wise (#41138)

* fix param_needs_quantization

* rewrite most hqq

* clean

* fix

* comment

* remove it from exception of safetensors

* start on bnb 4bits

* post-rebase fix

* make bnb4 bit a good citizen

* remove forgotten print

* make bnb 8bits a good citizen

* better hqq

* fix

* clean

* remove state dict from signature

* switch method

* make torchao a good citizen

* fixes

* fix torchao

* add check

* typo

* [`Kernels Attention`] Change fallback logic to error out on explicit kernels request and include FA3 (#41010)

* fix

* be more strict

* change logic to include fa3

* fix the case where nothing is requested

* modify old tests + add kernels related tests

* style
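
A hedged sketch of the stricter fallback this PR describes (implementation names are illustrative): an explicitly requested kernel errors out instead of silently degrading, and FA3 joins the automatic preference order.

```python
from typing import Optional

PREFERENCE = ("flash_attention_3", "flash_attention_2", "sdpa", "eager")

def resolve_attention(requested: Optional[str], available: set) -> str:
    if requested is not None:
        # an explicit request must not fall back to something else
        if requested not in available:
            raise ValueError(f"Requested attention {requested!r} is not available.")
        return requested
    for impl in PREFERENCE:  # nothing requested: best available wins
        if impl in available or impl == "eager":
            return impl
```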

* Add EdgeTAM (#39800)

* initial comment

* test

* initial conversion for outline

* intermediate commit for configuration

* chore:init files for sam2

* adding arbitrary undefined config

* check

* add vision

* make style

* init sam2 base model

* Fix imports

* Linting

* chore:sam to sam2 classes

* Linting

* Add sam2 to models.__init__

* chore:match prompt encoder with sam2 code

* chore:prepare kwargs for mask decoder

* Add image/video predictors

* Add CUDA kernel

* Add output classes

* linting

* Add logging info

* tmp commit

* docs for sam2

* enable image processing

* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize
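
To make the discrepancy concrete, a hedged torchvision sketch: the two pipelines below differ only in where `ToTensor()` sits, so one resizes a PIL image and the other resizes a float tensor, which can produce slightly different pixels.

```python
from torchvision import transforms

resize_then_tensor = transforms.Compose([
    transforms.Resize((1024, 1024)),  # resize the PIL image
    transforms.ToTensor(),
])
tensor_then_resize = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((1024, 1024)),  # resize the float tensor
])
```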

* enable promptencoder of sam2

* fix prompt encoder

* Confirmed that PromptEncoder is exactly the same (be aware of the bfloat16 and float32 difference)

* Confirmed that ImageEncoder is exactly the same (be aware of the linting of init)

* Confirmed that MaskDecoder is exactly the same (TO DO: lint variable name)

* SamModel is now available (Need more chore for name)

* make fix-copies

* make style

* make CI happy

* Refactor VisionEncoder and PositionEmbedding

* TO DO : fix the image_embeddings and sparse_embeddings part

* pure image inference done

* reusable features fix and make style

* styling

* refactor memoryattention

* tmp

* tmp

* refactor memoryencoder
TO DO: convert and run inference on the video pipeline

* TO DO : fix the image_encoder shape

* conversion finish
TO DO: need to check video inference

* make style

* remove video model

* lint

* change

* python utils/check_docstrings.py --check_all

* python utils/check_config_attributes.py

* remove copies for sam2promptencoder due to configuration

* change __init__.py

* remove tensorflow version

* fix that to not use direct comparison

* make style

* add missing import

* fix image_embedding_size

* refactor Sam2 Attention

* add fully working video inference (refactoring todo)

* clarify _prepare_memory_conditioned_features

* simplify modeling code, remove unused paths

* use one model

* use auto_docstring

* refactor rope embeddings

* nit

* not using multimask when several points given

* add all sam2.1

* add video tmp

* add Sam2VideoSessionState + fast image proc + video proc

* remove init_states from model

* fix batch inference

* add image integration tests

* uniformize modeling code with other sam models and use modular

* pass vision tests and most model tests

* All tests passing

* add offloading inference state and video to cpu

* fix inference from image embedding and existing mask

* fix multi_boxes mask inference

* Fix batch images + batch boxes inference

* improve processing for image inference

* add support for mask generation pipeline

* add support for get_connected_components post processing in mask generation

* add fast image processor sam, image processor tests and use modular for sam2 image processor

* fix mistake in sam after #39120

* fix init weights

* refactor convert

* add integration tests for video + other improvements

* add needed missing docstrings

* Improve docstrings

* improve inference speed by avoiding cuda sync

* add test

* skip test for vision_model

* minor fix for vision_model

* fix vision_model by adding sam2model and change the torch dependencies

* remove patch_size

* remove image_embedding_size

* fix patch_size

* fix test

* make style

* Separate hieradet and vision encoder in sam2

* fixup

* review changes part 1

* remove MemoryEncoderConfig and MemoryAttentionConfig

* pass q_stride instead of q_pool module

* add inference on streamed videos

* explicitly process streamed frames

* nit

* Improve docstrings in Sam2Model

* update sam2 modeling with better management of inference state and cache, and separate Sam2Model and Sam2VideoModel

* improve video inference api

* change inference_state to inference_session

* use modular for Sam2Model

* fix convert sam2 hf

* modular

* Update src/transformers/models/sam2/video_processing_sam2.py

Co-authored-by: Pavel Iakubovskii <[email protected]>

* fix minor config

* fix attention loading error

* update modeling tests to use hub checkpoints

* Use CI A10 runner for integration tests values + higher tolerance for video integration tests

* PR review part 1

* fix doc

* nit improvements

* enforce one input format for points, labels and boxes

* nit

* last few nits from PR review

* fix style

* fix the input type

* fix docs

* add sam2 model as conversion script

* improve sam2 doc

* add rough necessary changes

* first working edgetam

* fix issue with object pointers

* Use modular as much as possible

* nit fixes + optimization

* refactor spatial perceiver

* cleanup after merge

* add working edgetam

* improve perceiver resampler code

* simplify/unify rope attention logic

* Improve comments in apply_rotary_pos_emb_2d

* add working tests

* fix test timmwrapper

* add docs

* make fixup

* nits

* fix modular

* fix modular

* PR review part 1

* split apply_rotary_pos_emb_2d

* add granularity to _prepare_memory_conditioned_features

* add dates to doc

* add separate mlp for memory attention

* Fix memory on wrong device

* store processed frames in dict

* update checkpoints in tests

* update dates

---------

Co-authored-by: sangbumchoi <[email protected]>
Co-authored-by: RUFFY-369 <[email protected]>
Co-authored-by: Sangbum Daniel Choi <[email protected]>
Co-authored-by: Haitham Khedr <[email protected]>
Co-authored-by: sangbum choi <[email protected]>
Co-authored-by: Pavel Iakubovskii <[email protected]>

* Fix EXAONE-4.0 dummy id (#41089)

* Fix EXAONE-4.0 dummy id

* Fix exaone4 dummy (#1)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

---------

Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: ydshieh <[email protected]>

* Fix 8bit bnb loading (#41200)

* Fix 8bit

* oops, forgot the case where it is not prequantized

* Fix docker quantization (#41201)

* launch docker

* remove gptq for now

* run tests

* Revert "run tests"

This reverts commit f85718ce3a21d5937bf7405b8925c125c67d1a3e.

* revert

* Embed interactive timeline in docs (#41015)

* embed timeline in docs (test web component and Iframe)

* test scaling

* test multiple scales

* compensate scale in width

* set correct style and scale

* remove bottom space created by scale

* add timeline as a separate page

* reformulate docs after review

* [docs] Fix links (#41110)

fix

* Remove unnecessary Optional typing (#41198)

Signed-off-by: Yuanyuan Chen <[email protected]>

* docs/examples(speech): pin CTC commands to Hub datasets; add Windows notes (#41027)

* examples(speech): load Common Voice from Hub; remove deprecated dataset-script references (Windows-friendly notes)

* docs/examples(speech): pin CTC streaming & other CTC commands to Hub datasets; add Windows notes

* make style

* examples(speech): align DataTrainingArguments help with datasets docs; minor wording fixes

* docs/examples(speech): address review: remove Hub subsection & Whisper tip; align dataset help text

* style: apply ruff/black/usort/codespell on examples/speech-recognition

* Apply style fixes

* Update examples/pytorch/speech-recognition/README.md

* update doc to match load_dataset
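
For illustration, loading an ASR dataset straight from the Hub with `datasets.load_dataset`; the dataset id below is a small public example, not necessarily the one the README pins:

```python
from datasets import load_dataset

# no local dataset script needed: the data is fetched from the Hub
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]  # decoded dict with "array" and "sampling_rate"
```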

---------

Co-authored-by: Eustache Le Bihan <[email protected]>
Co-authored-by: eustlb <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix Qwen3-Omni audio_token_id serialization issue (#41192)

Fix Qwen3-Omni audio_token_id serialization by overriding parent's attribute_map

- Override attribute_map in Qwen3OmniMoeThinkerConfig to prevent inheritance of incorrect mapping
- Parent class maps audio_token_id -> audio_token_index, but implementation uses audio_token_id directly
- Fixes issue where custom audio_token_id values were not preserved during save_pretrained/from_pretrained cycles

Fixes #41191
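
A hedged sketch of the override pattern described above; the classes are simplified stand-ins for the real configs:

```python
class ParentThinkerConfig:
    # The parent remaps the public name to a differently named attribute,
    # which broke round-tripping of custom `audio_token_id` values.
    attribute_map = {"audio_token_id": "audio_token_index"}

class Qwen3OmniMoeThinkerConfig(ParentThinkerConfig):
    # Overriding with an empty map keeps `audio_token_id` stored (and
    # serialized) under its own name across save_pretrained/from_pretrained.
    attribute_map = {}
```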

* Wait for main process in _save_checkpoint to ensure best checkpoint exists (#40923)

* Update trainer.py

* fix

* fix format

* move barrier, delete redundant
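
A minimal sketch of the ordering this commit enforces, assuming a standard `torch.distributed` setup; the names are illustrative:

```python
import torch.distributed as dist

def save_checkpoint(is_main_process: bool, write_checkpoint) -> None:
    if is_main_process:
        write_checkpoint()  # only the main process writes the (best) checkpoint
    if dist.is_available() and dist.is_initialized():
        dist.barrier()  # all ranks wait until the checkpoint exists on disk
```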

* Avoid assumption that model has config attribute in deepspeed (#41207)

Avoid assumption that model has config in deepspeed

* Trainer: Pass `num_items_in_batch` to `compute_loss` in `prediction_step` (#41183)

* Add num_items_in_batch computation to predict_step.

* address comments.

* Fix test cases.

* fixup

---------

Co-authored-by: Marc Sun <[email protected]>

* [ESM] add accepts_loss_kwargs=False to EsmPreTrainedModel (#41006)

add accepts_loss_kwargs=False to EsmPreTrainedModel

Signed-off-by: Peter St. John <[email protected]>
Co-authored-by: Marc Sun <[email protected]>

* Align pull request template to bug report template (#41220)

The only difference is that I don't send users to https://discuss.huggingface.co/ for hub issues.

* [generate] cache missing custom generate file (#41216)

* cache missing custom generate file

* make fixup

* Remove old Python code (#41226)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore (#41143)

Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore.

Co-authored-by: frozenleaves <[email protected]>

* update code owners (#41221)

Co-authored-by: ydshieh <[email protected]>

* Unify is_torchvision_v2_available with is_torchvision_available (#41227)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typing of train_args (#41142)

* Fix typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix fsdp typing

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix sliding window attn mask (#41228)

* Fix sliding window attn mask
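
The intended semantics, as a hedged standalone sketch rather than the PR's mask code: position i attends to position j only when j <= i and j > i - window.

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where attention is allowed: causal and within the window
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window)

mask = sliding_window_causal_mask(6, 3)  # 6x6 boolean mask
```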

* Clearer test

* Apply style fixes

* If Picasso made ascii drawings he would have made this

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults" (#41124)

* Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults (#3…"

This reverts commit df67cd35f0ca1a1cbf7147b2576db31b16200cf4.

* fix

* [docs] Fix tp_plan (#41205)

remove manual

* Fix white space in documentation (#41157)

* Fix white space

Signed-off-by: Yuanyuan Chen <[email protected]>

* Revert changes

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix autodoc

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* fix qwen text config (#41158)

* fix qwen text config

* fix tests

* fix one more test

* address comments

* Video processor accepts single frames on cuda (#41218)

* fix

* why was it np if the input is in torch

* Use math.log2 (#41241)

Signed-off-by: Yuanyuan Chen <[email protected]>
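
The swapped-in idiom, for reference:

```python
import math

# math.log2(x) is clearer and more precise than math.log(x, 2),
# and exact for powers of two
assert math.log2(1 << 40) == 40.0
```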

* fix TrainerIntegrationDeepSpeed UT failures (#41236)

Signed-off-by: Yao, Matrix <[email protected]>

* [repo utils] Update `models_to_deprecate.py` (#41231)

* update models_to_deprecate

* exclude this file

* handle typos and aliases

* don't commit files

* PR suggestions; make fixup

* Use removeprefix and removesuffix (#41240)

* Use removeprefix and removesuffix
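
The idiom in question (Python 3.9+), next to the slicing it replaces:

```python
name = "module.encoder.weight"
# new idiom: a no-op when the prefix is absent, no length arithmetic
assert name.removeprefix("module.") == "encoder.weight"
# old idiom it replaces:
# name[len("module."):] if name.startswith("module.") else name
```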

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix pylint warnings (#41222)

* Remove unused variables

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove reimported packages

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix code

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix pylint warnings

Signed-off-by: Yuanyuan Chen <[email protected]>

* Simplify

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove all instances of `is_safetensors_available` (#41233)

* safetensors is a core dep

* fix

* ok

* simplify branching

* keep it for now

---------

Co-authored-by: Cyril Vallez <[email protected]>

* FP-Quant NVFP4 and Python 3.9 support (#39876)

* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <[email protected]>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <[email protected]>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <[email protected]>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <[email protected]>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

* nvfp4

* nvfp4 tests

* FP-Quant version bumped

* nvfp4 default and docs update

* trainable

* cpu if pseudoquant

* proper group size selection

* gsr

* qutlass requirement version bump

* Upstream docker copy

* docs update

---------

Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Mohamed Mekkouri <[email protected]>

* [`FA3`] Fix masking and loading logic in same process (#41217)

fix loading and fa3 masking

* [t5gemma] fix `get_text_config` and related fixes (#40939)

* tmp commit

* t5gemma fixes

* Don't convert to `safetensors` on the fly if the call is from testing (#41194)

* don't convert

* disable

* Update src/transformers/modeling_utils.py

Co-authored-by: Cyril Vallez <[email protected]>

* fix

* disable

* disable

* disable

---------

Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>

* Resolve remote custom module path warnings (#41243)

* add peft team members to issue/pr template (#41262)

* add

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Benjamin Bossan <[email protected]>

---------

Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Benjamin Bossan <[email protected]>

* docs: update bitsandbytes platform support (#41266)

* add more activation kernels, follow up  (#40944)

* add more activation kernels

* fixing style

* fix version

* fix asr pipeline ut failures (#41275)

* fix asr pipeline ut failures

Signed-off-by: Yao, Matrix <[email protected]>

* make style

Signed-off-by: Yao, Matrix <[email protected]>

---------

Signed-off-by: Yao, Matrix <[email protected]>

* Use regex detailed flags (#41264)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix multi-video timestamp bug in Qwen-3-VL and GLM4V (#41229)

* fix multi-video timestamp bug in qwen3vl,glm4v

* run make fix-copies to sync modular files

* run make fix-copies to sync modular files

---------

Co-authored-by: UBT <[email protected]>

* Fix binding of video frames to video placeholder in `InternVL` model (#41237)

* Fix binding video frames to video placeholder in prompt

Signed-off-by: Daniel Bershatsky <[email protected]>

* Add test on binding video frames to prompt

Signed-off-by: Daniel Bershatsky <[email protected]>

* Fix code style issues

Signed-off-by: Daniel Bershatsky <[email protected]>

* Fix broken tests on `InternVLProcessor`

Signed-off-by: Daniel Bershatsky <[email protected]>

* Add `return_tensors` to video processor defaults

Signed-off-by: Daniel Bershatsky <[email protected]>

---------

Signed-off-by: Daniel Bershatsky <[email protected]>

* Deprecate Trackio environment variables and deploy to Spaces by default (#40950)

* allow prive space id for trackio

* complete docstring

* Deprecate environment variables for Trackio integration; use TrainingArguments instead and deploy by default

* style

* Enhance documentation for Trackio Space ID in TrainingArguments

* Allow private Space id for Trackio (#40948)

* allow private space id for trackio

* complete docstring

* fix async client for transformers chat (#41255)

* fix-client

* fix

* Unify is_torchvision_v2_available with is_torchvision_available (#41259)

Fix is_torchvision_v2_available

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use max/min (#41280)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Biogptlogits (#41270)

added logits slicing to BioGpt for seq classifier

Signed-off-by: Aviral <[email protected]>

* Fix unnecessary single-item container checks (#41279)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix pylint generator warnings (#41258)

Fix pylint generator warnings

Signed-off-by: cyy <[email protected]>

* feat: use `aws-highcpu-32-priv` for amd docker img build (#41285)

* feat: use `aws-highcpu-32-priv` for amd docker img build

* feat: add `workflow_dispatch` event to docker build CI

* Add processor and integration test for qwen3vl (#41277)

* support aux loss in qwen3vlmoe

* update qwen3vl processor test!

* add integration tests for qwen3vl-30a3

* remove duplicated decorator

* code clean

* fix consistency

* do not inherit from nn.Linear for better quantization
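
A sketch of the design note above: keeping the `nn.Linear` as a submodule, instead of subclassing it, lets quantization backends locate and swap the inner layer by type. Class and attribute names here are illustrative.

```python
import torch.nn as nn

class Projector(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # quantizers replace `self.proj` (a plain nn.Linear) wholesale;
        # a subclass of nn.Linear would be missed or mishandled by that scan
        self.proj = nn.Linear(in_features, out_features)

    def forward(self, x):
        return self.proj(x)
```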

* pass check

* Remove `test_initialization` (#41261)

remove it

* Remove some previous team members from allow list of triggering Github Actions (#41263)

* delete

* delete

---------

Co-authored-by: ydshieh <[email protected]>

* Build doc in 2 jobs: `en` and `other languages` (#41290)

* separate

* separate

---------

Co-authored-by: ydshieh <[email protected]>

* Fix mxfp4 dequantization (#41292)

fix

* [`Flex Attn`] Fix lse x attention sinks logic   (#41249)

fix

* FIX: Bug in PEFT integration delete_adapter method (#41252)

The main content of this PR is to fix a bug in the delete_adapter method
of the PeftAdapterMixin. Previously, it did not take into account
auxiliary modules from PEFT, e.g. those added by modules_to_save. This
PR fixes this oversight.

Note that the PR uses new functionality from PEFT that exposes
integration functions like delete_adapter. Those will be contained in
the next PEFT release, 0.18.0 (yet unreleased). Therefore, the bug is
only fixed when users have a PEFT version fulfilling this requirement.
I ensured that with old PEFT versions, the integration still works the
same as previously. The newly added test for this is skipped if the PEFT
version is too low.

(Note: I tested locally that the test will pass with PEFT 0.18.0)

While working on this, I also cleaned up the following:

- The active_adapter property has been deprecated for more than 2 years
  (#26407). It is safe to remove it now.
- There were numerous small errors or outdated pieces of information in
  the docstrings, which have been addressed.

When PEFT < 0.18.0 is used, although we cannot delete modules_to_save,
we can still detect them and warn about it.
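
A hedged sketch of the version gate described above; the exact PEFT helper surface is assumed from the description, not verified:

```python
import warnings
from importlib.metadata import version as pkg_version
from packaging import version

MIN_PEFT_VERSION = "0.18.0"  # release named above

def delete_adapter_compat(model, adapter_name: str) -> None:
    if version.parse(pkg_version("peft")) >= version.parse(MIN_PEFT_VERSION):
        # new PEFT: deletion also covers auxiliary modules such as
        # those added by modules_to_save
        model.delete_adapter(adapter_name)
    else:
        model.delete_adapter(adapter_name)  # previous behavior, unchanged
        warnings.warn(
            f"PEFT < {MIN_PEFT_VERSION}: auxiliary modules from "
            "modules_to_save cannot be deleted."
        )
```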

* Italian translation for README.md (#41269)

chore: add Italian translation for README.md

* Fix README.md error when installing from source (#41303)

* download and use HF Hub Cache (#41181)

use hub cache

Co-authored-by: ydshieh <[email protected]>

* fix some merge issues

* [test_all]

* [test-all]

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Wang, Yi <[email protected]>
Signed-off-by: Yao, Matrix <[email protected]>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Peter St. John <[email protected]>
Signed-off-by: Daniel Bershatsky <[email protected]>
Signed-off-by: Aviral <[email protected]>
Signed-off-by: cyy <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Rangehow <[email protected]>
Co-authored-by: rangehow <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Raushan Turganbay <[email protected]>
Co-authored-by: Anna <[email protected]>
Co-authored-by: Anna Banaszak <[email protected]>
Co-authored-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Hamish Scott <[email protected]>
Co-authored-by: Matej Sirovatka <[email protected]>
Co-authored-by: Harshal Janjani <[email protected]>
Co-authored-by: Branden <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Pavel Iakubovskii <[email protected]>
Co-authored-by: Ákos Hadnagy <[email protected]>
Co-authored-by: Benjamin Bossan <[email protected]>
Co-authored-by: Ita Zaporozhets <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>
Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: StevenBucaille <[email protected]>
Co-authored-by: BakerBunker <[email protected]>
Co-authored-by: lvyuanjun.lyj <[email protected]>
Co-authored-by: Ayush <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>
Co-authored-by: Yannick Schnider <[email protected]>
Co-authored-by: Ralph Gleaton <[email protected]>
Co-authored-by: Rémi Ouazan <[email protected]>
Co-authored-by: Saidur Rahman Pulok <[email protected]>
Co-authored-by: Nick Doiron <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: Duygu Altinok <[email protected]>…
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025
* Fix EXAONE-4.0 dummy id

* Fix exaone4 dummy (huggingface#1)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

---------

Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: ydshieh <[email protected]>