[llama] AutoTokenizer does not add eos_token at the end #23833

@csyourui

Description

System Info

  • transformers version: 4.29.2
  • Platform: Linux-3.10.0-1160.42.2.el7.x86_64-x86_64-with-glibc2.35
  • Python version: 3.9.16
  • Huggingface_hub version: 0.14.1
  • Safetensors version: not installed
  • PyTorch version (GPU?): 2.0.1+cu118 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Code:

from transformers import AutoTokenizer, LlamaTokenizer

# Both tokenizers are loaded with add_eos_token=True; AutoTokenizer resolves to the
# fast (Rust-backed) Llama tokenizer when use_fast=True, while LlamaTokenizer is the
# slow SentencePiece-based implementation.
auto_tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b", add_eos_token=True, use_fast=True)
llama_tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b", add_eos_token=True, use_fast=True)

# With add_special_tokens=True, both outputs should end with the eos token </s>.
print(auto_tokenizer.decode(auto_tokenizer.encode("auto_tokenizer", add_special_tokens=True)))
print(llama_tokenizer.decode(llama_tokenizer.encode("llama_tokenizer", add_special_tokens=True)))

Results:

<s> auto_tokenizer
<s> llama_tokenizer</s>
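
A quick check (not part of the original report) illustrates why the two outputs can differ: with use_fast=True, AutoTokenizer instantiates the Rust-backed LlamaTokenizerFast, while LlamaTokenizer is the slow SentencePiece implementation, and the two apparently handle add_eos_token differently. A minimal sketch, assuming the two tokenizers above are already loaded:

# Hypothetical check: inspect which classes were actually instantiated.
print(type(auto_tokenizer).__name__)   # expected: LlamaTokenizerFast
print(type(llama_tokenizer).__name__)  # expected: LlamaTokenizer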

Expected behavior

The eos token should be appended in both cases, e.g.:

<s> auto_tokenizer</s>
<s> llama_tokenizer</s>
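
One possible workaround (not part of the original report, only a sketch) is to override the fast tokenizer's backend post-processor with tokenizers.processors.TemplateProcessing so that eos is appended explicitly. This assumes the backend_tokenizer property and its post_processor attribute can be set directly:

from transformers import AutoTokenizer
from tokenizers import processors

tok = AutoTokenizer.from_pretrained("huggyllama/llama-7b", use_fast=True)
bos, eos = tok.bos_token, tok.eos_token

# Rebuild the post-processing template so single sequences ($A) and pairs
# ($A, $B) are wrapped with both bos and eos.
tok.backend_tokenizer.post_processor = processors.TemplateProcessing(
    single=f"{bos} $A {eos}",
    pair=f"{bos} $A {eos} {bos} $B {eos}",
    special_tokens=[(bos, tok.bos_token_id), (eos, tok.eos_token_id)],
)

print(tok.decode(tok.encode("auto_tokenizer")))  # expected: <s> auto_tokenizer</s>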
