Closed
Description
System Info
Hi everyone. I'm trying to run the summarization example from https://github.com/huggingface/transformers/tree/main/examples/pytorch
The Transformers library was installed from source, because the script asked for that on the first run.
accelerate==0.24.1
torch==1.13.0a0+936e930
Command used:
accelerate launch run_summarization_no_trainer.py \
--model_name_or_path t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
--output_dir tst-summarization
Full error (both ranks print the same traceback, interleaved in the original output; shown once here):
Traceback (most recent call last):
File "run_summarization_no_trainer.py", line 782, in <module>
main()
File "run_summarization_no_trainer.py", line 705, in main
generated_tokens = accelerator.unwrap_model(model).generate(
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1565, in generate
generation_config.validate()
File "/usr/local/lib/python3.8/dist-packages/transformers/generation/configuration_utils.py", line 413, in validate
logging.warning("`num_beams` is set to None - defaulting to 1.", UserWarning)
AttributeError: module 'transformers.utils.logging' has no attribute 'warning'
33%|██████████████ | 35/105 [00:05<00:11, 6.08it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 4120986) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 985, in launch_command
multi_gpu_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 654, in multi_gpu_launcher
distrib_run.run(args)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
run_summarization_no_trainer.py FAILED
------------------------------------------------------------
Failures:
[1]:
time : 2023-11-23_18:24:03
host : 99dgx-02.mtsai.superpod.local
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 4120987)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-11-23_18:24:03
host : 99dgx-02.mtsai.superpod.local
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 4120986)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
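For anyone reading the traceback above: the failing line calls `warning` on the `transformers.utils.logging` module itself and passes `UserWarning` as a second argument, which looks like a call that was meant for the standard library's `warnings.warn`. Below is a minimal, stdlib-only sketch of that failure mode. The names `helper_logging`, `validate_buggy`, `validate_fixed`, and `run_demo` are hypothetical stand-ins and not the library's code, and the `warnings.warn` variant is only one plausible fix, not necessarily the one adopted upstream.

```python
import logging as std_logging
import types
import warnings

# Hypothetical stand-in for a helper module that offers get_logger()
# but, like the module named in the traceback, no module-level warning().
helper_logging = types.ModuleType("helper_logging")
helper_logging.get_logger = std_logging.getLogger

MESSAGE = "`num_beams` is set to None - defaulting to 1."


def validate_buggy(num_beams):
    """Mirror the failing call pattern: warning() is looked up on the module."""
    if num_beams is None:
        helper_logging.warning(MESSAGE, UserWarning)  # raises AttributeError
        num_beams = 1
    return num_beams


def validate_fixed(num_beams):
    """One plausible fix (an assumption): use the stdlib warnings module."""
    if num_beams is None:
        warnings.warn(MESSAGE, UserWarning)
        num_beams = 1
    return num_beams


def run_demo():
    """Run both variants and report what each one does with num_beams=None."""
    buggy_error = None
    try:
        validate_buggy(None)
    except AttributeError as exc:
        buggy_error = str(exc)

    # Record warnings instead of printing them so they can be inspected.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fixed_value = validate_fixed(None)

    return buggy_error, fixed_value, [w.category for w in caught]


if __name__ == "__main__":
    print(run_demo())
```

The buggy variant only fails when `num_beams` is `None`, which matches the run above getting partway through before crashing at the first `generate` call.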
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
accelerate launch run_summarization_no_trainer.py \
--model_name_or_path t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
--output_dir tst-summarization
Expected behavior
The model trains and no errors occur.