connection issue #8690

@rabeehk

Hi,
I am running seq2seq_trainer on TPUs and I always get this connection issue. Could you please have a look?
Since this is on TPUs, it is hard for me to debug.
Thanks.
Best,
Rabeeh

    2389961.mean    (11/20/2020 05:24:09 PM)        (Detached)
local_files_only=local_files_only,

File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/transformers/file_utils.py", line 955, in cached_path
local_files_only=local_files_only,
File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/transformers/file_utils.py", line 1125, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Traceback (most recent call last):
File "/home/rabeeh//internship/seq2seq/xla_spawn.py", line 71, in <module>
main()
XLA label: %copy.32724.remat = f32[80,12,128,128]{3,2,1,0:T(8,128)} copy(f32[80,12,128,128]{2,3,1,0:T(8,128)} %bitcast.576)
Allocation type: HLO temp
==========================

  1. Size: 60.00M
    Shape: f32[80,12,128,128]{3,2,1,0:T(8,128)}
    Unpadded size: 60.00M
    XLA label: %copy.32711.remat = f32[80,12,128,128]{3,2,1,0:T(8,128)} copy(f32[80,12,128,128]{2,3,1,0:T(8,128)
    0%| | 2/18060 [08:12<1234:22:09, 246.08s/it]
    Traceback (most recent call last):
    File "/home/rabeeh//internship/seq2seq/xla_spawn.py", line 71, in <module>
    main()
    File "/home/rabeeh//internship/seq2seq/xla_spawn.py", line 67, in main
    xmp.spawn(mod._mp_fn, args=(), nprocs=args.num_cores)
    File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 395, in spawn
    start_method=start_method)
    File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
    File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 112, in join
    (error_index, exitcode)
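
A possible stopgap, offered only as a minimal sketch: if the `ValueError` raised from `get_from_cache` is transient (for example because the processes started by `xmp.spawn` all request the same files at once, which is an assumption and not something the log confirms), the download step could be wrapped in a retry with backoff. The names `retry_on_connection_error` and `load_fn` are hypothetical and not part of transformers or torch_xla. Note that the log also contains XLA memory allocation reports, so the connection error may not be the only failure here.

```python
import time


def retry_on_connection_error(load_fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call load_fn(), retrying with exponential backoff on ValueError/OSError.

    Hypothetical helper: load_fn is any zero-argument callable that triggers
    the download (the traceback above shows the failure surfacing as ValueError).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return load_fn()
        except (ValueError, OSError) as err:
            last_error = err
            # Wait 1x, 2x, 4x, ... base_delay between attempts, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Here `load_fn` would be whatever call in the training script fetches the files, wrapped in a `lambda` or a small function. If every attempt fails, the last error is re-raised unchanged, so the original message is still visible.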
