-
Notifications
You must be signed in to change notification settings - Fork 30.7k
Description
Hi
I am runnig seq2seq_trainer on TPUs I am always getting this connection issue could you please have a look
sicne this is on TPUs this is hard for me to debug
thanks
Best
Rabeeh
2389961.mean (11/20/2020 05:24:09 PM) (Detached)
local_files_only=local_files_only,
File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/transformers/file_utils.py", line 955, in cached_path
local_files_only=local_files_only,
File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/transformers/file_utils.py", line 1125, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Traceback (most recent call last):
File "/home/rabeeh//internship/seq2seq/xla_spawn.py", line 71, in
main()
XLA label: %copy.32724.remat = f32[80,12,128,128]{3,2,1,0:T(8,128)} copy(f32[80,12,128,128]{2,3,1,0:T(8,128)} %bitcast.576)
Allocation type: HLO temp
==========================
- Size: 60.00M
Shape: f32[80,12,128,128]{3,2,1,0:T(8,128)}
Unpadded size: 60.00M
XLA label: %copy.32711.remat = f32[80,12,128,128]{3,2,1,0:T(8,128)} copy(f32[80,12,128,128]{2,3,1,0:T(8,128)
0%| | 2/18060 [08:12<1234:22:09, 246.08s/it]Traceback (most recent call last):
File "/home/rabeeh//internship/seq2seq/xla_spawn.py", line 71, in
main()
File "/home/rabeeh//internship/seq2seq/xla_spawn.py", line 67, in main
xmp.spawn(mod._mp_fn, args=(), nprocs=args.num_cores)
File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 395, in spawn
start_method=start_method)
File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/anaconda3/envs/torch-xla-1.7/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 112, in join
(error_index, exitcode)