Closed
Description
Hi,
I launched two processes per node to run run_classifier.py in distributed mode. However, I occasionally get the error below:
11/20/2018 09:31:48 - INFO - pytorch_pretrained_bert.file_utils - copying /tmp/tmpa25_y4es to cache at /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
93%|█████████▎| 381028352/407873900 [00:11<00:01, 14366075.22B/s]
94%|█████████▍| 383812608/407873900 [00:11<00:01, 16210783.00B/s]
 95%|█████████▍| 386455552/407873900 [00:11<00:01, 16205260.89B/s]
11/20/2018 09:31:49 - INFO - pytorch_pretrained_bert.file_utils - creating metadata file for /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
11/20/2018 09:31:49 - INFO - pytorch_pretrained_bert.file_utils - removing temp file /tmp/tmpa25_y4es
 95%|█████████▌| 388946944/407873900 [00:11<00:01, 18097539.03B/s]
11/20/2018 09:31:49 - INFO - pytorch_pretrained_bert.modeling - loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz from cache at /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
11/20/2018 09:31:49 - INFO - pytorch_pretrained_bert.modeling - extracting archive file /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba to temp dir /tmp/tmpvxvnr8_1
97%|█████████▋| 393660416/407873900 [00:11<00:00, 22199883.93B/s]
98%|█████████▊| 399411200/407873900 [00:11<00:00, 27211860.00B/s]
99%|█████████▉| 405128192/407873900 [00:11<00:00, 32287252.94B/s]
100%|██████████| 407873900/407873900 [00:11<00:00, 34098120.40B/s]
11/20/2018 09:31:49 - INFO - pytorch_pretrained_bert.file_utils - copying /tmp/tmp5fcm4v8x to cache at /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
Traceback (most recent call last):
  File "examples/run_classifier.py", line 629, in <module>
    main()
  File "examples/run_classifier.py", line 485, in main
    model = BertForSequenceClassification.from_pretrained(args.bert_model, len(label_list))
  File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/site-packages/pytorch_pretrained_bert-0.1.2-py3.6.egg/pytorch_pretrained_bert/modeling.py", line 495, in from_pretrained
    archive.extractall(tempdir)
  File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/tarfile.py", line 2007, in extractall
    numeric_owner=numeric_owner)
  File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/tarfile.py", line 2049, in extract
    numeric_owner=numeric_owner)
  File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/tarfile.py", line 2119, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/tarfile.py", line 2168, in makefile
    copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
  File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/tarfile.py", line 248, in copyfileobj
    buf = src.read(bufsize)
  File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/gzip.py", line 482, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
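For reference, the same EOFError can be reproduced with the standard library alone by reading a gzip file that has only been partially written. The sketch below is an illustration under assumptions, not code from pytorch_pretrained_bert: the function name `read_truncated_gzip`, the file name `archive.gz`, and the truncation ratio are all hypothetical, and truncating the file only simulates a reader that opens the archive while another process is still copying it.

```python
import gzip
import os
import tempfile


def read_truncated_gzip(payload: bytes, keep_fraction: float = 0.5) -> str:
    """Write a gzip file, cut it short, and return the error message raised on read.

    Hypothetical helper: the truncation stands in for a reader that opens
    the file while a writer is still partway through copying it.
    """
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "archive.gz")

        # Write a complete, valid gzip file first.
        with gzip.open(path, "wb") as f:
            f.write(payload)

        # Keep only the first part of the file, as a half-finished copy would.
        full_size = os.path.getsize(path)
        with open(path, "r+b") as f:
            f.truncate(int(full_size * keep_fraction))

        # Reading to the end now fails because the stream has no trailer.
        try:
            with gzip.open(path, "rb") as f:
                f.read()
        except EOFError as exc:
            return str(exc)
    return ""


if __name__ == "__main__":
    print(read_truncated_gzip(bytes(range(256)) * 4096))
```

If the message printed here matches the one in the traceback, that is consistent with (though not proof of) one process reading the cached archive before another has finished writing it.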
It looks like a race condition: two processes are simultaneously writing the model file to /root/.pytorch_pretrained_bert/ so one of them can start extracting the archive before the other has finished copying it.
Please advise on any workaround. Thanks!
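One possible direction, sketched here under assumptions, is to make the copy into the cache atomic: stage the file under a temporary name in the cache directory and rename it into place only once it is complete, so that a concurrent reader sees either no file or the whole file. This is not how pytorch_pretrained_bert 0.1.2 behaves and not a confirmed fix; `atomic_copy_to_cache` and the `.partial` suffix are hypothetical names. It relies on `os.replace`, which is atomic when source and destination are on the same filesystem.

```python
import os
import shutil
import tempfile


def atomic_copy_to_cache(src_path: str, cache_path: str) -> str:
    """Copy src_path to cache_path so readers never observe a partial file.

    Hypothetical helper, not part of pytorch_pretrained_bert.
    """
    cache_dir = os.path.dirname(os.path.abspath(cache_path))
    os.makedirs(cache_dir, exist_ok=True)

    # Stage inside the cache directory so the final rename stays on one filesystem.
    fd, staging_path = tempfile.mkstemp(dir=cache_dir, suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as staging, open(src_path, "rb") as src:
            shutil.copyfileobj(src, staging)
            staging.flush()
            os.fsync(staging.fileno())  # make sure the bytes are on disk
        # Atomic swap: if two processes race, the last rename wins,
        # and both candidates are complete files.
        os.replace(staging_path, cache_path)
    except BaseException:
        # Clean up the staging file if the copy or rename failed.
        if os.path.exists(staging_path):
            os.remove(staging_path)
        raise
    return cache_path
```

A simpler alternative that needs no code change may be to run a single process once to populate the cache, and only then launch the two processes per node, so that neither of them has to download or copy the archive.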