
Commit 0baf100

jmfernandezmr-c authored and committed
Fixed race condition that happens when a job runs "too fast"
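When a podman process finishes before the monitoring loop in `docker_monitor` first checks on it, the monitor waits forever (a deadlock). Nothing updates `process.returncode`, because `subprocess.Popen` only sets it when `poll()` or `wait()` is called, and the exited child stays in zombie state with nothing telling the monitor that it has finished. This fix adds a `process.poll()` call, which reaps the child and fills in `process.returncode`.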
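The behaviour described above can be reproduced with the standard library alone. This minimal sketch (the function name `returncode_before_and_after_poll` is hypothetical, and a trivial Python child stands in for the podman process) starts a child that exits immediately, waits well past its exit, and shows that `returncode` is still `None` until `poll()` is called.

```python
import subprocess
import sys
import time
from typing import Optional, Tuple


def returncode_before_and_after_poll() -> Tuple[Optional[int], Optional[int]]:
    """Show that Popen.returncode stays stale after the child exits until poll() runs."""
    # Hypothetical stand-in for a job that "runs too fast": the child exits at once.
    process = subprocess.Popen([sys.executable, "-c", "pass"])
    # Wait long enough that the child has almost certainly exited already.
    time.sleep(1)
    # Nothing has called poll()/wait() yet, so returncode has not been filled in.
    before = process.returncode
    # poll() reaps the exited child and records its exit status.
    while process.poll() is None:
        time.sleep(0.1)
    after = process.returncode
    return before, after


if __name__ == "__main__":
    print(returncode_before_and_after_poll())  # expected: (None, 0)
```

The polling loop after the first check only guards against a child that is unusually slow to start; `before` is `None` regardless of timing, because `Popen` never updates `returncode` on its own.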
1 parent 5a645df commit 0baf100


1 file changed (+4, -0 lines)

cwltool/job.py

Lines changed: 4 additions & 0 deletions
@@ -857,6 +857,10 @@ def docker_monitor(
         cid: Optional[str] = None
         while cid is None:
             time.sleep(1)
+            # This is needed to avoid a race condition where the job
+            # was so fast that it already finished when it arrives here
+            if process.returncode is None:
+                process.poll()
             if process.returncode is not None:
                 if cleanup_cidfile:
                     try:
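To see why the added `poll()` breaks the hang, here is a simplified, hypothetical model of the patched loop. It is not cwltool's actual `docker_monitor`: the function `wait_for_cid` is invented for illustration, it polls every 0.1 s instead of every second, and it skips cidfile cleanup. It returns the container ID once the cidfile has content, or `None` if the process exits first.

```python
import os
import subprocess
import sys
import tempfile
import time
from typing import Optional


def wait_for_cid(cidfile: str, process: "subprocess.Popen[bytes]") -> Optional[str]:
    """Hypothetical simplified model of the patched docker_monitor wait loop.

    Returns the ID read from cidfile, or None if the process exited
    before a non-empty cidfile appeared.
    """
    cid: Optional[str] = None
    while cid is None:
        time.sleep(0.1)
        # The fix: Popen only updates returncode on poll()/wait(), so refresh it here.
        if process.returncode is None:
            process.poll()
        if process.returncode is not None:
            return None  # the job finished before its ID could be read
        try:
            with open(cidfile) as cidhandle:
                # Assumption: keep waiting while the file exists but is still empty.
                cid = cidhandle.readline().strip() or None
        except OSError:
            cid = None  # cidfile not written yet
    return cid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # The cidfile is never written: the hypothetical "job" exits immediately.
        cidfile = os.path.join(tmp, "container.cid")
        fast_job = subprocess.Popen([sys.executable, "-c", "pass"])
        print(wait_for_cid(cidfile, fast_job))  # expected: None
```

Without the two `poll()` lines, `process.returncode` would stay `None` for the fast job, the cidfile would never appear, and the loop would spin forever, which is the deadlock the commit fixes.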
