Input directory corruption and infinite symlink using successive InitialWorkDirRequirement #1029

@whlavina

Description

There appear to be two bugs in cwltool's handling of InitialWorkDirRequirement, both in its use of symlinks to avoid copying directories:

  • Directories get corrupted with incorrect symlinks that point to themselves.
  • Symlinks that point to themselves cause an infinite loop in cwltool.

The issue is subtle, and seems to manifest when successive steps reference different parts of a single directory structure.

Expected Behavior

Input directories should be left pristine, and output directories should be emitted correctly, when using InitialWorkDirRequirement. In the example workflow, a subdirectory with one test file should be the final output.

Actual Behavior

A symlink is installed at dir/subdir/file with target being itself (1st bug), causing an infinite loop in cwltool as it tries to dereference the symlink within PathMapper.visit() (2nd bug).
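One way to confirm the first bug after a run is to scan the directory tree for symlinks whose target resolves to their own path. The following is a minimal sketch, not part of cwltool; the function name `find_self_links` is hypothetical, and it only detects direct self-references (a link pointing at itself), not longer cycles.

```python
import os


def find_self_links(root):
    """Return the sorted paths of symlinks under root that point at themselves.

    Hypothetical helper for illustration; not a cwltool API.
    """
    hits = []
    # os.walk does not follow symlinks by default, so a looping link
    # cannot trap the traversal itself.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            # Relative targets are resolved against the link's own directory;
            # an absolute target replaces dirpath entirely in os.path.join.
            resolved = os.path.normpath(os.path.join(dirpath, target))
            if resolved == os.path.normpath(path):
                hits.append(path)
    return sorted(hits)
```

Running `find_self_links` on the directory produced by the mkdirs step, before and after the workflow, should show whether dir/subdir/file has been replaced by a link to itself.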

Workflow Code

The following workflow, test.cwl, demonstrates the issue:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

inputs: []

steps:
  # Create a test directory structure; could be done outside CWL and passed in as input.
  # This input directory should be left pristine.
  mkdirs:
    run:
      class: CommandLineTool
      baseCommand: [bash, '-c', 'mkdir dir dir/subdir && touch dir/subdir/file', '-']
      inputs: []
      outputs:
        mkdirs_out:
          type: Directory
          outputBinding:
            glob: dir
    in: []
    out: [mkdirs_out]

  # Given an input directory, emit a subdirectory as output.
  passthrough1:
    run:
      class: CommandLineTool
      requirements:
      - class: InitialWorkDirRequirement
        listing:
        - entry: $(inputs.passthrough1_in)
          writable: false
      baseCommand: ["true"]
      inputs:
        passthrough1_in:
          type: Directory
      outputs:
        passthrough1_subdir:
          type: Directory
          outputBinding:
            glob: $(inputs.passthrough1_in.basename)/subdir
    in:
      passthrough1_in: mkdirs/mkdirs_out
    out: [passthrough1_subdir]

  # Given a (sub-)directory, emit it unchanged.
  passthrough2:
    run:
      class: CommandLineTool
      requirements:
      - class: InitialWorkDirRequirement
        listing:
        - entry: $(inputs.passthrough2_in)
          writable: false
      baseCommand: ["true"]
      inputs:
        passthrough2_in:
          type: Directory
      outputs:
        passthrough2_subdir:
          type: Directory
          outputBinding:
            glob: $(inputs.passthrough2_in.basename)
    in:
      passthrough2_in: passthrough1/passthrough1_subdir
    out: [passthrough2_subdir]

outputs:
  out:
    type: Directory
    outputSource: passthrough2/passthrough2_subdir

As input, use this empty test.yaml:

{}

Full Traceback

Infinite loop in pathmapper.py around line 249, in PathMapper.visit(); see the comment line # Dereference symbolic links.
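To illustrate the second bug, here is a minimal sketch of how a loop that dereferences symlinks can be guarded against cycles. This is not cwltool's actual code; the names `dereference` and `make_self_link` and the `max_hops` limit are assumptions made for the example. Without the `seen` check, the `while` loop below would never terminate on a link that points at itself.

```python
import os


def dereference(path, max_hops=40):
    """Follow a chain of symlinks, raising OSError if the chain loops.

    Hypothetical helper for illustration; not a cwltool API.
    """
    seen = set()
    current = os.path.abspath(path)
    while os.path.islink(current):
        # Guard: stop if this link was already visited or the chain is too long.
        if current in seen or len(seen) >= max_hops:
            raise OSError("symlink loop detected at %s" % current)
        seen.add(current)
        target = os.readlink(current)
        # Relative targets are resolved against the link's own directory.
        current = os.path.normpath(
            os.path.join(os.path.dirname(current), target)
        )
    return current


def make_self_link(root):
    """Recreate the corrupted layout: dir/subdir/file pointing at itself."""
    subdir = os.path.join(root, "dir", "subdir")
    os.makedirs(subdir)
    link = os.path.join(subdir, "file")
    os.symlink(link, link)
    return link
```

Calling `dereference(make_self_link(some_empty_dir))` raises `OSError` instead of spinning. A guard like this only addresses the hang; the self-referential link would still need to be prevented at the point where it is created.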

Your Environment

  • cwltool version: 1.0.20181217162649
