Inconsistent behavior with optional file types when coercing from string #696

stxue1 · 2024-06-27T01:54:12Z

When a file is declared as optional, the value should be null if the file does not exist. In an array of optional files, this should mean those nonexistent file entries should become null.

When coercing from string to file, this check for file existence seems to be inconsistent depending on how the WDL is written:

version 1.1
workflow testWorkflow {
  input {
  }
  call testTask
  output {
    Array[File?] array_in_output = testTask.array_in_output
    Int len_in_output = testTask.len_in_output
    Array[File?] array_in_body_out = testTask.array_in_body_out
    Int len_in_body_out = testTask.len_in_body_out
    Array[File?] array_in_input_out = testTask.array_in_input_out
    Int len_in_input_out = testTask.len_in_input_out
  }
}

task testTask {
  input {
    Array[File?] array_in_input = ["example1.txt", "example2.txt"]
    Int len_in_input = length(select_all(array_in_input))
  }
  command <<<>>>
  Array[File?] array_in_body = ["example1.txt", "example2.txt"]
  Int len_in_body = length(select_all(array_in_body))
  output {
    Array[File?] array_in_output = ["example1.txt", "example2.txt"]
    Int len_in_output = length(select_all(array_in_output))
    Array[File?] array_in_body_out = array_in_body
    Int len_in_body_out = len_in_body
    Array[File?] array_in_input_out = array_in_input
    Int len_in_input_out = len_in_input
  }
}

With MINIWDL__FILE_IO__ALLOW_ANY_INPUT=True miniwdl run test.wdl, this returns:

{
  "dir": "/home/heaucques/Documents/wdl-conformance-tests/20240626_184902_testWorkflow",
  "outputs": {
    "testWorkflow.array_in_body_out": [
      null,
      null
    ],
    "testWorkflow.array_in_input_out": [
      null,
      null
    ],
    "testWorkflow.array_in_output": [
      null,
      null
    ],
    "testWorkflow.len_in_body_out": 2,
    "testWorkflow.len_in_input_out": 2,
    "testWorkflow.len_in_output": 0
  }
}

All of len_in_body_out, len_in_input_out, and len_in_output should be 0, but when processed inside the task body/input, select_all runs on the string representation instead of the file representation; only processing it in the output does it check that those file paths exist.

String to file coercion also behaves differently depending if it is within a task or a workflow; the above WDL is mostly in a task, as the same code does not work in a workflow:

version 1.1
workflow testWorkflow {
  input {
  }
  output {
    Array[File?] array_in_output = ["example1.txt", "example2.txt"]
    Int len_in_output = length(select_all(array_in_output))
  }
}

2024-06-26 18:53:32.953 wdl.w:testWorkflow workflow start :: name: "testWorkflow", source: "test.wdl", line: 2, column: 1, dir: "/home/heaucques/Documents/wdl-conformance-tests/20240626_185332_testWorkflow"
2024-06-26 18:53:32.954 wdl.w:testWorkflow miniwdl :: version: "v1.12.0", uname: "Linux pop-os 6.8.0-76060800daily20240311-generic #202403110203~1715181801~22.04~aba43ee SMP PREEMPT_DYNAMIC Wed M x86_64"
2024-06-26 18:53:32.954 wdl.w:testWorkflow task thread pool initialized :: task_concurrency: 8
2024-06-26 18:53:32.968 wdl.w:testWorkflow visit :: node: "output-array_in_output", values: {"array_in_output": ["example1.txt", "example2.txt"]}
2024-06-26 18:53:32.968 wdl.w:testWorkflow visit :: node: "output-len_in_output", values: {"len_in_output": 2}
2024-06-26 18:53:32.995 wdl.w:testWorkflow workflow testWorkflow (test.wdl Ln 2 Col 1) failed :: dir: "/home/heaucques/Documents/wdl-conformance-tests/20240626_185332_testWorkflow", error: "InputError", message: "workflow output uses nonexistent file/directory: example1.txt", node: "outputs"
2024-06-26 18:53:32.995 wdl.w:testWorkflow aborting workflow
2024-06-26 18:53:32.995 miniwdl-run workflow output uses nonexistent file/directory: example1.txt :: error: "InputError", node: "outputs", dir: "/home/heaucques/Documents/wdl-conformance-tests/20240626_185332_testWorkflow"

The text was updated successfully, but these errors were encountered:

mlin · 2024-06-27T22:34:53Z

@stxue1 Thanks, I think there are two slightly separate topics here:

The language in the spec about a File? with a nonexistent path successfully evaluating to None/null is scoped to the task output section and (IMO) not applicable outside that context. It's a convenience so that we don't have to write an explicit if/then/else statement in the task output declaration. I don't think it's a good idea to apply elsewhere.
The incorrect result of length(select_all(...)) seems like a bug I need to look into further to Miniwdl fails on list with optional outputs even though select_all is used #614 improve File? task output processing #615

stxue1 · 2024-06-27T22:51:12Z

Taking a look at the SPEC again, it does look like coercion to None for nonexistent optional files is scoped to task outputs. So my previous first complaint isn't really an issue. However, would it be a good idea to apply this behavior for workflow outputs as well?

I think the length(select_all(...)) issue might boil down to needing more clarification from the WDL spec. For one, the working directory for string to file coercion sounds like it should be different depending whether in a task input, body, or output:
https://github.com/openwdl/wdl/blob/caff59db192636d9f93f3f5659eb5939f51ff877/SPEC.md?plain=1#L3944

If converting File? types to None happens at output, then running select_all on an array with nonexistent coerced files will always return something different if not ran at the output step. For now, this behavior seems to be what the spec wants, but sounds incorrect as those File? types should be None regardless. I'll probably open an issue on openwdl for this.

stxue1 mentioned this issue Jun 27, 2024

toil-wdl-runner does not handle string to file coercion properly for optional file types DataBiosphere/toil#4992

Closed

mlin added the interop Bears on spec compatibility label Jun 27, 2024

stxue1 mentioned this issue Jul 2, 2024

Undefined behavior when turning coerced optional File? to null + Clarification about where String to File coercion takes place openwdl/wdl#673

Open

github-project-automation bot added this to miniwdl Aug 28, 2024

github-project-automation bot moved this to Backlog in miniwdl Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent behavior with optional file types when coercing from string #696

Inconsistent behavior with optional file types when coercing from string #696

stxue1 commented Jun 27, 2024 •

edited

Loading

mlin commented Jun 27, 2024

stxue1 commented Jun 27, 2024

Inconsistent behavior with optional file types when coercing from string #696

Inconsistent behavior with optional file types when coercing from string #696

Comments

stxue1 commented Jun 27, 2024 • edited Loading

mlin commented Jun 27, 2024

stxue1 commented Jun 27, 2024

stxue1 commented Jun 27, 2024 •

edited

Loading