Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behavior with optional file types when coercing from string #696

Open
stxue1 opened this issue Jun 27, 2024 · 2 comments
Open
Labels
interop Bears on spec compatibility

Comments

@stxue1
Copy link

stxue1 commented Jun 27, 2024

When a file is declared as optional, the value should be null if the file does not exist. In an array of optional files, this should mean those nonexistent file entries should become null.

When coercing from string to file, this check for file existence seems to be inconsistent depending on how the WDL is written:

version 1.1
workflow testWorkflow {
  input {
  }
  call testTask
  output {
    Array[File?] array_in_output = testTask.array_in_output
    Int len_in_output = testTask.len_in_output
    Array[File?] array_in_body_out = testTask.array_in_body_out
    Int len_in_body_out = testTask.len_in_body_out
    Array[File?] array_in_input_out = testTask.array_in_input_out
    Int len_in_input_out = testTask.len_in_input_out
  }
}

task testTask {
  input {
    Array[File?] array_in_input = ["example1.txt", "example2.txt"]
    Int len_in_input = length(select_all(array_in_input))
  }
  command <<<>>>
  Array[File?] array_in_body = ["example1.txt", "example2.txt"]
  Int len_in_body = length(select_all(array_in_body))
  output {
    Array[File?] array_in_output = ["example1.txt", "example2.txt"]
    Int len_in_output = length(select_all(array_in_output))
    Array[File?] array_in_body_out = array_in_body
    Int len_in_body_out = len_in_body
    Array[File?] array_in_input_out = array_in_input
    Int len_in_input_out = len_in_input
  }
}

With MINIWDL__FILE_IO__ALLOW_ANY_INPUT=True miniwdl run test.wdl, this returns:

{
  "dir": "/home/heaucques/Documents/wdl-conformance-tests/20240626_184902_testWorkflow",
  "outputs": {
    "testWorkflow.array_in_body_out": [
      null,
      null
    ],
    "testWorkflow.array_in_input_out": [
      null,
      null
    ],
    "testWorkflow.array_in_output": [
      null,
      null
    ],
    "testWorkflow.len_in_body_out": 2,
    "testWorkflow.len_in_input_out": 2,
    "testWorkflow.len_in_output": 0
  }
}

All of len_in_body_out, len_in_input_out, and len_in_output should be 0, but when processed inside the task body/input, select_all runs on the string representation instead of the file representation; only processing it in the output does it check that those file paths exist.

String to file coercion also behaves differently depending if it is within a task or a workflow; the above WDL is mostly in a task, as the same code does not work in a workflow:

version 1.1
workflow testWorkflow {
  input {
  }
  output {
    Array[File?] array_in_output = ["example1.txt", "example2.txt"]
    Int len_in_output = length(select_all(array_in_output))
  }
}
2024-06-26 18:53:32.953 wdl.w:testWorkflow workflow start :: name: "testWorkflow", source: "test.wdl", line: 2, column: 1, dir: "/home/heaucques/Documents/wdl-conformance-tests/20240626_185332_testWorkflow"
2024-06-26 18:53:32.954 wdl.w:testWorkflow miniwdl :: version: "v1.12.0", uname: "Linux pop-os 6.8.0-76060800daily20240311-generic #202403110203~1715181801~22.04~aba43ee SMP PREEMPT_DYNAMIC Wed M x86_64"
2024-06-26 18:53:32.954 wdl.w:testWorkflow task thread pool initialized :: task_concurrency: 8
2024-06-26 18:53:32.968 wdl.w:testWorkflow visit :: node: "output-array_in_output", values: {"array_in_output": ["example1.txt", "example2.txt"]}
2024-06-26 18:53:32.968 wdl.w:testWorkflow visit :: node: "output-len_in_output", values: {"len_in_output": 2}
2024-06-26 18:53:32.995 wdl.w:testWorkflow workflow testWorkflow (test.wdl Ln 2 Col 1) failed :: dir: "/home/heaucques/Documents/wdl-conformance-tests/20240626_185332_testWorkflow", error: "InputError", message: "workflow output uses nonexistent file/directory: example1.txt", node: "outputs"
2024-06-26 18:53:32.995 wdl.w:testWorkflow aborting workflow
2024-06-26 18:53:32.995 miniwdl-run workflow output uses nonexistent file/directory: example1.txt :: error: "InputError", node: "outputs", dir: "/home/heaucques/Documents/wdl-conformance-tests/20240626_185332_testWorkflow"
@mlin
Copy link
Collaborator

mlin commented Jun 27, 2024

@stxue1 Thanks, I think there are two slightly separate topics here:

  1. The language in the spec about a File? with a nonexistent path successfully evaluating to None/null is scoped to the task output section and (IMO) not applicable outside that context. It's a convenience so that we don't have to write an explicit if/then/else statement in the task output declaration. I don't think it's a good idea to apply elsewhere.
  2. The incorrect result of length(select_all(...)) seems like a bug I need to look into further to Miniwdl fails on list with optional outputs even though select_all is used #614 improve File? task output processing #615

@mlin mlin added the interop Bears on spec compatibility label Jun 27, 2024
@stxue1
Copy link
Author

stxue1 commented Jun 27, 2024

Taking a look at the SPEC again, it does look like coercion to None for nonexistent optional files is scoped to task outputs. So my previous first complaint isn't really an issue. However, would it be a good idea to apply this behavior for workflow outputs as well?

I think the length(select_all(...)) issue might boil down to needing more clarification from the WDL spec. For one, the working directory for string to file coercion sounds like it should be different depending whether in a task input, body, or output:
https://github.com/openwdl/wdl/blob/caff59db192636d9f93f3f5659eb5939f51ff877/SPEC.md?plain=1#L3944

If converting File? types to None happens at output, then running select_all on an array with nonexistent coerced files will always return something different if not ran at the output step. For now, this behavior seems to be what the spec wants, but sounds incorrect as those File? types should be None regardless. I'll probably open an issue on openwdl for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interop Bears on spec compatibility
Projects
Status: Backlog
Development

No branches or pull requests

2 participants