Undefined behavior when turning coerced optional File? to null + Clarification about where String to File coercion takes place #673
To address your first question: paths are relative to the execution environment. If a container is used, a relative path is resolved against the working directory inside the container. This also means it is generally a bad idea to use relative paths as default values for inputs, because the execution engine is free to choose whatever working directory it wants. For example, it could mount a volume on the host machine to …
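A minimal Python sketch of this point, assuming a simple model in which the engine joins a relative path onto whatever working directory it picked. The `resolve` helper and both directories are hypothetical; they are not the paths any particular engine actually uses.

```python
from pathlib import PurePosixPath


def resolve(path: str, working_dir: str) -> str:
    """Model: a relative path is interpreted against the engine-chosen working directory.

    An absolute path comes back unchanged, because joining a PurePosixPath with an
    absolute path discards the left-hand side.
    """
    return str(PurePosixPath(working_dir) / path)


# Hypothetical working directories that two engines (or two runs) might pick.
HOST_RUN_DIR = "/home/user/runs/20240101"
CONTAINER_WORK_DIR = "/work"

default_input = "data/sample.txt"  # hypothetical relative default value for an input

if __name__ == "__main__":
    print(resolve(default_input, HOST_RUN_DIR))        # /home/user/runs/20240101/data/sample.txt
    print(resolve(default_input, CONTAINER_WORK_DIR))  # /work/data/sample.txt
```

The same relative default names two different files, depending only on where the engine chose to run the task.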
You are correct that missing optional workflow outputs should also be …

For the last part, I'm not sure I fully understand. Could you create a separate issue with a minimal reproducible test case (including any necessary input files)? Thanks!
Sure. My main concern with the WDL workflow included in the original issue is how the … For example (most of the WDL evaluation happens in tasks rather than in the workflow, due to a limitation of miniwdl):

```wdl
version 1.1

workflow testWorkflow_output {
  input {
  }

  call testTask

  output {
    Array[File?] array_of_files = testTask.array_of_files
    Int len = testTask.len
  }
}

task testTask {
  input {
  }

  command <<<>>>

  output {
    Array[File?] array_of_files = ["example1.txt", "example2.txt"]
    Int len = length(select_all(array_of_files))
  }
}
```

In the task, the files … For this workflow, miniwdl agrees: …
However, given a workflow like this:

```wdl
version 1.1

workflow testWorkflow_body {
  input {
  }

  call testTask

  output {
    Array[File?] array_of_files = testTask.array_of_files
    Int len = testTask.len
  }
}

task testTask {
  input {
  }

  command <<<>>>

  # Moved out of the output section into the task body.
  Array[File?] array_of_files_0 = ["example1.txt", "example2.txt"]
  Int len_0 = length(select_all(array_of_files_0))

  output {
    Array[File?] array_of_files = array_of_files_0
    Int len = len_0
  }
}
```

I've effectively moved these lines from the task's output section to its body:

```wdl
Array[File?] array_of_files = ["example1.txt", "example2.txt"]
Int len = length(select_all(array_of_files))
```

Thus, the second workflow will compute … Assuming the files do not exist on my machine, my intuition is that both workflows should produce the same output:

For the first workflow: …

For the second workflow: …

However, the spec specifically says that an optional file is coerced to null only in task outputs (line 3880 of the spec at commit caff59d). As a result, for the second workflow … miniwdl seems to follow this concept, as the output for …

Even though … Rather than applying the coercion of the nonexistent files …
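To make the difference between the two workflows concrete, here is a small Python model of the rule as read in this thread: a `File?` whose file does not exist becomes null when a task output is evaluated, but not when a task body declaration is evaluated. All function names are hypothetical, `None` stands in for WDL `null`, and the `coerce_in_body` flag models the "intuitive" alternative in which both workflows agree. This is a sketch of the semantics under discussion, not miniwdl's implementation.

```python
import os
import tempfile
from typing import List, Optional

FILES = ["example1.txt", "example2.txt"]


def coerce_optional_file(path: str, workdir: str) -> Optional[str]:
    """Task-output rule (as read here): a File? whose file does not exist becomes None."""
    return path if os.path.exists(os.path.join(workdir, path)) else None


def select_all(values: List[Optional[str]]) -> List[str]:
    """Model of WDL's select_all: drop every None (null) element."""
    return [v for v in values if v is not None]


def workflow_output_section(workdir: str) -> dict:
    """First workflow: both declarations live in the task's output section."""
    array_of_files = [coerce_optional_file(p, workdir) for p in FILES]
    return {"array_of_files": array_of_files, "len": len(select_all(array_of_files))}


def workflow_body(workdir: str, coerce_in_body: bool = False) -> dict:
    """Second workflow: declarations moved into the task body.

    coerce_in_body=False: the null coercion only happens at task outputs.
    coerce_in_body=True: hypothetical alternative that also checks existence in the body.
    """
    if coerce_in_body:
        array_of_files_0 = [coerce_optional_file(p, workdir) for p in FILES]
    else:
        array_of_files_0 = list(FILES)  # body: no existence check, values stay non-null
    len_0 = len(select_all(array_of_files_0))
    # Output section: array_of_files is coerced, but len is just the body's len_0.
    array_of_files = [
        None if p is None else coerce_optional_file(p, workdir) for p in array_of_files_0
    ]
    return {"array_of_files": array_of_files, "len": len_0}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:  # empty directory: neither file exists
        print(workflow_output_section(workdir))              # {'array_of_files': [None, None], 'len': 0}
        print(workflow_body(workdir))                        # {'array_of_files': [None, None], 'len': 2}
        print(workflow_body(workdir, coerce_in_body=True))   # {'array_of_files': [None, None], 'len': 0}
```

With neither file present, both workflows return `[null, null]` for `array_of_files`, yet the first reports a length of 0 and the second a length of 2; in this model only a body-time existence check makes the two agree.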
I originally encountered these issues at chanzuckerberg/miniwdl#696.
One thing the WDL spec is vague about is how a task should coerce a String to a File. The spec says that all non-output declarations must be evaluated before the command section. My implicit understanding is that the output declarations are evaluated in a different directory than the rest of the task: the input and body declarations appear to be resolved against the current directory on the host machine, while the output section is resolved against the working directory inside the container.
For example:
Assuming all files exist, it's implicitly assumed that `f_input` and `f_body` will point to a file on the host machine, but `f_output` will point to the file inside the container. Maybe this should be clarified in the spec, as it is not immediately obvious.

Another issue that arose while testing with miniwdl is that coerced optional files can behave inconsistently.
Given the WDL workflow:
The spec says that an optional `File?` at a task output is coerced to null when the file does not exist.
For one, is there a reason why this scope is limited to just task outputs and not workflow outputs?
Additionally, because the spec says this null coercion is applied at the output step, and given that the files `example1.txt` and `example2.txt` don't exist, the assumed correct output for the WDL workflow above is: …

Because the null coercion happens at the task output, `select_all` returns different values depending on which section of the task it is called in. In the body, `select_all` returns `["example1.txt", "example2.txt"]`, giving a length of 2. In the task output section, however, the array has already been coerced to `[null, null]`, so `select_all` returns `[]`, giving a length of 0. This can be counterintuitive, since one might expect a nonexistent file never to be counted by a `select_all` call. Is this the expected behavior, or what should the expected behavior be?