PED file validation #587
This is fine, but you could define the id_type strings as constants at the top of the script (e.g. as you've done with FIELD_NUMBER_SEX).
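A rough sketch of what that could look like, assuming the script is Python. Only the name FIELD_NUMBER_SEX comes from this PR; its value, the ID_TYPE_* names, their string values, and the describe_illegal_id helper are all hypothetical placeholders.

```python
# Column index of the sex field. The constant name appears in this PR;
# the value 4 (0-based, standard PED column order) is an assumption.
FIELD_NUMBER_SEX = 4

# Hypothetical id_type labels, defined once instead of repeated as string literals.
ID_TYPE_FAMILY = "family"
ID_TYPE_SAMPLE = "sample"


def describe_illegal_id(id_type, value):
    """Build an error message for an ID that failed validation (hypothetical helper)."""
    return f"Invalid {id_type} ID: '{value}'"


# Call sites refer to the constant, so a typo becomes a NameError
# instead of a silently wrong message.
message = describe_illegal_id(ID_TYPE_SAMPLE, "bad id")
```

Keeping the labels next to FIELD_NUMBER_SEX also means any future rename happens in one place.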
Should we be more strict and require that it be 0, 1, or 2?
Either way is fine with me. I did it this way because CleanVcfPart1 will only fail if the value is not an integer, and I wasn't sure whether some groups might want to encode different categories of "other" in the sex column. But it does seem simpler to just enforce 0, 1, or 2.
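For reference, the stricter check could be sketched roughly as below. This is not the code in the PR: the validate_sex function, the ALLOWED_SEX_CODES name, and the 0-based value of FIELD_NUMBER_SEX are assumptions made for illustration.

```python
# Assumed 0-based index of the sex column in a standard six-column PED line.
FIELD_NUMBER_SEX = 4

# Strict option discussed above: accept only 0, 1, or 2.
ALLOWED_SEX_CODES = frozenset({"0", "1", "2"})


def validate_sex(fields, line_number):
    """Return the sex code as an int, or raise ValueError if it is not 0, 1, or 2."""
    sex = fields[FIELD_NUMBER_SEX]
    if sex not in ALLOWED_SEX_CODES:
        raise ValueError(
            f"Line {line_number}: sex must be 0, 1, or 2 but found '{sex}'"
        )
    return int(sex)


# Example with a hypothetical tab-delimited PED line.
fields = "FAM1\tSAMPLE1\t0\t0\t2\t0".split("\t")
sex_code = validate_sex(fields, line_number=1)
```

Comparing against a set of strings rejects non-integers and out-of-range integers in a single check, so the looser integer-only test is not needed as a separate step.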
I think it would be good to require that this passes before attempting to run cnmops; otherwise a bad ped file would produce two errors (one from validation and one from cnmops), which could be confusing. If you make the validated ped file the output, you can just wire that up to the ped file subset task.
Good point!
I wasn't aware you could do this, but I think it would be better to have either a dummy empty output or to just output the validated ped file. That way we can force Cromwell to wait until the validation is complete before moving on to tasks that consume the ped file.