feat: Add option to make annotaTR less strict on Beagle AP field checks #233

gymreklab · 2024-08-30T23:18:50Z

The main change is a new option, --warn-on-AP-error, which makes annotaTR skip loci where the checks on the AP fields fail. For those loci, rather than the program exiting with an error, we output nan values for the dosages. The checks this applies to are:

Checking if the AP1/2 fields exist
Checking if they sum to more than 1
Checking for negative values
Checking if normalized values end up being >=2.1 or <=-0.1
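The checks listed above can be sketched as one function that collects every problem at a locus. This is a hypothetical illustration, not the TRTools code: the name find_ap_problems, the (n_samples, n_alt) array layout, and the assumption that normalized dosages are expected to lie in [0, 2] are all assumptions made for the example.

```python
import numpy as np


# Hypothetical sketch of the AP sanity checks; not the TRTools implementation.
# ap1/ap2 are assumed to be (n_samples, n_alt) arrays holding the posterior
# probability of each ALT allele on haplotype 1 and haplotype 2.
def find_ap_problems(ap1, ap2, norm_dosages=None):
    """Return a list of problems found at one locus (empty if all checks pass)."""
    # Check 1: the AP1/AP2 fields must exist.
    if ap1 is None or ap2 is None:
        return ["AP1/AP2 fields missing"]
    problems = []
    ap1 = np.asarray(ap1, dtype=float)
    ap2 = np.asarray(ap2, dtype=float)
    # Check 2: each haplotype's ALT probabilities should sum to at most 1,
    # since the remainder is the probability of the REF allele.
    if np.any(ap1.sum(axis=1) > 1) or np.any(ap2.sum(axis=1) > 1):
        problems.append("AP values sum to more than 1")
    # Check 3: probabilities can never be negative.
    if np.any(ap1 < 0) or np.any(ap2 < 0):
        problems.append("negative AP values")
    # Check 4: normalized dosages (assumed range [0, 2]) must not fall
    # outside the thresholds named in this PR.
    if norm_dosages is not None:
        nd = np.asarray(norm_dosages, dtype=float)
        if np.any(nd >= 2.1) or np.any(nd <= -0.1):
            problems.append("normalized dosages out of range")
    return problems
```

Collecting all problems rather than stopping at the first one is a design choice made for the sketch; it gives a more complete message when a locus fails several checks at once.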

Most of these failures should still never happen. However, we have encountered cases where the values sum to more than 1, likely due to rounding errors at loci with very large numbers of alleles.
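A minimal illustration of how rounding can push the sum over 1, assuming the AP values are written out with a fixed number of decimal places. Two decimal places are used here purely for illustration; the precision of real Beagle output is not stated in this PR.

```python
# Six ALT alleles, each with true probability 1/6, so in exact arithmetic
# the probabilities on this haplotype sum to exactly 1.
true_probs = [1 / 6] * 6

# Assumed: each value is rounded independently when written (0.1666... -> 0.17).
written = [round(p, 2) for p in true_probs]

total = sum(written)
print(written[0], total > 1)  # prints: 0.17 True
```

Each value rounded up can contribute up to 0.005 of excess at this precision, so the potential overshoot grows with the number of alleles. That is consistent with the problem appearing at loci with many alleles.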

This is a somewhat dangerous flag, and its use should not be encouraged. The main motivation is running annotaTR on huge VCF files: a run can take many hours only to hit a bad AP field at the very end and crash, or the vast majority of AP fields are fine but a few problematic loci cause the whole run to fail.

Other specific changes:

Added the option strict to GetDosages(). It defaults to True, in which case a ValueError is raised in the cases above. If it is False, we output a warning and return all dosage values as np.nan.
Regardless of whether strict is set, the error/warning messages now report which locus was problematic, to help track down those cases.
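The strict/lenient behavior described above can be sketched as follows. This is a hypothetical stand-in, not the real GetDosages() signature: the name get_dosages_sketch, its parameters, the locus string format, the use of Python's warnings module, and the expected-allele-length dosage are all assumptions. Only two of the checks are included, for brevity.

```python
import warnings

import numpy as np


def get_dosages_sketch(ap1, ap2, allele_lens, locus, strict=True):
    """Hypothetical sketch: per-sample dosages, or NaNs if checks fail and strict=False.

    ap1, ap2: (n_samples, n_alt) ALT-allele probabilities for each haplotype.
    allele_lens: lengths of [REF, ALT1, ALT2, ...].
    locus: label such as "chr1:100", used only in messages.
    """
    ap1 = np.asarray(ap1, dtype=float)
    ap2 = np.asarray(ap2, dtype=float)
    lens = np.asarray(allele_lens, dtype=float)

    problem = None
    if np.any(ap1 < 0) or np.any(ap2 < 0):
        problem = "negative AP values"
    elif np.any(ap1.sum(axis=1) > 1) or np.any(ap2.sum(axis=1) > 1):
        problem = "AP values sum to more than 1"

    if problem is not None:
        # The locus is named in both the error and the warning.
        msg = f"Invalid AP field at {locus}: {problem}"
        if strict:
            raise ValueError(msg)
        warnings.warn(msg)
        return np.full(ap1.shape[0], np.nan)

    def expected_len(ap):
        # REF gets whatever probability the ALT alleles leave over.
        ref_prob = 1 - ap.sum(axis=1)
        return ref_prob * lens[0] + ap @ lens[1:]

    return expected_len(ap1) + expected_len(ap2)


# REF length 10, ALT length 12: haplotype 1 gives 0.5*10 + 0.5*12 = 11,
# haplotype 2 gives 10, so the dosage is 21.
print(get_dosages_sketch([[0.5]], [[0.0]], [10, 12], "chr1:100"))  # prints: [21.]
```

With strict=False, a failing locus yields an array of the expected length filled with NaN, so the caller can continue; with the default strict=True, the same condition stops the run with an error naming the locus.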

Checklist

[x] I've checked to ensure there aren't already other open pull requests for the same update/change
[x] I've prefixed the title of my PR according to the conventional commits specification. If your PR fixes a bug, please prefix the PR with fix: . Otherwise, if it introduces a new feature, please prefix it with feat: . If it introduces a breaking change, please add an exclamation before the colon, like feat!: . If the scope of the PR changes because of a revision to it, please update the PR title, since the title will be used in our CHANGELOG.
[x] At the top of the PR, I've listed any open issues that this PR will resolve. For example, "resolves #0" if this PR resolves issue #0

[x] I've explained my changes in a manner that will make it possible for both users and maintainers of TRTools to understand them

[x] I've added tests for any new functionality. Or, if this PR fixes a bug, I've added test(s) that replicate it
[x] All directories with large test files are listed in the "exclude" section of our pyproject.toml so that they do not appear in our PyPI distribution. All new files are also smaller than 0.5 MB.
[x] I've updated the relevant READMEs with any new usage information and checked that the newly built documentation is formatted properly
[x] All functions, modules, classes etc. still conform to numpy docstring standards
[x] (if applicable) I've updated the pyproject.toml file with any changes I've made to TRTools's dependencies, and I've run poetry lock --no-update to ensure the lock file stays up to date and that our dependencies are locked to their minimum versions
[x] In the body of this PR, I've included a short address to the reviewer highlighting one or two items that might deserve their focus

add option to only warn on AP error

8641f14

gymreklab changed the base branch from master to add-longtr-support August 30, 2024 23:21

gymreklab closed this Aug 30, 2024
