Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broken test test_dumpSTR.py::test_BrokenVCF #206

Closed
aryarm opened this issue Jan 17, 2024 · 2 comments · Fixed by #208
Closed

broken test test_dumpSTR.py::test_BrokenVCF #206

aryarm opened this issue Jan 17, 2024 · 2 comments · Fixed by #208

Comments

@aryarm
Copy link
Member

aryarm commented Jan 17, 2024

the problem

With the newest cyvcf2, test_dumpSTR.py::test_BrokenVCF() fails:

Exception: error parsing variant with `htslib::bcf_read` error-code: 0 and ret: -2

bug hypothesis

This test is actually supposed to check whether dumpSTR will properly fail when given a malformed VCF. The VCF used in the test has a variant record with only four fields instead of the required ten.

So I think what's happening is that htslib is catching the error and reporting it to cyvcf2 before TRTools has a chance to catch it and exit gracefully with a non-zero exit code. The exception is happening when we call next(self.vcffile) here:

return HarmonizeRecord(self.vcftype, next(self.vcffile))

potential solutions

Our options:

  1. Catch the exception within tr_harmonizer.py and somehow gracefully handle it. This will probably require changing all of the code that iterates over VCFs using TRRecordHarmonizer to catch and gracefully handle exceptions arising from within it.
  2. Change the test so that it succeeds regardless of whether an exception is raised. In this case, we could catch the exception within tr_harmonizer.py and then reraise the exception with a more useful message, since TRTools users might be confronted by that exception from now on.
  3. Ask htslib or cyvcf2 to handle this upstream. I think this exception is probably a desired behavior for them, though.

My vote is for option 2

@LiterallyUniqueLogin
Copy link
Contributor

Hey Arya,

Can you please print out the full error message DumpSTR produces, not just the last line? (Or clarify that this is in fact a one line error message).

The following ideas are for an optimal package. Feel free to scale back based on your desired commitment.

I think tr_harmonizer should raise an exception with a meaningful error message when it encounters a malformed VCF. So something like, "Parsing encountered a malformed variant line and failed". The reference to "htslib::bcf_read" would be confusing to any casual user of TRTools who doesn't know its internal dependencies, so hiding the original htslib error message in favor of a cleaner one to our customers seems good to me, especially if the htslib error message came with a stack trace that would further distract a user from what's going on. This agrees with option 2.

However, in theory I do recommend also changing the downstream users of tr_harmonizer like option 1. I forget how Nima and I came to this convention, but originally the tools in TRTools were supposed to fail by writing messages to stderr and returning a status code. I believe the idea was that tools shouldn't produce stack traces by letting exceptions bubble up to the top. This was because stack traces are useful for developers debugging errors which shouldn't occur, but are useless for presenting acceptable errors (i.e. errors derived from bad input) to some tool-user who doesn't understand the tool's source code and so can't interpret the stack trace. With that paradigm, any tool using tr_harmonizer should catch any error that tr_harmonzier might raise and write a more user friendly, context-aware error to stderr while returning an error code. Now, if by not catching the tr_harmonizer error its still very clear to the end user what's going on (and the stack trace isn't too distracting), then maybe its not such a big deal.

@aryarm
Copy link
Member Author

aryarm commented Jan 18, 2024

ah, that totally makes sense! We've had to deal with the same concerns regarding the haptools.data library vs the rest of haptools. We decided on a very similar paradigm

I was hoping the message would be detailed and interpretable enough that we wouldn't have to catch it downstream, but I can try it out and see. I'll also try catching the exception downstream in case its easier than I anticipate

also, you can click here if you want to view the full error message

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants