-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: calculate Cooks distances with few samples #44
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very high level review, but is there any way to add a unit test to check the behaviour that lead to this bug? This way, we make sure the codebase does not forget about it. Also, it's a good way to ensure this PR really fixes the related issue.
I guess we could add a test where the pipeline is run on a dataset with few samples, as the piece of code responsible for the bug only is run only when no cohort has 3 or more samples. Not sure what to test for though, would error-free execution be enough? |
Yes I think that simply adding this corner case and ensuring it runs without error is a good starting point! |
449442a
to
580f216
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @BorisMuzellec
This PR aims to fix the bug described in Issue #43.
The issue seemed to be that the summation was done on the wrong axis in the
trimmed_variance
function, called throughrobust_method_of_moments_disp
in thecalculate_cooks
method of theDeseqDataSet
class.