-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix : enable vst be used "blindly" and to fit its own dispersion #268
Conversation
Thanks for the PR @laudmt! I think it does the trick for VST, but I have a little concern regarding the fact that providing a Here's an example of problematic / ambiguous behavior that could occur:
In the above example, I think we should find a way to make the choices of curve fit type for |
Indeed thank you, I was missing this use case. |
pydeseq2/dds.py
Outdated
self.current_fit_type = fit_type | ||
print(f"Using {self.current_fit_type} fit type.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.current_fit_type
doesn't seem to be used elsewhere, so the deseq2()
fit_type
argument doesn't have any effect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, forgot to set it to fit_type.
pydeseq2/dds.py
Outdated
self.fit_type = fit_type | ||
print(f"fit type used : {self.fit_type}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we set a new fit_type
in VST
, then I think we should either set it back to the one provided at initialization, or make it clear that it will also change the fit type used in deseq2()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, the user should be able, after he launches vst with its own fit_type, to launch deseq2 and expect the fit_type he provided at initialisation?
Is there a risk if deseq2 and vst have their own separate fit_type (lets say self.vst_fit_type and self.dea_fit_type) and since they both share all the other attributes filled during the fit, that vst will compute stuff that deseq2 will have access to it but is not supposed to?
In that case, should they not share any of the attributes (varm, obs, design_factors etc) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late answer @laudmt!
If I understand correctly, the user should be able, after he launches vst with its own fit_type, to launch deseq2 and expect the fit_type he provided at initialisation?
I think that this is indeed what should happen by default, i.e. if no other fit_type
is provided when calling deseq2()
.
Is there a risk if deseq2 and vst have their own separate fit_type (lets say self.vst_fit_type and self.dea_fit_type) and since they both share all the other attributes filled during the fit, that vst will compute stuff that deseq2 will have access to it but is not supposed to?
In that case, should they not share any of the attributes (varm, obs, design_factors etc) ?
Currently, this is what happens. vst()
and deseq2()
populate the same fields (varm["genewise_dispersions"]
and uns["trend_coeffs"]
in particular), but they recompute and overwrite them every time (except in the case when we call vst(use_design=True)
after calling deseq2()
with the same fit type, in which case we can re-use the computation).
In principle, the only issue I see with this is if a user calls e.g. vst(use_design=False)
after deseq2()
. In that case, varm["genewise_dispersions"]
will be overwritten, and if the user wants to perform some analysis of the gene-wise dispersions (e.g. if they call plot_dispersions()
), then they will access the wrong values (not the ones used in the DEA).
Perhaps a cleaner thing to do would be to prefix any parameter stored by vst()
with "vst_"
? (varm["vst_genewise_dispersions"]
, uns["vst_trend_coeffs"]
, etc.). This would create some memory overhead but would prevent any confusion. What do you think?
Hi @laudmt, I've pushed a commit in which I implemented the solution I suggested earlier, i.e. setting two separate Could you have a look when you have time and tell me if that would work for you :) ? |
…within the VST pipeline with a "vst_" prefix
9873f8a
to
30cf585
Compare
for more information, see https://pre-commit.ci
Thank you @BorisMuzellec I can confirm that this fix the use of pydeseq2 with multiple covariate on my side. |
Reference Issue or PRs
Fix #263
What does your PR implement? Be specific.
Proposed solution :
self.fit_type
is defined at class initialisation and will by default set the fit_type for DEA and VST.