Support for arbitrary design matrices and contrast vectors #213

Closed
grst opened this issue Nov 28, 2023 · 7 comments · Fixed by #328
Labels
enhancement New feature or request

Comments

@grst

grst commented Nov 28, 2023

Is your feature request related to a problem? Please describe.
Most linear-model tools accept designs as design matrices and contrasts as contrast vectors. This is the "lowest common denominator" for specifying designs, and it's useful

  • for more complex designs and comparisons that aren't covered by a simple [column, baseline, treatment] triplet
  • for writing wrapper functions (e.g. multi-condition-comparisions) that use PyDESeq2 as one of multiple backends and already deal with building model matrices and contrast vectors from more user-friendly input such as formulae.

Describe the solution you'd like

  • DeseqDataSet should take a design matrix
  • DeseqStats should take a contrast vector with one value per fitted coefficient, such as [0, -1, 1].
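To make the request concrete, here is a minimal standard-library sketch of what a design matrix and a contrast vector such as [0, -1, 1] encode. It is not PyDESeq2 code: the helper names (`build_design_matrix`, `apply_contrast`), the sample labels, and the coefficient values are all made up for illustration.

```python
# Hypothetical sample annotation: one condition label per sample.
conditions = ["control", "treatA", "treatB", "control", "treatA", "treatB"]
levels = ["control", "treatA", "treatB"]  # the first level is the reference


def build_design_matrix(labels, levels):
    """Treatment coding: an intercept plus one indicator column per non-reference level."""
    columns = ["Intercept"] + levels[1:]
    rows = [
        [1.0] + [1.0 if label == level else 0.0 for level in levels[1:]]
        for label in labels
    ]
    return columns, rows


def apply_contrast(contrast, coefficients):
    """Return the linear combination of fitted coefficients selected by the contrast."""
    if len(contrast) != len(coefficients):
        raise ValueError("contrast needs one value per fitted coefficient")
    return sum(c * b for c, b in zip(contrast, coefficients))


columns, design = build_design_matrix(conditions, levels)

# One value per column of the design matrix: Intercept, treatA, treatB.
contrast = [0, -1, 1]  # compares treatB against treatA

# Made-up coefficients (e.g. log fold changes) for a single gene.
beta = [5.0, 1.5, 2.25]
effect = apply_contrast(contrast, beta)  # 2.25 - 1.5 = 0.75
```

The contrast vector does not depend on how the columns are named, only on their order, which is why it works for designs that a [column, baseline, treatment] triplet cannot express.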

Additional context
Discussed at the scverse hackathon in Cambridge.

CC @const-ae @emdann

@BorisMuzellec
Collaborator

Hi @grst @const-ae @emdann, is there a consensus regarding what would be most convenient? I'm assuming we want to use formulaic?

I won't have the bandwidth to implement this feature on my own in the next few weeks, but if anyone wants to give it a try, I'm happy to help them.

@grst
Author

grst commented Dec 4, 2023

I don't even think you'd need to deal with formulaic/patsy in PyDESeq2, at least initially. Either tool generates a design matrix (which advanced users could also create manually) which should be the input for PyDESeq2.

@const-ae

const-ae commented Dec 4, 2023

I agree with Gregor that the easiest change might be to simply allow some way to provide a design matrix and then just skip the step build_design_matrix at https://github.com/owkin/PyDESeq2/blob/main/pydeseq2/dds.py#L249. Of course, longer term I think it would be great to save the user from converting data + formula to a design matrix and do it internally, but in the end it's just syntactic sugar :)
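The "skip the build step when a matrix is supplied" idea could look roughly like the sketch below. This is a standard-library illustration only, not the code in pydeseq2/dds.py: `default_design` and `resolve_design` are hypothetical names, and the design matrix is represented as a plain (column names, rows) pair.

```python
def default_design(labels, reference):
    """Stand-in for an internal builder: treatment-coded matrix from condition labels."""
    levels = [reference] + sorted(set(labels) - {reference})
    columns = ["Intercept"] + levels[1:]
    rows = [[1.0] + [float(label == level) for level in levels[1:]] for label in labels]
    return columns, rows


def resolve_design(labels, reference, design_matrix=None):
    """Use the caller's design matrix if one is given; otherwise build the default."""
    if design_matrix is None:
        return default_design(labels, reference)

    columns, rows = design_matrix
    # Basic sanity checks on a user-supplied matrix.
    if len(rows) != len(labels):
        raise ValueError("design matrix needs one row per sample")
    if any(len(row) != len(columns) for row in rows):
        raise ValueError("every row needs one value per column")
    return columns, rows


labels = ["a", "b", "a"]

# No matrix supplied: the default builder runs.
built = resolve_design(labels, reference="a")

# Matrix supplied (e.g. produced by formulaic or patsy, or written by hand): used as-is.
custom = (["Intercept", "dose"], [[1.0, 0.0], [1.0, 2.5], [1.0, 5.0]])
passed_through = resolve_design(labels, reference="a", design_matrix=custom)
```

With this shape, formula handling stays entirely outside the function, which matches the point that converting data plus a formula into a matrix is a convenience layer on top.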

@jeandut
Collaborator

jeandut commented Apr 24, 2024

PR #181 implements the ability to pass a design matrix directly. For now, however, the matrix needs to follow the pydeseq2 naming conventions for further preprocessing, namely the _vs_ syntax.

@jeandut
Collaborator

jeandut commented Apr 24, 2024

Don't hesitate to play with the branch and give feedback on its limitations.

@grst
Author

grst commented Oct 29, 2024

> for now it needs to follow pydeseq2 naming conventions for further preprocessing namely the _vs_ syntax

does that mean if it doesn't follow the naming conventions it doesn't work at all, or would I just have to specify contrasts manually?

@jeandut
Collaborator

jeandut commented Oct 29, 2024

> > for now it needs to follow pydeseq2 naming conventions for further preprocessing namely the _vs_ syntax
>
> does that mean if it doesn't follow the naming conventions it doesn't work at all, or would I just have to specify contrasts manually?

In this PR, extracting the design_factors from a user-provided design_matrix assumes that interactions follow the pydeseq2 naming conventions. This processing is fairly straightforward and can be inspected in pydeseq2/utils.py, at the end of the process_design_factors function (the `if design_matrix is not None` branch).
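For readers unfamiliar with the convention, the sketch below shows one way factor information could be recovered from column names. It assumes names of the form `<factor>_<level>_vs_<reference>` (for example `condition_B_vs_A`); it is not the code from process_design_factors, and the helper names are hypothetical.

```python
def parse_vs_column(name):
    """Split 'condition_B_vs_A' into (factor, level, reference); None if the name does not fit."""
    if "_vs_" not in name:
        return None
    left, reference = name.rsplit("_vs_", 1)
    factor, sep, level = left.rpartition("_")
    if not sep or not factor or not level or not reference:
        return None
    return factor, level, reference


def collect_design_factors(columns):
    """Group parsed columns by factor and report the columns that could not be parsed."""
    factors = {}
    unparsed = []
    for name in columns:
        parsed = parse_vs_column(name)
        if parsed is None:
            unparsed.append(name)
            continue
        factor, level, reference = parsed
        entry = factors.setdefault(factor, {"reference": reference, "levels": []})
        entry["levels"].append(level)
    return factors, unparsed


columns = ["Intercept", "condition_B_vs_A", "condition_C_vs_A", "batch2"]
factors, unparsed = collect_design_factors(columns)
# factors  -> {"condition": {"reference": "A", "levels": ["B", "C"]}}
# unparsed -> ["Intercept", "batch2"]
```

In a scheme like this, columns that do not match the pattern simply carry no factor metadata; a contrast vector indexed by column position would still apply to them. Whether the PR behaves this way is not stated in the thread.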

We realize that the current situation is not optimal and are actively trying to find the best trade-off between the coverage of deseq2/formulaic functionalities we support and merging this PR "quickly" (sorry that it has already taken so long), given the very limited bandwidth we currently have.
