[Feature request] one vs. all others #168

Open
acoteataltius opened this issue Aug 23, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@acoteataltius

I'd like to be able to input a contrast design (or otherwise choose design factors) to do a one-vs-all comparison within a particular "condition" that has more than two levels. For example, if my column "condition" has levels A, B, C, and D, compare A vs. B, C, and D.

Something like these options in R DESeq2:
design <- ~0 + condition
contrast = c(1, -1/3, -1/3, -1/3)
contrast=list(c("conditionA"),
c("conditionB","conditionC","conditionD"))

Would it be possible to do something where if you leave the second option blank in contrast, like:
contrast = ['condition', 'A', '']
it compares A with all other samples?
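For illustration, the numeric contrast above (1 for the target level, -1/3 for each of the other three) can be built for any number of levels. This is only a sketch: the helper name `one_vs_rest_contrast` is hypothetical and not part of pydeseq2, and it assumes a no-intercept design (`~0 + condition`) with one coefficient per level, in the order given.

```python
import numpy as np


def one_vs_rest_contrast(levels, target):
    """Return +1 for `target` and -1/(k-1) for each of the other k-1 levels.

    Assumes one coefficient per level (no-intercept design), in the
    same order as `levels`. Hypothetical helper, not a pydeseq2 function.
    """
    levels = list(levels)
    if target not in levels:
        raise ValueError(f"{target!r} is not one of {levels}")
    if len(levels) < 2:
        raise ValueError("need at least two levels")
    others_weight = -1.0 / (len(levels) - 1)
    return np.array([1.0 if lvl == target else others_weight for lvl in levels])


# A vs. the average of B, C, and D
contrast = one_vs_rest_contrast(["A", "B", "C", "D"], "A")
```

The weights sum to zero, so the contrast measures the difference between A and the unweighted average of the other levels' coefficients.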

@acoteataltius acoteataltius changed the title one vs. all others [Feature request] one vs. all others Aug 23, 2023
@BorisMuzellec BorisMuzellec added the enhancement New feature or request label Aug 31, 2023
@BorisMuzellec
Collaborator

Hi @acoteataltius, that would be a convenient feature to have indeed.

It's not available in pydeseq2 yet, but I'm adding it to our feature wishlist. I'll give it a go when I have time, but I'm also happy to help anyone opening a PR. Not sure what would be the best way to implement it from a user perspective (maybe a one_vs_all boolean argument?).

In the meantime it seems that it would be possible to obtain the same results by manually setting the contrast_vector attribute after initializing the DeseqStats object, but I'm not 100% sure about this either.
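As a sanity check on what such a manually set vector would need to contain, here is a small NumPy sketch with made-up group means. It assumes a design with an intercept and A as the reference level, so the coefficients are ordered [intercept, B_vs_A, C_vs_A, D_vs_A]; the coefficient order in an actual pydeseq2 design matrix may differ and should be checked before reusing these weights. Whether DeseqStats accepts the vector through contrast_vector is not verified here.

```python
import numpy as np

# Toy log2 group means for levels A, B, C, D (made-up numbers).
group_means = np.array([5.0, 3.0, 4.0, 2.0])

# No-intercept coding: one coefficient per level.
beta_cell = group_means.copy()
contrast_cell = np.array([1.0, -1 / 3, -1 / 3, -1 / 3])

# Reference-level coding with A as reference:
# [intercept, B_vs_A, C_vs_A, D_vs_A]
beta_ref = np.array([group_means[0], *(group_means[1:] - group_means[0])])
# The intercept cancels out, so A gets weight 0 here.
contrast_ref = np.array([0.0, -1 / 3, -1 / 3, -1 / 3])

# Both codings give the same "A vs. average of the others" effect.
lfc_cell = contrast_cell @ beta_cell
lfc_ref = contrast_ref @ beta_ref
```

With a reference level the target group is absorbed into the intercept, which is why the weights become [0, -1/3, -1/3, -1/3] instead of [1, -1/3, -1/3, -1/3].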

@GalaMichal

Hi @BorisMuzellec, I'd like to ask whether it is even possible to compare all vs. all, that is, to treat each level of the condition factor as a separate group without setting any of them as a reference (e.g. healthy).

Something like in R DESeq2:
design <- ~0 + condition

@BorisMuzellec
Collaborator

BorisMuzellec commented Dec 15, 2023

@GalaMichal there is unfortunately no direct way to do this as of yet. This relates to #213.

However, I think it is possible to obtain the same design matrix using pydeseq2.utils.build_design_matrix with no intercept but an expanded design, and to use it in your DeseqDataSet like this:

from pydeseq2.dds import DeseqDataSet
from pydeseq2.utils import build_design_matrix

dds = DeseqDataSet(counts=counts, metadata=metadata, design_factors="condition")

# This is where you replace the design matrix
dds.obsm["design_matrix"] = build_design_matrix(
    metadata=dds.obs,
    design_factors=dds.design_factors,
    expanded=True,
    intercept=False,
)

# And then you should be able to carry on as usual
dds.deseq2()  # etc.

Let me know if this works!
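To make the target shape concrete, an expanded design with no intercept has one indicator column per level, which is what `~0 + condition` produces in R. The sketch below uses pandas.get_dummies on toy metadata purely as an illustration; it is not pydeseq2's build_design_matrix, and the column names pydeseq2 generates may differ.

```python
import pandas as pd

# Toy metadata: six samples across four condition levels.
metadata = pd.DataFrame(
    {"condition": ["A", "A", "B", "C", "D", "D"]},
    index=[f"sample{i}" for i in range(1, 7)],
)

# One 0/1 column per level and no intercept column.
design = pd.get_dummies(metadata["condition"], prefix="condition", dtype=float)

print(design)
```

Each row contains a single 1 marking the sample's level, so every coefficient estimates one group's own mean rather than a difference from a reference level.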

@GalaMichal

@BorisMuzellec thank you for the quick response. Unfortunately, it doesn't work.
dds.deseq2() runs, but stat_res = DeseqStats(dds) raises: KeyError: 'Condition_1_vs_Condition_1'

The same error occurs when I try, for example, stat_res = DeseqStats(dds, contrast=("Conditions", "Condition_1", "Condition_2"))

@Rafael-Silva-Oliveira

Rafael-Silva-Oliveira commented Oct 16, 2024

@BorisMuzellec Any news on this? Without a one-vs-rest approach we're more or less forced to use scanpy's rank_genes_groups, but it would be very nice to have this in PyDESeq2.

@Rafael-Silva-Oliveira

Rafael-Silva-Oliveira commented Oct 16, 2024

I tried to work around this by creating temporary metadata in which the grouping column is recoded as "group X" vs. "rest", like this:

from pydeseq2.dds import DeseqDataSet
from pydeseq2.ds import DeseqStats
import pandas as pd
import numpy as np
from itertools import combinations

def run_pydeseq2_extended(counts, metadata, design_factors):
    # Create DDS object
    dds = DeseqDataSet(
        counts=counts,
        metadata=metadata,
        design_factors=design_factors,
        refit_cooks=True
    )
    
    # Run DESeq2 analysis
    dds.deseq2()
    
    results = {}
    
    # Get unique groups
    unique_groups = sorted(metadata[design_factors].unique())
    
    # Generate all pairwise comparisons
    comparisons = list(combinations(unique_groups, 2))
    
    # One-vs-One comparisons
    for group1, group2 in comparisons:
        stat_res = DeseqStats(dds, contrast=[design_factors, group1, group2], alpha=0.05)
        stat_res.summary()
        results[f"{group1}_vs_{group2}"] = stat_res.results_df
    
    # One-vs-Rest comparisons
    for group in unique_groups:
        # Create a temporary metadata for one-vs-rest
        temp_metadata = metadata.copy()
        temp_metadata[design_factors] = np.where(temp_metadata[design_factors] == group, group, 'rest')
        
        # Create a new DDS object for one-vs-rest
        dds_rest = DeseqDataSet(
            counts=counts,
            metadata=temp_metadata,
            design_factors=design_factors,
            refit_cooks=True
        )
        dds_rest.deseq2()
        
        stat_res_rest = DeseqStats(dds_rest, contrast=[design_factors, group, 'rest'], alpha=0.05)
        stat_res_rest.summary()
        results[f"{group}_vs_rest"] = stat_res_rest.results_df
    
    return results

# Usage
counts = pd.DataFrame(adata_subset.X, index=adata_subset.obs_names, columns=adata_subset.var_names)
metadata = pd.DataFrame({'nmf-group': adata_subset.obs['nmf-group']})
design_factors = "nmf-group"

results = run_pydeseq2_extended(counts, metadata, design_factors)

Then I created a volcano plot for all comparisons, including "Others" (in this case for NMF factors, but you could try any other grouping):

[image: volcano plots]

@BorisMuzellec Would this be a valid way to do it?
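One point that may matter when weighing the relabeling approach against a contrast vector: pooling every other sample into "rest" weights the groups by their sample counts, whereas a contrast such as (1, -1/3, -1/3, -1/3) weights each group equally. The sketch below shows this with plain means on made-up, unbalanced data. It does not run DESeq2, so it says nothing about dispersion estimates, which could also differ between the two approaches.

```python
import numpy as np
import pandas as pd

# Toy, unbalanced data: 2 x A, 6 x B, 1 x C, 1 x D (made-up log2 values).
metadata = pd.DataFrame({"condition": ["A"] * 2 + ["B"] * 6 + ["C"] + ["D"]})
values = pd.Series([5.0] * 2 + [2.0] * 6 + [4.0] + [6.0])

# Relabeling approach: everything that is not A becomes "rest".
relabeled = np.where(metadata["condition"] == "A", "A", "rest")
pooled_diff = values[relabeled == "A"].mean() - values[relabeled == "rest"].mean()

# Contrast approach: A minus the unweighted average of the other group means.
group_means = values.groupby(metadata["condition"]).mean()
contrast_diff = group_means["A"] - group_means.drop("A").mean()

print(pooled_diff, contrast_diff)
```

Here the pooled "rest" mean is dominated by the six B samples, so the two differences disagree. With balanced groups they would coincide in this simplified setting.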

@BorisMuzellec
Collaborator

BorisMuzellec commented Nov 12, 2024

Hi @Rafael-Silva-Oliveira, sorry for the late reply.

I think this could be solved by #328 once it's merged, as it will then be possible to input contrast vectors directly. Then, it should be possible to apply @acoteataltius's method (see also here).

@Rafael-Silva-Oliveira

Perfect, thank you! Do you think my approach would work too?
