Logistic regression support for the Discriminator Classifier #560

Merged
Lilly-May merged 15 commits from feature/regression_classifier into main on Mar 19, 2024

Conversation

Lilly-May
Collaborator

PR Checklist

Description of changes

  • As an alternative to a multi-layer perceptron (MLP), the discriminator classifier now also supports logistic regression for embedding creation
  • The model (MLP or regression) is chosen when creating the DiscriminatorClassifierSpace object. By default, it is set to MLP, ensuring backward compatibility with the previous usage (see the sketch below)
  • I decreased the number of epochs for the MLP test from 5 to 2. As a result, computing time decreases from 1 min 15 s to 43 s on my local machine, and 2 epochs are still enough for the MLP to learn to separate the classes.
  • Added and restructured the tests for DiscriminatorClassifierSpace: the adata is now provided via a fixture and is subsequently used by both the MLP and the regression classifier tests
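
As a quick illustration of the constructor-level choice (a minimal sketch; the exact argument name isn't spelled out here, so the positional usage below simply mirrors the snippet in the Technical details section):

import pertpy as pt

# Default: behaves as before and trains the MLP classifier (backward compatible)
ps_mlp = pt.tl.DiscriminatorClassifierSpace()

# Passing "regression" switches the embedding model to logistic regression
ps_regression = pt.tl.DiscriminatorClassifierSpace("regression")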

Technical details

I tested the regression classifier implementation on the Norman dataset with the following code:

import scanpy as sc
import pertpy as pt

# adata: the Norman dataset, loaded beforehand
# Reduce the data to 30 principal components and use them as the cell embedding
sc.pp.pca(adata, n_comps=30)

# Create the perturbation space with the logistic regression classifier
ps = pt.tl.DiscriminatorClassifierSpace("regression")
classifier_ps = ps.load(adata, embedding_key="X_pca", target_col="perturbation_name")
classifier_ps.train()
pert_embeddings = classifier_ps.get_embeddings()

# Visualize the resulting perturbation embeddings
sc.pp.neighbors(pert_embeddings, use_rep="X")
sc.tl.umap(pert_embeddings)
sc.pl.umap(pert_embeddings, color=["gene_programme"])

Which results in the following UMAP:
[UMAP plot of the perturbation embeddings, colored by gene_programme]

@github-actions github-actions bot added the enhancement New feature or request label Mar 18, 2024
@Lilly-May
Collaborator Author

I would like to add documentation examples for the regression classifier. But I also think we should keep the current examples using the MLP models. @Zethson is it okay if I simply add a second example in the same docstring?


codecov bot commented Mar 18, 2024

Codecov Report

Attention: Patch coverage is 83.33333%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 63.52%. Comparing base (916c837) to head (7fd7bbe).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #560      +/-   ##
==========================================
+ Coverage   63.40%   63.52%   +0.12%     
==========================================
  Files          43       43              
  Lines        5052     5091      +39     
==========================================
+ Hits         3203     3234      +31     
- Misses       1849     1857       +8     
Files Coverage Δ
pertpy/tools/__init__.py 100.00% <100.00%> (ø)
pertpy/tools/_distances/_distances.py 88.60% <ø> (ø)
pertpy/tools/_perturbation_space/_simple.py 75.89% <100.00%> (ø)
.../_perturbation_space/_discriminator_classifiers.py 90.64% <82.60%> (ø)

... and 1 file with indirect coverage changes

@Lilly-May Lilly-May requested a review from Zethson March 18, 2024 16:14
@Zethson Zethson changed the title from "Feature/regression classifier" to "Logistic regression support for the Discriminator Classifier" on Mar 18, 2024
Member

@Zethson Zethson left a comment


Awesome, very good job!

What was your impression concerning usage and parameter documentation? Was it too annoying to always be like: "This only applies to the MLP" or the other way around? I'm trying to assess whether we should split them into two functions or roll with what you nicely implemented.

@Zethson
Member

Zethson commented Mar 18, 2024

> I would like to add documentation examples for the regression classifier. But I also think we should keep the current examples using the MLP models. @Zethson is it okay if I simply add a second example in the same docstring?

Yes, please! Also thought about that while reviewing.

@Lilly-May
Collaborator Author

> What was your impression concerning usage and parameter documentation? Was it too annoying to always be like: "This only applies to the MLP" or the other way around? I'm trying to assess whether we should split them into two functions or roll with what you nicely implemented.

If we want to stick with the DiscriminatorClassifierSpace class, I would keep the implementation as it is in this PR, with regression as a parameter choice. The downside is that the two implementations are quite different (in each method, I have an if-else statement that distinguishes between the MLP and the regression model), but that isn't really of interest to the user. So the fact that several parameters are not applicable to the respective model might be the bigger problem.

I think the alternative would be to have two different classes, something like MLPClassifierSpace and RegressionClassifierSpace. For the latter, we would probably only have one method, compute, analogous to the other Perturbation Spaces instead of load, train and get_embeddings. Personally, I think this approach might be a bit more intuitive and easier to understand for users, but it's not backward compatible, as the DiscriminatorClassifierSpace would be removed.
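
For illustration, a hypothetical sketch of what the RegressionClassifierSpace could look like (the single compute method mirrors the other perturbation spaces; using scikit-learn's LogisticRegression and the per-class coefficient vectors as embeddings is just my assumption for this sketch, not necessarily how this PR implements it):

import anndata as ad
from sklearn.linear_model import LogisticRegression

class RegressionClassifierSpace:
    """Hypothetical single-call variant, analogous to the other perturbation spaces."""

    def compute(self, adata: ad.AnnData, embedding_key: str = "X_pca",
                target_col: str = "perturbation_name") -> ad.AnnData:
        # Fit a logistic regression predicting the perturbation label from the cell embedding
        X = adata.obsm[embedding_key]
        y = adata.obs[target_col].astype(str).values
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, y)
        # Use one coefficient vector per perturbation class as its embedding
        # (assumes more than two classes, so clf.coef_ has one row per class)
        pert_emb = ad.AnnData(X=clf.coef_)
        pert_emb.obs[target_col] = clf.classes_
        return pert_emb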

@Zethson
Member

Zethson commented Mar 19, 2024

> I think the alternative would be to have two different classes, something like MLPClassifierSpace and RegressionClassifierSpace. For the latter, we would probably only have one method, compute, analogous to the other Perturbation Spaces instead of load, train and get_embeddings. Personally, I think this approach might be a bit more intuitive and easier to understand for users, but it's not backward compatible, as the DiscriminatorClassifierSpace would be removed.

Concerning backwards compatibility: We could alias the classes. A simple DiscriminatorClassifierSpace = MLPClassifierSpace in an __init__.py would probably do the trick.
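
Something like the following, as a minimal sketch (the module path is assumed from the coverage report above):

# in pertpy/tools/__init__.py
from ._perturbation_space._discriminator_classifiers import MLPClassifierSpace

# Keep the old name importable so existing user code continues to work
DiscriminatorClassifierSpace = MLPClassifierSpace

If we want to nudge users towards the new name, the alias could also emit a DeprecationWarning, but the plain assignment already covers backwards compatibility.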

I think that if we really wanted to we could probably provide a somewhat sane load and get_embeddings also for the RegressionClassifierSpace, but they'd be super simple, right? I'll leave this up to you to judge and we'll roll with whatever you think is best.

But yeah, splitting this into two is, I think, the better approach.

@github-actions github-actions bot added the chore label Mar 19, 2024
@Lilly-May Lilly-May removed the chore label Mar 19, 2024
@Lilly-May Lilly-May merged commit 80ef0f0 into main Mar 19, 2024
7 of 8 checks passed
@Zethson Zethson deleted the feature/regression_classifier branch May 20, 2024 14:21
Labels
enhancement (New feature or request)

Successfully merging this pull request may close these issues:
Add logistic regression for perturbation space creation