Logistic regression support for the Discriminator Classifier #560

Merged
Lilly-May merged 15 commits from feature/regression_classifier into main on Mar 19, 2024

Conversation

Lilly-May
Collaborator

PR Checklist

Description of changes

  • As an alternative to a multi-layer perceptron (MLP), the discriminator classifier now also supports logistic regression for embedding creation
  • The model (MLP or regression) is chosen when creating the DiscriminatorClassifierSpace object. By default, it is set to MLP, ensuring backward compatibility with the previous usage (see the sketch below)
  • I decreased the number of epochs for the MLP test from 5 to 2. As a result, computing time decreases from 1 min 15 s to 43 s on my local machine, and 2 epochs are still enough for the MLP to learn to separate the classes.
  • Added and restructured the tests for DiscriminatorClassifierSpace: the adata is now provided via a fixture and is subsequently used by both the MLP and the regression classifier tests
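
As a quick illustration of the constructor-level choice (a minimal sketch; the exact argument name isn't spelled out here, so the positional usage below simply mirrors the snippet in the Technical details section):

import pertpy as pt

# Default: behaves as before and trains the MLP classifier (backward compatible)
ps_mlp = pt.tl.DiscriminatorClassifierSpace()

# Passing "regression" switches the embedding model to logistic regression
ps_regression = pt.tl.DiscriminatorClassifierSpace("regression")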

Technical details

I tested the regression classifier implementation on the Norman dataset with the following code:

import scanpy as sc
import pertpy as pt

# adata: the Norman dataset, loaded beforehand
# Reduce the data to 30 principal components and use them as the cell embedding
sc.pp.pca(adata, n_comps=30)

# Create the perturbation space with the logistic regression classifier
ps = pt.tl.DiscriminatorClassifierSpace("regression")
classifier_ps = ps.load(adata, embedding_key="X_pca", target_col="perturbation_name")
classifier_ps.train()
pert_embeddings = classifier_ps.get_embeddings()

# Visualize the resulting perturbation embeddings
sc.pp.neighbors(pert_embeddings, use_rep="X")
sc.tl.umap(pert_embeddings)
sc.pl.umap(pert_embeddings, color=["gene_programme"])

Which results in the following UMAP:
[UMAP plot of the perturbation embeddings, colored by gene_programme]

@github-actions github-actions bot added the enhancement New feature or request label Mar 18, 2024
@Lilly-May
Collaborator Author

I would like to add documentation examples for the regression classifier. But I also think we should keep the current examples using the MLP models. @Zethson is it okay if I simply add a second example in the same docstring?


codecov bot commented Mar 18, 2024

Codecov Report

Attention: Patch coverage is 83.33333%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 63.52%. Comparing base (916c837) to head (7fd7bbe).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #560      +/-   ##
==========================================
+ Coverage   63.40%   63.52%   +0.12%     
==========================================
  Files          43       43              
  Lines        5052     5091      +39     
==========================================
+ Hits         3203     3234      +31     
- Misses       1849     1857       +8     
Files Coverage Δ
pertpy/tools/__init__.py 100.00% <100.00%> (ø)
pertpy/tools/_distances/_distances.py 88.60% <ø> (ø)
pertpy/tools/_perturbation_space/_simple.py 75.89% <100.00%> (ø)
.../_perturbation_space/_discriminator_classifiers.py 90.64% <82.60%> (ø)

... and 1 file with indirect coverage changes

@Lilly-May Lilly-May requested a review from Zethson March 18, 2024 16:14
@Zethson Zethson changed the title from "Feature/regression classifier" to "Logistic regression support for the Discriminator Classifier" on Mar 18, 2024
Member

@Zethson Zethson left a comment


Awesome, very good job!

What was your impression concerning usage and parameter documentation? Was it too annoying to always be like: "This only applies to the MLP" or the other way around? I'm trying to assess whether we should split them into two functions or roll with what you nicely implemented.

@Zethson
Member

Zethson commented Mar 18, 2024

> I would like to add documentation examples for the regression classifier. But I also think we should keep the current examples using the MLP models. @Zethson is it okay if I simply add a second example in the same docstring?

Yes, please! Also thought about that while reviewing.

@Lilly-May
Collaborator Author

> What was your impression concerning usage and parameter documentation? Was it too annoying to always be like: "This only applies to the MLP" or the other way around? I'm trying to assess whether we should split them into two functions or roll with what you nicely implemented.

If we want to stick with the DiscriminatorClassifierSpace class, I would keep the implementation as it is in this PR, with regression as a parameter choice. The downside is that the two implementations are quite different (in each method, I have an if-else statement that distinguishes between the MLP and the regression model), but that isn't really of interest to the user. So the fact that several parameters are not applicable to the respective model might be the bigger problem.

I think the alternative would be to have two different classes, something like MLPClassifierSpace and RegressionClassifierSpace. For the latter, we would probably only have one method, compute, analogous to the other Perturbation Spaces instead of load, train and get_embeddings. Personally, I think this approach might be a bit more intuitive and easier to understand for users, but it's not backward compatible, as the DiscriminatorClassifierSpace would be removed.
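
For illustration, a hypothetical sketch of what the RegressionClassifierSpace could look like (the single compute method mirrors the other perturbation spaces; using scikit-learn's LogisticRegression and the per-class coefficient vectors as embeddings is just my assumption for this sketch, not necessarily how this PR implements it):

import anndata as ad
from sklearn.linear_model import LogisticRegression

class RegressionClassifierSpace:
    """Hypothetical single-call variant, analogous to the other perturbation spaces."""

    def compute(self, adata: ad.AnnData, embedding_key: str = "X_pca",
                target_col: str = "perturbation_name") -> ad.AnnData:
        # Fit a logistic regression predicting the perturbation label from the cell embedding
        X = adata.obsm[embedding_key]
        y = adata.obs[target_col].astype(str).values
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, y)
        # Use one coefficient vector per perturbation class as its embedding
        # (assumes more than two classes, so clf.coef_ has one row per class)
        pert_emb = ad.AnnData(X=clf.coef_)
        pert_emb.obs[target_col] = clf.classes_
        return pert_emb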

@Zethson
Member

Zethson commented Mar 19, 2024

> I think the alternative would be to have two different classes, something like MLPClassifierSpace and RegressionClassifierSpace. For the latter, we would probably only have one method, compute, analogous to the other Perturbation Spaces instead of load, train and get_embeddings. Personally, I think this approach might be a bit more intuitive and easier to understand for users, but it's not backward compatible, as the DiscriminatorClassifierSpace would be removed.

Concerning backwards compatibility: We could alias the classes. A simple DiscriminatorClassifierSpace = MLPClassifierSpace in an __init__.py would probably do the trick.
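
Something like the following, as a minimal sketch (the module path is assumed from the coverage report above):

# in pertpy/tools/__init__.py
from ._perturbation_space._discriminator_classifiers import MLPClassifierSpace

# Keep the old name importable so existing user code continues to work
DiscriminatorClassifierSpace = MLPClassifierSpace

If we want to nudge users towards the new name, the alias could also emit a DeprecationWarning, but the plain assignment already covers backwards compatibility.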

I think that if we really wanted to we could probably provide a somewhat sane load and get_embeddings also for the RegressionClassifierSpace, but they'd be super simple, right? I'll leave this up to you to judge and we'll roll with whatever you think is best.

But yeah, splitting this into two is, I think, the better approach.

@github-actions github-actions bot added the chore label Mar 19, 2024
@Lilly-May Lilly-May removed the chore label Mar 19, 2024
@Lilly-May Lilly-May merged commit 80ef0f0 into main Mar 19, 2024
7 of 8 checks passed
@Zethson Zethson deleted the feature/regression_classifier branch May 20, 2024 14:21
Labels
enhancement (New feature or request)

Successfully merging this pull request may close these issues:
Add logistic regression for perturbation space creation