bm.plot_results_table() fails when all batch correction metrics are set to False #157

Closed
soerenab opened this issue Mar 20, 2024 · 5 comments · Fixed by #181

Comments

@soerenab

Report

I only want to run the bio conservation metrics, so I have initialized the Benchmarker as follows:

batch_correction = BatchCorrection(
    silhouette_batch=False,
    ilisi_knn=False,
    kbet_per_label=False,
    graph_connectivity=False,
    pcr_comparison=False
)

bio_conservation = BioConservation(
    isolated_labels=True,
    nmi_ari_cluster_labels_leiden=False,
    nmi_ari_cluster_labels_kmeans=True,
    silhouette_label=True,
    clisi_knn=True
)

bm = Benchmarker(
    adata,
    batch_key="dummy_batch",
    label_key="cell_type",
    embedding_obsm_keys=embedding_obsm_keys,
    batch_correction_metrics=batch_correction,
    bio_conservation_metrics=bio_conservation,
    n_jobs=6,
)

Then bm.benchmark() runs fine. However, when I try to plot the results with bm.plot_results_table(), I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/contrastive-transformer/lib/python3.10/site-packages/pandas/core/indexes/base.py:3802, in Index.get_loc(self, key, method, tolerance)
   3801 try:
-> 3802     return self._engine.get_loc(casted_key)
   3803 except KeyError as err:

File ~/miniconda3/envs/contrastive-transformer/lib/python3.10/site-packages/pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File ~/miniconda3/envs/contrastive-transformer/lib/python3.10/site-packages/pandas/_libs/index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Batch correction'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[11], line 1
----> 1 bm.plot_results_table()

File ~/miniconda3/envs/contrastive-transformer/lib/python3.10/site-packages/scib_metrics/benchmark/_core.py:287, in Benchmarker.plot_results_table(self, min_max_scale, show, save_dir)
    285 num_embeds = len(self._embedding_obsm_keys)
    286 cmap_fn = lambda col_data: normed_cmap(col_data, cmap=matplotlib.cm.PRGn, num_stds=2.5)
--> 287 df = self.get_results(min_max_scale=min_max_scale)
    288 # Do not want to plot what kind of metric it is
    289 plot_df = df.drop(_METRIC_TYPE, axis=0)

File ~/miniconda3/envs/contrastive-transformer/lib/python3.10/site-packages/scib_metrics/benchmark/_core.py:266, in Benchmarker.get_results(self, min_max_scale, clean_names)
    264 per_class_score = df.groupby(_METRIC_TYPE).mean().transpose()
    265 # This is the default scIB weighting from the manuscript
--> 266 per_class_score["Total"] = 0.4 * per_class_score["Batch correction"] + 0.6 * per_class_score["Bio conservation"]
    267 df = pd.concat([df.transpose(), per_class_score], axis=1)
    268 df.loc[_METRIC_TYPE, per_class_score.columns] = _AGGREGATE_SCORE

File ~/miniconda3/envs/contrastive-transformer/lib/python3.10/site-packages/pandas/core/frame.py:3807, in DataFrame.__getitem__(self, key)
   3805 if self.columns.nlevels > 1:
   3806     return self._getitem_multilevel(key)
-> 3807 indexer = self.columns.get_loc(key)
   3808 if is_integer(indexer):
   3809     indexer = [indexer]

File ~/miniconda3/envs/contrastive-transformer/lib/python3.10/site-packages/pandas/core/indexes/base.py:3804, in Index.get_loc(self, key, method, tolerance)
   3802     return self._engine.get_loc(casted_key)
   3803 except KeyError as err:
-> 3804     raise KeyError(key) from err
   3805 except TypeError:
   3806     # If we have a listlike key, _check_indexing_error will raise
   3807     #  InvalidIndexError. Otherwise we fall through and re-raise
   3808     #  the TypeError.
   3809     self._check_indexing_error(key)

KeyError: 'Batch correction'

My guess is that the following line fails because I did not run any batch correction metrics:
per_class_score["Total"] = 0.4 * per_class_score["Batch correction"] + 0.6 * per_class_score["Bio conservation"]
so per_class_score probably has no "Batch correction" column.
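
The failure can be reproduced outside of scib_metrics with plain pandas. This is a minimal sketch, not the library's code: the table layout (metrics as rows, embeddings as columns) and the "Metric Type" column name are assumptions made for illustration. Grouping by metric type only creates a column for each type that is actually present, so looking up "Batch correction" raises a KeyError.

```python
import pandas as pd

METRIC_TYPE = "Metric Type"  # illustrative column name, an assumption

# Metrics as rows, embeddings as columns; only bio conservation metrics were run.
df = pd.DataFrame(
    {"X_pca": [0.5, 0.75], "X_scvi": [0.25, 0.5]},
    index=["isolated_labels", "silhouette_label"],
)
df[METRIC_TYPE] = ["Bio conservation", "Bio conservation"]

# One column per metric type that exists in the data.
per_class_score = df.groupby(METRIC_TYPE).mean().transpose()
print(list(per_class_score.columns))

try:
    per_class_score["Batch correction"]
except KeyError as err:
    # Same kind of error as in the traceback above.
    print("KeyError:", err)
```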

It would be great to be able to plot the results even if I did not run any batch correction metric. A potential solution in this case could be to simply not compute (and plot) the total score, as it probably does not make sense anyway, or to set per_class_score["Batch correction"] to per_class_score["Bio conservation"] so that the final score is simply the bio conservation score.
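
The first option could look roughly like the sketch below. The helper name add_total is hypothetical and not part of scib_metrics; it only shows the idea of computing the weighted total when both metric classes are present and skipping it otherwise.

```python
import pandas as pd

BATCH = "Batch correction"
BIO = "Bio conservation"


def add_total(per_class_score: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add 'Total' only if both metric classes exist."""
    out = per_class_score.copy()
    if BATCH in out.columns and BIO in out.columns:
        # Default scIB weighting quoted in the traceback above.
        out["Total"] = 0.4 * out[BATCH] + 0.6 * out[BIO]
    return out


# Only bio conservation scores are available: no 'Total' column is added.
bio_only = add_total(pd.DataFrame({BIO: [0.5, 0.75]}, index=["X_pca", "X_scvi"]))

# Both classes are available: 'Total' is the weighted sum.
both = add_total(pd.DataFrame({BATCH: [1.0], BIO: [0.5]}, index=["X_pca"]))
```

The plotting code would then also need to skip the "Total" column when it is absent.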

Version information


anndata 0.10.5
numpy 1.26.4
pandas 1.5.3
scanpy 1.9.8
scib 1.1.4
scib_metrics 0.5.1
session_info 1.0.0

PIL 10.2.0
absl NA
anyio NA
arrow 1.3.0
asttokens NA
attr 23.2.0
attrs 23.2.0
babel 2.14.0
brotli 1.1.0
certifi 2024.02.02
cffi 1.16.0
charset_normalizer 3.3.2
chex 0.1.85
cloudpickle 3.0.0
colorama 0.4.6
comm 0.2.1
cycler 0.12.1
cython_runtime NA
cytoolz 0.12.3
dask 2024.2.0
dateutil 2.8.2
debugpy 1.8.1
decorator 5.1.1
defusedxml 0.7.1
deprecated 1.2.14
exceptiongroup 1.2.0
executing 2.0.1
fastjsonschema NA
fqdn NA
google NA
h5py 3.10.0
idna 3.6
igraph 0.11.4
importlib_metadata NA
ipykernel 6.29.2
ipywidgets 8.1.2
isoduration NA
jax 0.4.25
jaxlib 0.4.25
jedi 0.19.1
jinja2 3.1.3
joblib 1.3.2
json5 NA
jsonpointer 2.4
jsonschema 4.21.1
jsonschema_specifications NA
jupyter_events 0.9.0
jupyter_server 2.12.5
jupyterlab_server 2.25.3
kiwisolver 1.4.5
leidenalg 0.10.2
llvmlite 0.42.0
lz4 4.3.3
markupsafe 2.1.5
matplotlib 3.8.3
ml_dtypes 0.3.2
mpl_toolkits NA
natsort 8.4.0
nbformat 5.9.2
numba 0.59.0
opt_einsum v3.3.0
overrides NA
packaging 23.2
parso 0.8.3
patsy 0.5.6
pickleshare 0.7.5
platformdirs 4.2.0
plottable 0.1.5
prometheus_client NA
prompt_toolkit 3.0.42
psutil 5.9.8
pure_eval 0.2.2
pyarrow 15.0.0
pycparser 2.21
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pydot 2.0.0
pygments 2.17.2
pynndescent 0.5.11
pyparsing 3.1.1
pythonjsonlogger NA
pytz 2024.1
referencing NA
requests 2.31.0
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
rich NA
rpds NA
scipy 1.12.0
seaborn 0.13.2
send2trash NA
six 1.16.0
sklearn 1.4.1.post1
sniffio 1.3.0
socks 1.7.1
stack_data 0.6.2
statsmodels 0.14.1
tblib 3.0.0
texttable 1.7.0
threadpoolctl 3.3.0
tlz 0.12.3
toolz 0.12.1
torch 2.2.1
torchgen NA
tornado 6.4
tqdm 4.66.2
traitlets 5.14.1
typing_extensions NA
umap 0.5.5
uri_template NA
urllib3 2.2.1
wcwidth 0.2.13
webcolors 1.13
websocket 1.7.0
wrapt 1.16.0
yaml 6.0.1
zipp NA
zmq 25.1.2
zoneinfo NA

IPython 8.22.0
jupyter_client 8.6.0
jupyter_core 5.7.1
jupyterlab 4.1.2
notebook 7.1.0

Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0]
Linux-4.18.0-513.11.1.el8_9.x86_64-x86_64-with-glibc2.28

Session information updated at 2024-03-20 12:13

@soerenab soerenab added the bug Something isn't working label Mar 20, 2024
@martinkim0
Member

Thanks for bringing this up! I'll take a look at this

@martinkim0 martinkim0 self-assigned this Mar 20, 2024
@LArnoldt

Hey @martinkim0 - Thank you for this great package!

The issue described above also applies to the function get_results in _core.py, since the function can't calculate:

per_class_score["Total"] = 0.4 * per_class_score["Batch correction"] + 0.6 * per_class_score["Bio conservation"]

because either "Batch correction" or "Bio conservation" is missing from the per_class_score DataFrame when all metrics of that type are disabled.

@adamgayoso
Member

Hi @LArnoldt -- we are happy to accept a pull request to fix this.

Perhaps we can add a helper function to the batch correction and bio conservation dataclasses that counts how many metrics are active. This can then be used to control the plotting code and the total score.
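
A minimal sketch of such a helper, under stated assumptions: BatchCorrectionSketch is a hypothetical stand-in for the real dataclass, the method name n_active is made up, and every field is treated as a plain bool. The field names are taken from the report above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BatchCorrectionSketch:
    """Hypothetical stand-in for the batch correction settings dataclass."""

    silhouette_batch: bool = True
    ilisi_knn: bool = True
    kbet_per_label: bool = True
    graph_connectivity: bool = True
    pcr_comparison: bool = True

    def n_active(self) -> int:
        """Count how many metrics are switched on (hypothetical helper)."""
        return sum(bool(value) for value in asdict(self).values())


all_off = BatchCorrectionSketch(False, False, False, False, False)
default = BatchCorrectionSketch()

# A caller could skip the batch correction column and the total score
# whenever no metric of this class is active.
skip_batch_correction = all_off.n_active() == 0
```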

@SidSouthekal-Lilly

Same issue with get_results() and plot_results_table() if either Bio conservation or Batch correction is set to False. Any fix yet?

@LArnoldt

LArnoldt commented Sep 6, 2024

Hi @adamgayoso @SidSouthekal-Lilly - please see PR #179.
