Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactors Scores and add querying/sorting options #153

Merged
merged 43 commits into from
Jan 4, 2021

Conversation

florian-huber
Copy link
Collaborator

@florian-huber florian-huber commented Oct 19, 2020

Here I changed quite some things to make the API nicer to work with:

  • address Consider making Scores.scores a structured array #143 --> make Scores.scores a structured array (only when score contains more than one component, e.g. score + matches)
  • add sorting option to scores_by_reference and scores_by_query method (implemented via BaseSimilarity)
  • fix bug in Spectrum(): getter for metadata entries did not use copy()
  • fix bug/issue in Spectrum(): __eq__ method compared metadata dictionaries simply with ==. However, that fails when numpy array are added to the metadata (which we do in the add_fingerprints filter).
  • added tests for those cases

In #152 we had also discussed adding Scores.top_scores_by_query(query, n=np.Inf) and Scores.top_scores_by_reference(reference, n=np.Inf), but this is not done here yet. For time reasons I would postpone it for another round of additions.

@florian-huber florian-huber marked this pull request as ready for review December 3, 2020 13:10
@florian-huber florian-huber marked this pull request as draft December 3, 2020 14:05
@florian-huber florian-huber marked this pull request as ready for review December 4, 2020 07:42
@florian-huber
Copy link
Collaborator Author

florian-huber commented Dec 4, 2020

@sverhoeven Sorry for the back and forth (draft-not-draft...). Took me some time to find the bugs/issues that were causing weird behavior (they were in Spectrum), but that should be fixed now. Just added a missing unit test and now it should be ready for review.

tests/test_spectrum.py Outdated Show resolved Hide resolved
Copy link
Member

@sverhoeven sverhoeven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, docstrings render OK, additional tests result in nice coverage.
I like the approach of having a BaseSimilarity.sort() method. (I might have suggested it before, but good to see that it was also implementable in a unobtrusive way)

Calculation scores between 2 spectra is failing to return a structured array

I tried

In [1]:         import numpy as np
   ...:         from matchms import calculate_scores
   ...:         from matchms import Spectrum
   ...:         from matchms.similarity import CosineGreedy
   ...: 
   ...:         spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
   ...:                               intensities=np.array([0.7, 0.2, 0.1]),
   ...:                               metadata={'id': 'spectrum1'})
   ...:         spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
   ...:                               intensities=np.array([0.4, 0.2, 0.1]),
   ...:                               metadata={'id': 'spectrum2'})

In [2]: similarity_measure = CosineGreedy()

In [3]: scores = calculate_scores([spectrum_1],[spectrum_2], similarity_measure)

In [6]: scores._scores
Out[6]: array([[(0.831479419283098, 1)]], dtype=object)

In [7]: list(scores)
Out[7]: 
[(<matchms.Spectrum.Spectrum at 0x7fb29ac050d0>,
  <matchms.Spectrum.Spectrum at 0x7fb29ac05070>,
  0.831479419283098,
  1)]

Instead of 0.831479419283098, 1 I expected to get back a dictionary or structured array.
Can you add a test for this use case?

matchms/Scores.py Show resolved Hide resolved
matchms/Scores.py Outdated Show resolved Hide resolved
matchms/Scores.py Outdated Show resolved Hide resolved
tests/test_scores.py Show resolved Hide resolved
matchms/Scores.py Show resolved Hide resolved
matchms/Scores.py Outdated Show resolved Hide resolved
matchms/Scores.py Outdated Show resolved Hide resolved
matchms/Scores.py Outdated Show resolved Hide resolved
"""
if scores.dtype.names is None:
return scores.argsort()[::-1]
return scores["score"].argsort()[::-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think the base class should look at the incoming dtype. It should sort according to the score_datatype. So it should only consist out of line 81. Line 82 is specific to the Cosine classes it should not be part of the base.

Classes which override score_datatype should have their own sort() if needed. The `sort() could be implemented in an intermediate abstract class or in the class itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reduced it to line 81 as suggested. In fact, that even works for all scores we implemented so far (without need to define own sort() method).

readthedocs/index.rst Show resolved Hide resolved
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

@florian-huber
Copy link
Collaborator Author

Thanks a lot for the reviewing Stefan, that was very helpful! I believe I could address your comments and will merge this PR now.

@florian-huber florian-huber merged commit eac7fdb into master Jan 4, 2021
@florian-huber florian-huber deleted the add_querying_options branch January 4, 2021 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants