Refactors Scores and add querying/sorting options #153

florian-huber · 2020-10-19T08:52:29Z

Here I changed quite a few things to make the API nicer to work with:

address #143 (Consider making Scores.scores a structured array): make Scores.scores a structured array (only when the score contains more than one component, e.g. score + matches)
add a sorting option to the scores_by_reference and scores_by_query methods (implemented via BaseSimilarity)
fix bug in Spectrum(): the getter for metadata entries did not use copy()
fix bug/issue in Spectrum(): the __eq__ method compared metadata dictionaries simply with ==. However, that fails when numpy arrays are added to the metadata (which we do in the add_fingerprints filter).
add tests for those cases
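To illustrate the `__eq__` issue described in the list above, here is a minimal sketch of why a plain `==` on metadata dictionaries breaks once a numpy array is stored in them, and one way to compare such dictionaries safely. The helper name `metadata_equal` and the `fingerprint` key are assumptions made for this example; this is not necessarily how the fix in Spectrum() is implemented.

```python
import numpy as np


def metadata_equal(left: dict, right: dict) -> bool:
    """Compare two metadata dicts whose values may be numpy arrays.

    Hypothetical helper for illustration only.
    """
    if left.keys() != right.keys():
        return False
    for key, value in left.items():
        other = right[key]
        if isinstance(value, np.ndarray) or isinstance(other, np.ndarray):
            # Arrays need an element-wise comparison reduced to one bool.
            if not np.array_equal(value, other):
                return False
        elif value != other:
            return False
    return True


meta_a = {"id": "spectrum1", "fingerprint": np.array([0, 1, 1, 0])}
meta_b = {"id": "spectrum1", "fingerprint": np.array([0, 1, 1, 0])}

# A plain dict comparison ends up asking for the truth value of an
# element-wise array comparison, which numpy refuses to provide.
try:
    meta_a == meta_b
    plain_comparison_failed = False
except ValueError:
    plain_comparison_failed = True

print(plain_comparison_failed)
print(metadata_equal(meta_a, meta_b))
```

The same consideration applies to the getter: returning a copy of a metadata entry keeps callers from mutating the stored array in place.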

In #152 we had also discussed adding Scores.top_scores_by_query(query, n=np.Inf) and Scores.top_scores_by_reference(reference, n=np.Inf), but this is not done here yet. For time reasons I would postpone it for another round of additions.
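As a rough sketch of what such a top-n selection could look like (this is not part of the PR), the snippet below picks the indices of the `n` highest scores from a plain numeric array. The function name `top_n_indices` is hypothetical, and it uses the lowercase `np.inf` spelling for the "no limit" default.

```python
import numpy as np


def top_n_indices(scores: np.ndarray, n: float = np.inf) -> np.ndarray:
    """Return the indices of the n highest scores, best first.

    Hypothetical helper; n=np.inf means "return all, sorted".
    """
    order = scores.argsort()[::-1]  # descending order of scores
    if np.isinf(n):
        return order
    return order[: int(n)]


scores = np.array([0.2, 0.9, 0.5, 0.7])
best_two = top_n_indices(scores, n=2)
print(best_two)
```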


florian-huber · 2020-12-04T07:56:32Z

@sverhoeven Sorry for the back and forth (draft-not-draft...). Took me some time to find the bugs/issues that were causing weird behavior (they were in Spectrum), but that should be fixed now. Just added a missing unit test and now it should be ready for review.

tests/test_spectrum.py

sverhoeven

Overall looks good, docstrings render OK, additional tests result in nice coverage.
I like the approach of having a BaseSimilarity.sort() method. (I might have suggested it before, but good to see that it was also implementable in an unobtrusive way.)

Calculating scores between 2 spectra fails to return a structured array

I tried

In [1]:         import numpy as np
   ...:         from matchms import calculate_scores
   ...:         from matchms import Spectrum
   ...:         from matchms.similarity import CosineGreedy
   ...: 
   ...:         spectrum_1 = Spectrum(mz=np.array([100, 150, 200.]),
   ...:                               intensities=np.array([0.7, 0.2, 0.1]),
   ...:                               metadata={'id': 'spectrum1'})
   ...:         spectrum_2 = Spectrum(mz=np.array([100, 140, 190.]),
   ...:                               intensities=np.array([0.4, 0.2, 0.1]),
   ...:                               metadata={'id': 'spectrum2'})

In [2]: similarity_measure = CosineGreedy()

In [3]: scores = calculate_scores([spectrum_1],[spectrum_2], similarity_measure)

In [6]: scores._scores
Out[6]: array([[(0.831479419283098, 1)]], dtype=object)

In [7]: list(scores)
Out[7]: 
[(<matchms.Spectrum.Spectrum at 0x7fb29ac050d0>,
  <matchms.Spectrum.Spectrum at 0x7fb29ac05070>,
  0.831479419283098,
  1)]

Instead of the bare values 0.831479419283098 and 1, I expected to get back a dictionary or structured array.
Can you add a test for this use case?
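For context, the difference between the object array seen in the session above and a structured array can be sketched with numpy alone. The field name "score" appears in the diff under review; the field name "matches" and the concrete dtypes are assumptions for this example.

```python
import numpy as np

# What the session above returned: an object array holding a plain tuple.
object_scores = np.empty((1, 1), dtype=object)
object_scores[0, 0] = (0.831479419283098, 1)

# A structured alternative with named fields (names assumed here).
score_dtype = np.dtype([("score", np.float64), ("matches", np.int64)])
structured_scores = np.array([[(0.831479419283098, 1)]], dtype=score_dtype)

# Fields can be addressed by name instead of by tuple position.
score = structured_scores["score"][0, 0]
matches = structured_scores["matches"][0, 0]

print(object_scores.dtype.names)      # no named fields on an object array
print(structured_scores.dtype.names)  # named fields on the structured array
```

A test for this use case could then assert that the result has named fields rather than checking tuple positions.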

matchms/Scores.py

tests/test_scores.py

matchms/Scores.py

sverhoeven · 2020-12-08T13:29:40Z

matchms/similarity/BaseSimilarity.py

+        """
+        if scores.dtype.names is None:
+            return scores.argsort()[::-1]
+        return scores["score"].argsort()[::-1]


I don't think the base class should look at the incoming dtype. It should sort according to the score_datatype, so it should consist only of line 81. Line 82 is specific to the Cosine classes; it should not be part of the base.

Classes which override score_datatype should have their own sort() if needed. The sort() could be implemented in an intermediate abstract class or in the class itself.

I reduced it to line 81 as suggested. In fact, that even works for all scores we have implemented so far (without the need to define their own sort() method).
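The design discussed here can be sketched as follows. Both class names are illustrative stand-ins and not the matchms classes, and the "matches" field name is an assumption. The base class sorts without inspecting the dtype, while a subclass with a multi-field score may override sort() if it needs to. The last lines show why the base version can already be enough: numpy's argsort on a structured array compares the fields in order, so the first field ("score") dominates.

```python
import numpy as np


class BaseSimilaritySketch:
    """Illustrative stand-in for a similarity base class."""

    score_datatype = np.float64

    def sort(self, scores: np.ndarray) -> np.ndarray:
        # Indices that order the scores from highest to lowest.
        return scores.argsort()[::-1]


class CosineLikeSketch(BaseSimilaritySketch):
    """Hypothetical subclass whose score has two fields."""

    score_datatype = [("score", np.float64), ("matches", np.int64)]

    def sort(self, scores: np.ndarray) -> np.ndarray:
        # Sort on the "score" field only, highest first.
        return scores["score"].argsort()[::-1]


plain = np.array([0.1, 0.8, 0.4])
structured = np.array([(0.5, 3), (0.9, 1), (0.2, 7)],
                      dtype=CosineLikeSketch.score_datatype)

plain_order = BaseSimilaritySketch().sort(plain)
override_order = CosineLikeSketch().sort(structured)
# The base implementation also accepts the structured array.
base_order = BaseSimilaritySketch().sort(structured)

print(plain_order, override_order, base_order)
```

Note that the two approaches can differ when scores tie: the field-ordered comparison breaks ties using the later fields, while sorting on "score" alone does not.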

readthedocs/index.rst

Co-authored-by: Stefan Verhoeven <[email protected]>

sonarqubecloud · 2020-12-10T13:53:43Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

100.0% Coverage
0.0% Duplication

florian-huber · 2021-01-04T15:12:59Z

Thanks a lot for reviewing, Stefan, that was very helpful! I believe I have addressed your comments and will merge this PR now.

florian-huber added 11 commits October 19, 2020 10:50

add score types

ddb14f6

try dict for dtype

1f93dbc

update scores test

b24a23c

linting

e005929

update tests

a89bfa5

adapt similarity function to new dtype

881c226

update tests to structured arrays

fa32e9d

fix import error

e289b19

fix import error

c0a7c19

update code example and simply functions

c5d6278

linting

2c3c113

florian-huber mentioned this pull request Oct 20, 2020

Consider making Scores.scores a structured array #143

Closed

florian-huber added 9 commits December 3, 2020 12:25

add sort to scores_by_reference

39558ff

fix typo

ac2b484

add sort to scores_by_query

8ffbb4c

Merge branch 'master' into add_querying_options

d9d2bac

linting

4d46c76

update docstring examples

081998a

doing what isort says (but still not becoming friends)

34823e3

fix code examples

0bc61bf

Update CHANGELOG.md

801bf6e

florian-huber marked this pull request as ready for review December 3, 2020 13:10

florian-huber requested a review from sverhoeven December 3, 2020 13:16

florian-huber added 3 commits December 3, 2020 14:57

add dtype to modified cosine

87da125

Merge branch 'add_querying_options' of https://github.com/matchms/mat…

29aa7d1

…chms into add_querying_options

add dtype to modified cosine

6de46c5

florian-huber marked this pull request as draft December 3, 2020 14:05

florian-huber added 3 commits December 3, 2020 22:10

implement data-types

5e1894b

fix two bugs in Spectrum()

83a2819

add new sorting testcase

a23617c

florian-huber added 2 commits December 3, 2020 22:16

add missing import

4c80320

fix code example

a320ba2

florian-huber marked this pull request as ready for review December 4, 2020 07:42

add missing unit test

5b702a0


florian-huber added 5 commits December 7, 2020 11:38

Update tests/test_spectrum.py

c197809

Merge branch 'master' into add_querying_options

2d5b051

adapt integration test

52a67a5

refine integration test

32f2fa1

avoid rounding errors in test

0d3f2ff

sverhoeven requested changes Dec 8, 2020


florian-huber and others added 3 commits December 9, 2020 09:08

Apply suggestions from code review

5361544

Co-authored-by: Stefan Verhoeven <[email protected]>

linting

36a0fdc

make .pair method also return structured array

1fbe50b

florian-huber mentioned this pull request Dec 9, 2020

Re-design of Scores and similarity classes #135

Closed

florian-huber added 6 commits December 10, 2020 11:44

update tests and fix cosineHungarian

a09a7de

update code example

2010366

linting

d4d67c1

update docstring code examples

1d9f6a2

linting

6cc3678

remove dtype checking from BaseClass

45bb9b2

florian-huber mentioned this pull request Jan 4, 2021

Predefined top=n argument on Scores.calculate() could save memory #62

Closed

florian-huber merged commit eac7fdb into master Jan 4, 2021

florian-huber deleted the add_querying_options branch January 4, 2021 15:13

florian-huber mentioned this pull request Jan 4, 2021

Currently it is not clear how users should query the scores #152

Closed
