Add not_notable to CurationRelevance vocabulary for semi-automated curation workflow #1236

Merged: 1 commit into biopragmatics:main on Oct 30, 2024

nagutm (Collaborator) commented Oct 28, 2024

This pull request adds the not_notable tag to the CurationRelevance vocabulary as a way to mark papers that are relevant for machine learning training but do not meet the threshold for inclusion in the Bioregistry.

While curating papers, we have come across a few entries that provide new identifier information but aren't notable enough or well-maintained enough for inclusion in the Bioregistry (#1225). Rather than curating these as subpar prefixes, tagging them as not_notable lets us retain them as positive training samples without cluttering the Bioregistry with less impactful entries.
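
The intent described above can be sketched roughly as follows. This is a minimal illustration, not the Bioregistry implementation: the enum is a stand-in that lists only the three members visible in this PR's diff and uses explicit string values, and `POSITIVE_TAGS` and `training_label` are hypothetical names. Treating `unclear` and `irrelevant_other` as non-positive is likewise an assumption made for the example.

```python
import enum


class CurationRelevance(str, enum.Enum):
    """Stand-in vocabulary: only the members visible in this PR's diff."""

    unclear = "unclear"
    irrelevant_other = "irrelevant_other"
    not_notable = "not_notable"


# Assumption: tags counted as positive samples when training a classifier.
# The PR text only states that not_notable papers are kept as positives.
POSITIVE_TAGS = {CurationRelevance.not_notable}


def training_label(tag: str) -> int:
    """Hypothetical helper: 1 for a positive training sample, 0 otherwise."""
    # Looking the tag up by value raises ValueError for unknown tags.
    return int(CurationRelevance(tag) in POSITIVE_TAGS)


print(training_label("not_notable"))
print(training_label("irrelevant_other"))
```

Under these assumptions, a paper tagged not_notable still contributes a positive label to the training data even though no prefix is added to the registry for it.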

codecov bot commented Oct 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.49%. Comparing base (8950e70) to head (9746585).
Report is 123 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1236      +/-   ##
==========================================
+ Coverage   42.51%   43.49%   +0.98%     
==========================================
  Files         117      118       +1     
  Lines        8327     8191     -136     
  Branches     1963     1346     -617     
==========================================
+ Hits         3540     3563      +23     
+ Misses       4582     4464     -118     
+ Partials      205      164      -41     

@@ -37,3 +37,5 @@ class CurationRelevance(str, enum.Enum):
     unclear = enum.auto()
     #: Completely unrelated information
     irrelevant_other = enum.auto()
+    #: Relevant for training purposes, but not curated in Bioregistry due to poor/unknown quality
+    not_notable = enum.auto()

Contributor commented:

Is this the only place we need to add this? I seem to recall also having this info duplicated in the documentation?

Collaborator (author) commented:

Yes, it also needs to be updated in #1195. I will update that PR separately.

Contributor commented:

Okay, let's merge this first then and also add it on the other PR
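
One way to catch the duplication raised in this thread is a small consistency check between the vocabulary and the documentation. The sketch below is only an illustration under stated assumptions: the enum is a stand-in with the three members visible in the diff, `DOCS_TEXT` is a made-up documentation snippet, and `missing_from_docs` is a hypothetical helper, not something that exists in the repository.

```python
import enum


class CurationRelevance(str, enum.Enum):
    """Stand-in vocabulary: only the members visible in this PR's diff."""

    unclear = "unclear"
    irrelevant_other = "irrelevant_other"
    not_notable = "not_notable"


def missing_from_docs(vocabulary: type[enum.Enum], docs_text: str) -> list[str]:
    """Return names of vocabulary members that the docs text never mentions."""
    return [member.name for member in vocabulary if member.name not in docs_text]


# Hypothetical documentation snippet that has not yet been updated.
DOCS_TEXT = """
Curation relevance tags:
- unclear
- irrelevant_other
"""

print(missing_from_docs(CurationRelevance, DOCS_TEXT))
```

A check like this could run as a unit test so that a new tag added to the enum fails the build until the documentation mentions it.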

@bgyori bgyori merged commit 21ea943 into biopragmatics:main Oct 30, 2024
15 checks passed
@nagutm nagutm deleted the not_notable branch December 11, 2024 19:50