Potentially relevant papers ranked for curation #1165

Open
github-actions bot opened this issue Aug 9, 2024 · 4 comments

github-actions bot commented Aug 9, 2024

This issue contains monthly updates to an automatically ranked list of PubMed papers as candidates for curation in the Bioregistry. Papers may be relevant in at least three ways:
(1) as a new prefix for a resource that can be added to the Bioregistry,
(2) as a provider for an existing prefix, or
(3) as a new publication for an existing prefix already in the Bioregistry.

These curations can happen in separate issues and pull requests. The full list of ranked papers can be found here. If you review any of these papers for relevance, you should edit the curated papers file here; these curations are taken into account when retraining the ranking model.

Entries for a batch of papers from 2024:

PubMed ID Title
39104285 FatPlants: a comprehensive information system for lipid-related genes and metabolic pathways in plants.
39074139 FURNA: A database for functional annotations of RNA structures.
39014503 CREdb: A comprehensive database of Cis-Regulatory Elements and their activity in human cells and tissues.
39047988 Knowledge infrastructure for integrated data management and analysis supporting new approach methods in predictive toxicology and risk assessment.
39115390 GENEVIC: GENetic data exploration and visualization via intelligent interactive console.
38991851 PEPhub: a database, web interface, and API for editing, sharing, and validating biological sample metadata.
39095357 PatCID: an open-access dataset of chemical structures in patent documents.
38991828 isolateR: an R package for generating microbial libraries from Sanger sequencing data.
39049520 Data set of fraction unbound values in the in vitro incubations for metabolic studies for better prediction of human clearance.
39084442 HSADab: A comprehensive database for human serum albumin.
39104826 Transforming environmental health datasets from the comparative toxicogenomics database into chord diagrams to visualize molecular mechanisms.
39050757 Advancing drug discovery through assay development: a survey of tool compounds within the human solute carrier superfamily.
39064021 Bioinformatics in Neonatal/Pediatric Medicine-A Literature Review.
39028894 FragHub: A Mass Spectral Library Data Integration Workflow.
39044201 The Digital Atlas of Ancient Rare Diseases (DAARD) and its relevance for current research.
39088253 Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data: Framework Development and Implementation Study.
39119155 Data Policy Finder: an easily integratable tool connecting data librarians with researchers to navigate publication requirements.
39005357 Alzheimer's Disease Knowledge Graph Enhances Knowledge Discovery and Disease Prediction.
39044130 Transcription factor binding specificities of the oomycete Phytophthora infestans reflect conserved and divergent evolutionary patterns and predict function.
39010878 MotifbreakR v2: extended capability and database integration.
bgyori changed the title from "Paper Ranking Results" to "Potentially relevant papers ranked for curation" on Aug 9, 2024

github-actions bot commented Aug 9, 2024

New entries for 2024-07-10 to 2024-08-09:

PubMed ID Title
39104285 FatPlants: a comprehensive information system for lipid-related genes and metabolic pathways in plants.
39074139 FURNA: A database for functional annotations of RNA structures.
39047988 Knowledge infrastructure for integrated data management and analysis supporting new approach methods in predictive toxicology and risk assessment.
39014503 CREdb: A comprehensive database of Cis-Regulatory Elements and their activity in human cells and tissues.
39115390 GENEVIC: GENetic data exploration and visualization via intelligent interactive console.
38991851 PEPhub: a database, web interface, and API for editing, sharing, and validating biological sample metadata.
39095357 PatCID: an open-access dataset of chemical structures in patent documents.
38991828 isolateR: an R package for generating microbial libraries from Sanger sequencing data.
39049520 Data set of fraction unbound values in the in vitro incubations for metabolic studies for better prediction of human clearance.
39084442 HSADab: A comprehensive database for human serum albumin.
39050757 Advancing drug discovery through assay development: a survey of tool compounds within the human solute carrier superfamily.
39064021 Bioinformatics in Neonatal/Pediatric Medicine-A Literature Review.
39044201 The Digital Atlas of Ancient Rare Diseases (DAARD) and its relevance for current research.
39028894 FragHub: A Mass Spectral Library Data Integration Workflow.
39088253 Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data: Framework Development and Implementation Study.
39005357 Alzheimer's Disease Knowledge Graph Enhances Knowledge Discovery and Disease Prediction.
39024225 Single-fly genome assemblies fill major phylogenomic gaps across the Drosophilidae Tree of Life.
39113691 CHHM: a Manually Curated Catalogue of Human Histone Modifications Revealing Hotspot Regions and Unique Distribution Patterns.
39044130 Transcription factor binding specificities of the oomycete Phytophthora infestans reflect conserved and divergent evolutionary patterns and predict function.
39101486 Transcriptomics and epigenetic data integration learning module on Google Cloud.

github-actions bot commented Sep 1, 2024

New entries for 2024-08-02 to 2024-09-01:

PubMed ID Title
39163546 GMMID: genetically modified mice information database.
39134728 Glycoscience data content in the NCBI Glycans and PubChem.
39145441 Clustering protein functional families at large scale with hierarchical approaches.
39212696 Toward integration of glycan chemical databases: an algorithm and software tool for extracting sugars from chemical structures.
39137905 Functional implications of glycans and their curation: insights from the workshop held at the 16th Annual International Biocuration Conference in Padua, Italy.
39104285 FatPlants: a comprehensive information system for lipid-related genes and metabolic pathways in plants.
39126204 The biomedical relationship corpus of the BioRED track at the BioCreative VIII challenge and workshop.
39174566 An ontology-based knowledge graph for representing interactions involving RNA molecules.
39095357 PatCID: an open-access dataset of chemical structures in patent documents.
39115390 GENEVIC: GENetic data exploration and visualization via intelligent interactive console.
39201310 Community Resource: Large-Scale Proteogenomics to Refine Wheat Genome Annotations.
39143381 Online Mendelian Inheritance in Animals (OMIA): a genetic resource for vertebrate animals.
39192607 Autoinhibited Protein Database: a curated database of autoinhibitory domains and their autoinhibition mechanisms.
39184336 RIPS (rapid intuitive pathogen surveillance): a tool for surveillance of genome sequence data from foodborne bacterial pathogens.
39104826 Transforming environmental health datasets from the comparative toxicogenomics database into chord diagrams to visualize molecular mechanisms.
39088253 Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data: Framework Development and Implementation Study.
39171834 Generation of a high confidence set of domain-domain interface types to guide protein complex structure predictions by AlphaFold.
39176907 Merging Biomedical Ontologies with BioSTransformers.
39101486 Transcriptomics and epigenetic data integration learning module on Google Cloud.
39213392 CBGDA: a manually curated resource for gene-disease associations based on genome-wide CRISPR.

github-actions bot commented Oct 1, 2024

New entries for 2024-09-01 to 2024-10-01:

PubMed ID Title
39229008 Creating and leveraging bespoke large-scale knowledge graphs for comparative genomics and multi-omics drug discovery with SocialGene.
39294369 Genomics 2 Proteins portal: a resource and discovery tool for linking genetic screening outputs to protein sequences and structures.
39227781 Variant graph craft (VGC): a comprehensive tool for analyzing genetic variation and identifying disease-causing variants.
39241109 Interactive tools for functional annotation of bacterial genomes.
39230707 GORetriever: reranking protein-description-based GO candidates by literature-driven deep information retrieval for protein function annotation.
39237205 CAPRI-Q: The CAPRI resource evaluating the quality of predicted structures of protein complexes.
39339901 Bioinformatics Goes Viral: I. Databases, Phylogenetics and Phylodynamics Tools for Boosting Virus Research.
39268315 OnetoMap Meta-Data: Healthcare Analytics Through Research.
39345624 Saccharomyces Genome Database: Advances in Genome Annotation, Expanded Biochemical Pathways, and Other Key Enhancements.
39212696 Toward integration of glycan chemical databases: an algorithm and software tool for extracting sugars from chemical structures.
39228707 Leveraging Generative AI to Accelerate Biocuration of Medical Actions for Rare Disease.
39235746 A Comprehensive Guide to Quality Assessment and Data Submission for Genomic Surveillance of Enteric Pathogens.
39215721 AnnoDUF: A Web-Based Tool for Annotating Functions of Proteins Having Domains of Unknown Function.
39282297 Genome-Wide Mapping of RNA-Protein Associations via Sequencing.
39341994 DRCTdb: disease-related cell type analysis to decode cell type effect and underlying regulatory mechanisms.
39201310 Community Resource: Large-Scale Proteogenomics to Refine Wheat Genome Annotations.
39341795 SCancerRNA: Expression at the Single-cell Level and Interaction Resource of Non-coding RNA Biomarkers for Cancers.
39279874 BioTextQuest v2.0: An evolved tool for biomedical literature mining and concept discovery.
39233898 Improving protein function prediction by learning and integrating representations of protein sequences and function labels.
39224843 Asteraceae genome database: a comprehensive platform for Asteraceae genomics.

cthoyt added a commit that referenced this issue Oct 19, 2024
Updated curated papers list with all papers identified from Aug 9th in
#1165.
Curated new provider for PDB Structure called 'furna'. PMID:
[39074139](https://bioregistry.io/pubmed:39074139)
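
The PMID links in these commit messages all follow the same `https://bioregistry.io/pubmed:<id>` pattern. As a minimal sketch, assuming that pattern holds for any PubMed ID, a helper like the following could generate them; the function name `bioregistry_pubmed_link` is hypothetical and not part of the Bioregistry codebase.

```python
def bioregistry_pubmed_link(pmid: str) -> str:
    """Return a Markdown link to a PubMed record via the Bioregistry resolver.

    The URL pattern is copied from the links in the commit messages;
    the helper itself is illustrative only.
    """
    pmid = str(pmid).strip()
    if not pmid.isdigit():
        raise ValueError(f"PubMed IDs are numeric, got: {pmid!r}")
    return f"[{pmid}](https://bioregistry.io/pubmed:{pmid})"


print(bioregistry_pubmed_link("39074139"))
```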

---------

Co-authored-by: Mufaddal Naguthanawala
Co-authored-by: Charles Tapley Hoyt
Co-authored-by: Benjamin M. Gyori
cthoyt added a commit that referenced this issue Oct 25, 2024
This pull request updates the `curated_papers.tsv` file with all PubMed
papers identified up to 2024-10-01 in
#1165.

Here are some statistics about the classification of each paper based on
relevancy_type so far:

**Relevant (1) classifications: 14** 
- new_prefix: 3
- new_provider: 4
- new_publication: 2
- unclear: 2**
- existing: 3

**Irrelevant (0) classifications: 40** 
- irrelevant_other: 31
- no_website: 2
- not_identifiers_resource: 7

** Notes on the two unclear papers:

1. [39104285](https://bioregistry.io/pubmed:39104285) is a provider for UniProt IDs but was not curated due to the variable nature of the `uri_format`.
2. [38991851](https://bioregistry.io/pubmed:38991851) was curated as a prefix, but there was some discussion about whether it should be a provider instead. See #1194.

Regardless, both of these were curated as relevant (1), which seems to be the more important classification.
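
The statistics above could be reproduced with a short tally. This is only a sketch: it starts from the counts reported in this message rather than reading `curated_papers.tsv`, and the `(relevant, relevancy_type)` row layout and the name `relevant` are assumptions about how the file is organized. Only `relevancy_type` is named in the message.

```python
from collections import Counter

# Counts per (relevant, relevancy_type) as reported in the message above.
# The tuple layout and the "relevant" flag name are assumptions.
REPORTED_COUNTS = {
    (1, "new_prefix"): 3,
    (1, "new_provider"): 4,
    (1, "new_publication"): 2,
    (1, "unclear"): 2,
    (1, "existing"): 3,
    (0, "irrelevant_other"): 31,
    (0, "no_website"): 2,
    (0, "not_identifiers_resource"): 7,
}


def expand_rows(counts):
    """Turn the reported counts into one (relevant, relevancy_type) row per paper."""
    return [key for key, n in counts.items() for _ in range(n)]


def summarize(rows):
    """Tally rows by relevance flag and by relevancy_type."""
    by_flag = Counter(flag for flag, _ in rows)
    by_type = Counter(kind for _, kind in rows)
    return by_flag, by_type


rows = expand_rows(REPORTED_COUNTS)
by_flag, by_type = summarize(rows)
print(f"Relevant (1) classifications: {by_flag[1]}")
print(f"Irrelevant (0) classifications: {by_flag[0]}")
```

With a real file, `rows` would instead come from reading the TSV, for example with `csv.DictReader(f, delimiter="\t")`, and the same `summarize` step would apply.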

---------

Co-authored-by: Mufaddal Naguthanawala
Co-authored-by: Charles Tapley Hoyt

github-actions bot commented Dec 1, 2024

New entries for 2024-11-01 to 2024-12-01:

PubMed ID Title
39526381 NCBI RefSeq: reference sequence standards through 25 years of curation and annotation.
39576581 DescribePROT Database of Residue-Level Protein Structure and Function Annotations.
39552041 UniProt: the Universal Protein Knowledgebase in 2025.
39530598 Saccharomyces Genome Database: Advances in Genome Annotation, Expanded Biochemical Pathways, and Other Key Enhancements.
39558178 NASA open science data repository: open science for life in space.
39607847 The text2term tool to map free-text descriptions of biomedical terms to ontologies.
39558185 Plant Metabolic Network 16: expansion of underrepresented plant groups and experimentally supported enzyme data.
39574417 BindingDB in 2024: a FAIR knowledgebase of protein-small molecule binding data.
39498494 StreptomeDB 4.0: a comprehensive database of streptomycetes natural products enriched with protein interactions and interactive spectral visualization.
39540856 MATEdb2, a Collection of High-Quality Metazoan Proteomes across the Animal Tree of Life to Speed Up Phylogenomic Studies.
39535874 SoyOD: An Integrated Soybean Multi-omics Database for Mining Genes and Biological Research.
39480818 Reference Sequence Browser: An R application with a user-friendly GUI to rapidly query sequence databases.
39546404 The Genomic SSR Millets Database (GSMDB): enhancing genetic resources for sustainable agriculture.
39526195 Taxonomy Identifiers (TaxId) for Biodiversity Genomics: a guide to getting TaxId for submission of data to public databases.
39616207 A user-friendly NoSQL framework for managing agricultural field trial data.
39565202 InterPro: the protein sequence classification resource in 2025.
39498478 Genomes OnLine Database (GOLD) v.10: new features and updates.
39493756 Idbview: a database and interactive platform for respiratory-associated disease.
39526373 Database resources of the National Center for Biotechnology Information in 2025.
39530242 MolluscDB 2.0: a comprehensive functional and evolutionary genomics database for over 1400 molluscan species.
