PubLabeler

Introduction

PubLabeler is a tool for automatically labeling scientific papers based on their titles and abstracts. It is designed to help researchers quickly and accurately classify papers into categories according to their protein-related research topics. The tool trains a machine learning model on a large dataset of labeled papers, and then predicts the categories of new papers from their titles and abstracts.

Usage

To use PubLabeler, follow these steps:

Install the required packages:

pip install pandas
pip install numpy
pip install torch==1.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install transformers
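
After installation, it can be useful to confirm that the four packages are visible to the Python interpreter that will run main.py. The following is a minimal sketch, not part of the repository; the helper name missing_packages is hypothetical, and it only checks that each package can be located, not which version is installed.

```python
import importlib.util

# Packages named in the install commands above.
REQUIRED = ["pandas", "numpy", "torch", "transformers"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

An empty list means every package was found; any name that is returned still needs to be installed.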

Prepare the dataset:

The data directory must contain the test.csv file, which should have the following columns:

pid: the UniProtKB ID of the protein
pmid: the PubMed ID of the paper
labels: optional, the manually labeled categories of the paper
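
The column requirements above can be checked before running the model. The sketch below uses only the standard library and is an illustration rather than code from the repository: the function name check_test_csv is hypothetical, the sample row uses placeholder values, and the way label values are encoded inside the labels column is not specified here, so the example treats it as an opaque string.

```python
import csv
import io

# "pid" and "pmid" are required; "labels" is optional per the description above.
REQUIRED_COLUMNS = {"pid", "pmid"}


def check_test_csv(csv_file):
    """Validate the header of an open test.csv.

    Raises ValueError if a required column is missing.
    Returns True if the optional labels column is present, else False.
    """
    reader = csv.DictReader(csv_file)
    header = set(reader.fieldnames or [])
    missing = REQUIRED_COLUMNS - header
    if missing:
        raise ValueError(f"test.csv is missing required columns: {sorted(missing)}")
    return "labels" in header


# Placeholder content standing in for a real test.csv (values are illustrative).
sample = io.StringIO("pid,pmid,labels\nP04637,12345678,example_label\n")
has_labels = check_test_csv(sample)
```

For a file on disk, the same function can be called as check_test_csv(open("data/test.csv", newline="")), assuming the data directory is named data as in the repository listing.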

The data directory should also contain two pickle files holding the paper texts and the protein descriptions:

paper_dict.pkl: a dictionary containing the abstracts and titles of the papers, i.e., {pmid: {'title': title, 'abstract': abstract}}
pro_name.pkl: a dictionary containing the textual descriptions of the proteins extracted from the UniProtKB database, i.e., {pid: name}
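
The two dictionaries described above can be written with the standard pickle module. The sketch below is one way to do it, under stated assumptions: the helper names write_pickles and find_uncovered are hypothetical, the entries are placeholders, and the keys are stored as strings because the source does not say whether pmid keys are strings or integers. Use whichever type matches the values in test.csv.

```python
import pickle
from pathlib import Path


def write_pickles(data_dir, paper_dict, pro_name):
    """Save both dictionaries under data_dir using the file names from the README."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with open(data_dir / "paper_dict.pkl", "wb") as f:
        pickle.dump(paper_dict, f)
    with open(data_dir / "pro_name.pkl", "wb") as f:
        pickle.dump(pro_name, f)


def find_uncovered(pairs, paper_dict, pro_name):
    """Return (pids, pmids) that occur in the (pid, pmid) pairs but have no dictionary entry."""
    missing_pids = sorted({pid for pid, _ in pairs if pid not in pro_name})
    missing_pmids = sorted({pmid for _, pmid in pairs if pmid not in paper_dict})
    return missing_pids, missing_pmids


# Placeholder entries in the documented shapes:
# {pmid: {'title': title, 'abstract': abstract}} and {pid: name}.
paper_dict = {"12345678": {"title": "Example title", "abstract": "Example abstract."}}
pro_name = {"P04637": "Cellular tumor antigen p53"}
```

Calling find_uncovered with the (pid, pmid) pairs read from test.csv shows which rows would lack a paper text or a protein description, which is worth checking before a long run.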

Run the code:

To train the model:

python main.py

To predict the categories of new protein-paper pairs (the trained model is available at https://drive.google.com/drive/folders/1quUJeCX1XMt_U-viCG4H6KDuW2tt4O9s?usp=sharing):

python main.py --model_type test --best_model_idx 3 --init_weight 0

ZhuLab-Fudan/PubLabeler
