OAG scholar profiling

Prerequisites

Linux
Python 3.7
PyTorch 1.10.0+cu111
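The versions above are the ones the authors list. A small check like the one below can flag a mismatched environment before anything is downloaded. `check_env` and `parse_version` are hypothetical helpers written for this note, not part of the repository, and treating Python 3.7 and PyTorch 1.10.0 as exact requirements is an assumption (nearby versions may also work).

```python
import re
import sys

# Hypothetical helpers, not part of scholar-profiling.


def parse_version(version):
    """Return (major, minor, patch) from strings like '1.10.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def check_env(expected_python=(3, 7), expected_torch="1.10.0"):
    """List differences between this interpreter and the README prerequisites."""
    problems = []
    if not sys.platform.startswith("linux"):
        problems.append(f"platform is {sys.platform}, README lists Linux")
    if sys.version_info[:2] != tuple(expected_python):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        wanted = f"{expected_python[0]}.{expected_python[1]}"
        problems.append(f"Python {found} found, README lists {wanted}")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if parse_version(torch.__version__) != parse_version(expected_torch):
        problems.append(f"PyTorch {torch.__version__} found, README lists {expected_torch}")
    if not torch.cuda.is_available():
        problems.append("CUDA is not available to PyTorch")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print("warning:", problem)
```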

Getting Started

Installation

Clone this repo.

git clone https://github.com/THUDM/scholar-profiling.git
cd scholar-profiling

Install the dependencies:

pip install -r requirements.txt

Dataset

The dataset can be downloaded from BaiduPan (password: 7lro) or Aliyun. It consists of three parts:

data_ex.zip [Aliyun Link]: unzip the file and put the data directory into the project directory.
pretrain_models.zip [Aliyun Link]: unzip the file and put the pretrain_models directory into the project directory.
googleSearch: use 7z to extract data.zip in this folder, then put the googleSearch directory into the data directory. [Aliyun Link1], [Aliyun Link2], [Aliyun Link3], [Aliyun Link4], [Aliyun Link5], [Aliyun Link6], [Aliyun Link7], [Aliyun Link8], [Aliyun Link9]
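After unpacking, the project should contain the directories named above. The sketch below checks for them. Besides the three locations from the list, it also expects data/raw/ground_truth.json, because the evaluation commands under "How to run" read that file. The path list is inferred from this README and is an assumption (the archives may contain more), and `missing_paths` is a hypothetical helper.

```python
from pathlib import Path

# Assumed layout, inferred from this README's dataset list and run commands.
EXPECTED_DIRS = ["data", "data/raw", "data/googleSearch", "pretrain_models"]
EXPECTED_FILES = ["data/raw/ground_truth.json"]


def missing_paths(project_dir="."):
    """Return expected dataset paths that do not exist under project_dir."""
    root = Path(project_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    gaps = missing_paths()
    if gaps:
        print("missing:", ", ".join(gaps))
    else:
        print("dataset layout looks complete")
```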

How to run

cd $project_path
export CUDA_VISIBLE_DEVICES='?'  # specify which GPU(s) to use
export PYTHONPATH="`pwd`:$PYTHONPATH"

# Statistical machine learning (SML) methods:
# gender
python sml_baseline/GenderPredict/main.py
# homepage
python sml_baseline/HomepagePrediction/homepage_train.py
# position
python sml_baseline/TitlePrediction/title_main.py
# evaluation
python sml_baseline/merge_results.py
python evaluate.py --hp output/sml/sml_predict_xgboost.json --rf data/raw/ground_truth.json
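The SML steps above run in a fixed order and rely on the two `export` lines. As a minimal sketch, assuming each script needs no arguments beyond those shown, the same sequence can be driven from Python. `SML_STEPS`, `build_env` and `run_steps` are hypothetical names, and the default GPU `"0"` is only a placeholder for the `?` in the export line.

```python
import os
import subprocess
import sys

# Commands copied from the SML section of this README, in order.
SML_STEPS = [
    ["sml_baseline/GenderPredict/main.py"],
    ["sml_baseline/HomepagePrediction/homepage_train.py"],
    ["sml_baseline/TitlePrediction/title_main.py"],
    ["sml_baseline/merge_results.py"],
    ["evaluate.py", "--hp", "output/sml/sml_predict_xgboost.json",
     "--rf", "data/raw/ground_truth.json"],
]


def build_env(project_dir, gpus):
    """Mirror the two export lines: choose GPUs, prepend the project to PYTHONPATH."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpus
    old = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = os.pathsep.join(
        p for p in (os.path.abspath(project_dir), old) if p
    )
    return env


def run_steps(steps, project_dir=".", gpus="0", dry_run=False):
    """Run each script with the current interpreter; stop at the first failure."""
    env = build_env(project_dir, gpus)
    commands = [[sys.executable, *step] for step in steps]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        subprocess.run(cmd, cwd=project_dir, env=env, check=True)
    return commands
```

Calling `run_steps(SML_STEPS, dry_run=True)` only prints the commands; without `dry_run` they execute from the project root, and `check=True` raises `subprocess.CalledProcessError` at the first non-zero exit status.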

# BERT
# First, uncomment the three functions create_gender_classification_data(), create_homepage_classification_data() and create_title_classification_data() to generate the training data
python bert_baseline/tools.py 
# gender
python bert_baseline/gender_classification_bert.py
# homepage
python bert_baseline/homepage_classification_bert.py
# position
python bert_baseline/title_classification_bert.py
# For evaluation, uncomment the merge_result() function in bert_baseline/tools.py
python bert_baseline/tools.py 
python evaluate.py --hp data/luoyang-result_new.json --rf data/raw/ground_truth.json
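Uncommenting functions in bert_baseline/tools.py works, but the same calls can also be made from a separate script. The sketch below assumes the calls in tools.py sit under an `if __name__ == "__main__":` guard or are commented out (so loading the file does not trigger them) and that the functions take the arguments shown in this README. `load_module`, `call_functions` and `BERT_DATA_CALLS` are hypothetical; the module's own top-level imports still run, so set `PYTHONPATH` as in the export lines first.

```python
import importlib.util


def load_module(path, name="scholar_profiling_tools"):
    """Import a script by file path; its __main__ block does not run."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def call_functions(path, calls):
    """Call (function_name, args) pairs from the module at `path`, in order."""
    module = load_module(path)
    return [getattr(module, name)(*args) for name, args in calls]


# Function names as listed in this README; all are called with no arguments.
BERT_DATA_CALLS = [
    ("create_gender_classification_data", ()),
    ("create_homepage_classification_data", ()),
    ("create_title_classification_data", ()),
]
```

From the project root, `call_functions("bert_baseline/tools.py", BERT_DATA_CALLS)` generates the training data and `call_functions("bert_baseline/tools.py", [("merge_result", ())])` merges the results. Under the same assumptions, the helper could also target data_process.py for the prompt-tuning data, e.g. with `("get_gender_data", (r"data/raw/new_dev.xlsx",))`.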

# Bi-LSTM-CRF for position tagging
python data_process.py
python bert_bilstm_crf/run.py
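Bi-LSTM-CRF treats position extraction as sequence labeling: every token receives a tag, and runs of tags are read off as spans. The sketch below shows only that final decoding step, assuming a BIO scheme with a label such as `POS`; the tag set used by bert_bilstm_crf/run.py may differ, and `bio_to_spans` is a hypothetical helper, not repository code.

```python
def bio_to_spans(tokens, tags, sep=" "):
    """Collect (label, text) spans from parallel token and BIO tag lists.

    A stray I- tag whose label differs from the open span starts a new span,
    a lenient choice; stricter decoders drop such tags instead.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    label, current = None, []

    def close():
        # Emit the open span, if any, joined with `sep`.
        if current:
            spans.append((label, sep.join(current)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            close()
            label, current = tag[2:], [token]
        elif tag.startswith("I-"):
            current.append(token)
        else:  # "O" ends any open span
            close()
            label, current = None, []
    close()
    return spans
```

For character-level Chinese input, pass `sep=""` so that characters tagged as one position are joined without spaces.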

# BERT with prompt tuning
# First, uncomment the four functions get_gender_data(r'data/raw/new_dev.xlsx'), get_title_data(r'data/raw/new_dev.xlsx'), get_gender_test() and get_train_data() to generate the training data
python data_process.py
# gender
python prompt/gender_prompt.py
# position
python prompt/title_prompt.py
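One common form of prompt-based classification is cloze-style: the profile is wrapped in a template containing BERT's `[MASK]` token, and a verbalizer maps words predicted at that slot back to class labels. The sketch below illustrates the idea with a made-up English template and verbalizer and a pluggable scoring function in place of a real model; it is not the template or verbalizer used in prompt/gender_prompt.py. In practice `mask_scores` would wrap a masked language model's logits at the `[MASK]` position.

```python
from typing import Callable, Dict, List

# Hypothetical template and verbalizer, for illustration only.
TEMPLATE = "{profile} The gender of this scholar is [MASK]."
VERBALIZER: Dict[str, List[str]] = {
    "male": ["male", "man"],
    "female": ["female", "woman"],
}


def build_prompt(profile: str) -> str:
    """Wrap a scholar's profile text in the cloze template."""
    return TEMPLATE.format(profile=profile.strip())


def classify(profile: str, mask_scores: Callable[[str], Dict[str, float]]) -> str:
    """Return the label whose best label word scores highest at [MASK].

    `mask_scores` stands in for a masked language model: given a prompt, it
    returns a score (for example a logit) per candidate word at the mask.
    """
    scores = mask_scores(build_prompt(profile))

    def label_score(label: str) -> float:
        # Words the scorer did not return count as -inf so they never win.
        return max(scores.get(word, float("-inf")) for word in VERBALIZER[label])

    return max(VERBALIZER, key=label_score)
```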

To extract additional attributes from the long texts of scholars' profiles, see the README.md in the bio_models directory.

References

🌟 If you find our work helpful, please leave us a star and cite our paper.

@inproceedings{zhang2024oag,
  title={OAG-bench: a human-curated benchmark for academic graph mining},
  author={Fanjin Zhang and Shijie Shi and Yifan Zhu and Bo Chen and Yukuo Cen and Jifan Yu and Yelin Chen and Lulu Wang and Qingfei Zhao and Yuqing Cheng and Tianyi Han and Yuwei An and Dan Zhang and Weng Lam Tam and Kun Cao and Yunhe Pang and Xinyu Guan and Huihui Yuan and Jian Song and Xiaoyan Li and Yuxiao Dong and Jie Tang},
  booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages={6214--6225},
  year={2024}
}
