Data and code for the paper "Fine-grained Category Discovery under Coarse-grained supervision with Hierarchical Weighted Self-contrastive Learning" (EMNLP 2022 long paper).
Fine-grained Category Discovery under Coarse-grained supervision (FCDC) aims to automatically discover novel fine-grained categories from coarse-grained labeled data, which is easier and cheaper to obtain.
We ran experiments on three public datasets: clinc, wos and hwu64. All three are included in this repository under './data'.
Our model mainly contains three components: BERT, Dynamic Queue and Momentum BERT.
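To make the roles of the Momentum BERT and the Dynamic Queue more concrete, here is a minimal, framework-free sketch of the two mechanisms as they are commonly used in momentum-contrast setups: an exponential moving average that lets the momentum encoder trail the main encoder, and a fixed-size first-in-first-out queue of past embeddings. This is an illustration under assumptions, not the code in this repository: the names `momentum_update` and `DynamicQueue`, the coefficient `m`, and the use of plain Python lists in place of tensors are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


def momentum_update(query_params: Sequence[float],
                    key_params: Sequence[float],
                    m: float = 0.999) -> List[float]:
    """Move each momentum-encoder weight a small step toward the main encoder.

    Implements key <- m * key + (1 - m) * query, element by element.
    The value of m is an assumption chosen for illustration.
    """
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


class DynamicQueue:
    """Fixed-size FIFO store of embeddings (hypothetical helper class)."""

    def __init__(self, max_size: int) -> None:
        # deque with maxlen silently drops the oldest items when full.
        self._items: Deque[List[float]] = deque(maxlen=max_size)

    def enqueue(self, embeddings: Sequence[List[float]]) -> None:
        """Append a batch of embeddings, evicting the oldest if needed."""
        self._items.extend(embeddings)

    def items(self) -> List[List[float]]:
        """Return the stored embeddings, oldest first."""
        return list(self._items)

    def __len__(self) -> int:
        return len(self._items)


# Toy usage: two "weights" and a queue that holds three embeddings.
main_weights = [1.0, 2.0]
momentum_weights = momentum_update(main_weights, [0.0, 0.0], m=0.9)

queue = DynamicQueue(max_size=3)
queue.enqueue([[1.0], [2.0]])
queue.enqueue([[3.0], [4.0]])  # the oldest embedding, [1.0], is evicted
```

In a real training loop the weights would be model parameters and the queue entries would be sentence embeddings produced by the momentum encoder; how this repository wires them together is defined by its own source files.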
- python==3.8
- pytorch==1.10.0
- transformers==4.19.2
- scipy==1.8.0
- numpy==1.21.6
- scikit-learn==1.1.1
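The pinned versions above can be checked against the current environment before running anything. The following is a small sketch using only the standard library; the helper name `find_mismatches` is hypothetical, and the mapping assumes that the "pytorch" entry corresponds to the `torch` distribution on PyPI. The Python interpreter version itself is not checked here.

```python
from importlib import metadata

# Pins copied from the dependency list above. "torch" is assumed to be
# the installed distribution name for the "pytorch" entry.
PINNED = {
    "torch": "1.10.0",
    "transformers": "4.19.2",
    "scipy": "1.8.0",
    "numpy": "1.21.6",
    "scikit-learn": "1.1.1",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {name: (wanted, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu113" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(PINNED).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

An empty result means every listed package matches its pin; other versions may still work, but the results were reported with the versions listed.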
Train and test the model with the bash script:
sh scripts/run.sh
You can also add or change parameters in run.sh; more parameters are listed in init_parameter.py.
Note that experimental results may differ slightly between runs because of the randomness of clustering at test time.

Some code references the following repositories:
If our paper or code is helpful to you, please consider citing our paper:
@inproceedings{an-etal-2022-fine,
title = "Fine-grained Category Discovery under Coarse-grained supervision with Hierarchical Weighted Self-contrastive Learning",
author = "An, Wenbin and
Tian, Feng and
Chen, Ping and
Tang, Siliang and
Zheng, Qinghua and
Wang, QianYing",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
pages = "1314--1323",
}