BioInstruct

🔬 Exciting breakthrough in BioNLP! 🧬

We're thrilled to introduce BioInstruct, a dataset of 25,000+ tailored instructions for fine-tuning LLMs such as Llama on biomedical tasks. Our research shows remarkable gains in question answering (QA), information extraction (IE), and text generation.

🌟 Highlights:

17.3% boost in QA accuracy
5.7% increase in IE F1 score
96% improvement in text generation tasks

By combining instruction tuning with multi-task learning, we also find that the performance gain is significantly higher when the LLM is instruction fine-tuned on closely related tasks.

For more details, please check out our paper.

Dataset

The BioInstruct dataset is available on Hugging Face Datasets.
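
As a rough illustration of how an instruction record might be turned into a fine-tuning prompt, here is a minimal sketch. It assumes each record has Alpaca-style `instruction`, `input`, and `output` fields, which this README does not confirm. The `build_prompt` helper, the prompt template, and the example record are all hypothetical and written only for illustration; check the dataset card for the actual schema.

```python
from typing import Dict


def build_prompt(record: Dict[str, str]) -> str:
    """Render one instruction record as a prompt that ends where the answer starts.

    Assumes Alpaca-style keys ("instruction", optional "input"); this layout
    is an assumption, not something the README specifies.
    """
    instruction = record["instruction"].strip()
    context = record.get("input", "").strip()
    if context:
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            f"### Response:\n"
        )
    # Records without extra context skip the "Input" section entirely.
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


# Hypothetical record, written for illustration; not taken from the dataset.
example = {
    "instruction": "Explain the abbreviation in plain language.",
    "input": "BP",
    "output": "BP stands for blood pressure.",
}

prompt = build_prompt(example)
target = example["output"]  # the text the model would be trained to produce
print(prompt + target)
```

During instruction tuning, the prompt and target are typically concatenated as above, with the loss often computed only on the target tokens; that choice depends on the training setup and is not described here.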

Citation Information

@article{Tran2024Bioinstruct,
    author = {Tran, Hieu and Yang, Zhichao and Yao, Zonghai and Yu, Hong},
    title = "{BioInstruct: instruction tuning of large language models for biomedical natural language processing}",
    journal = {Journal of the American Medical Informatics Association},
    pages = {ocae122},
    year = {2024},
    month = {06},
    issn = {1527-974X},
    doi = {10.1093/jamia/ocae122},
    url = {https://doi.org/10.1093/jamia/ocae122},
    eprint = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocae122/58084577/ocae122.pdf},
}

Contribute

Have a specific task and instruction you'd like an LLM to perform in a clinical setting? Raise a new issue here! Your contributions will aid in refining LLMs to be more effective and relevant in healthcare environments.
