Objective To develop and evaluate TCMLLM-PR, a fine-tuned large language model (LLM) for traditional Chinese medicine (TCM) prescription recommendation.
Methods First, we constructed an instruction-tuning dataset containing 68,654 samples (approximately 10 million tokens) by integrating data from eight sources, including four TCM textbooks, the Pharmacopoeia of the People's Republic of China 2020 (CHP), Chinese Medicine Clinical Cases (CMCC), and hospital clinical records covering lung disease, liver disease, stroke, diabetes, and spleen-stomach disease. Then, we trained TCMLLM-PR by fine-tuning ChatGLM-6B with P-Tuning v2. The evaluation covered three aspects: (i) comparison with traditional prescription recommendation models (PTM, TCMPR, and PresRecST); (ii) comparison with TCM-domain LLMs (ShenNong, Huatuo, and HuatuoGPT) and the general-domain ChatGPT; and (iii) assessment of the model's migration capability across disease datasets. We employed precision, recall, and F1 score as evaluation metrics.
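To make the evaluation protocol concrete, the sketch below shows one common way to compute precision, recall, and F1 at a cutoff k (e.g., F1@10) for prescription recommendation, by comparing the model's top-k recommended herbs against the herbs in the physician's ground-truth prescription. This is a minimal illustration under our own assumptions: the function name, variable names, and the exact per-prescription convention are ours, not taken from the paper.

```python
def prf_at_k(recommended, ground_truth, k=10):
    """Precision/recall/F1 at k for one prescription (illustrative convention).

    recommended  -- model's ranked list of herb names, best first
    ground_truth -- collection of herb names in the actual prescription
    """
    top_k = recommended[:k]
    truth = set(ground_truth)
    hits = len(set(top_k) & truth)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Toy example (hypothetical herbs): 3 of the top-10 recommendations
# appear in a 6-herb ground-truth prescription.
recs = ["gan cao", "huang qi", "bai zhu", "fu ling", "dang gui",
        "chen pi", "ban xia", "sheng jiang", "da zao", "ren shen"]
truth = {"gan cao", "huang qi", "fu ling", "dan shen", "chai hu", "bai shao"}
print(prf_at_k(recs, truth, k=10))  # -> (0.3, 0.5, 0.375)
```

Corpus-level scores would then be averaged over all test prescriptions; whether averaging is per-case or micro-averaged is a design choice we do not assume here.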
Results The experiments showed that TCMLLM-PR significantly outperformed the baseline models on the TCM textbook and CHP datasets, with F1@10 improvements of 31.80% and 59.48%, respectively. In cross-dataset validation, the model migrated best from the TCM textbook dataset to the liver disease dataset, achieving an F1@10 of 0.1551. A case study on real-world records showed that TCMLLM-PR's recommended prescriptions most closely matched the doctors' actual prescriptions.
Conclusion This study integrated LLMs into the TCM prescription recommendation task by constructing a tailored instruction-tuning dataset and developing TCMLLM-PR. The best-performing model parameters of TCMLLM-PR will be publicly released (https://github.com/2020MEAI/TCMLLM) to support clinical decision-making in TCM practice.