TCMLLM-PR: evaluation of large language models for prescription recommendation in traditional Chinese medicine

    Abstract:
    Objective To develop and evaluate TCMLLM-PR, a fine-tuned large language model (LLM) for traditional Chinese medicine (TCM) prescription recommendation.
    Methods First, we constructed an instruction-tuning dataset of 68,654 samples (approximately 10 million tokens) by integrating data from eight sources, including four TCM textbooks, the Pharmacopoeia of the People’s Republic of China (2020 edition) (CHP), Chinese Medicine Clinical Cases (CMCC), and hospital clinical records covering lung disease, liver disease, stroke, diabetes, and spleen-stomach disease. We then fine-tuned ChatGLM-6B with P-Tuning v2 to obtain TCMLLM-PR. The evaluation covered three aspects: (i) comparison with traditional prescription recommendation models (PTM, TCMPR, and PresRecST); (ii) comparison with TCM-specific LLMs (ShenNong, Huatuo, and HuatuoGPT) and the general-domain ChatGPT; and (iii) assessment of the model's transferability across different disease datasets. Precision, recall, and F1 score, which are commonly used in prescription recommendation tasks, served as the evaluation metrics.
    Results The experiments showed that TCMLLM-PR significantly outperformed the baseline models on the TCM textbook and CHP datasets, with F1@10 improvements of 31.80% and 59.48%, respectively. In cross-dataset validation, the model performed best when transferred from the TCM textbook dataset to the liver disease dataset, achieving an F1@10 of 0.1551. Analysis of real-world cases showed that TCMLLM-PR's recommendations matched the physicians' actual prescriptions most closely.
    Conclusion This study integrated LLMs into TCM prescription recommendation by building a tailored instruction-tuning dataset and developing TCMLLM-PR. The best model parameters of TCMLLM-PR will be publicly released to advance clinical decision support in TCM practice (https://github.com/2020MEAI/TCMLLM).
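The evaluation reports precision, recall, and F1 at a cutoff K (for example F1@10). For one case, let R_K be the first K recommended herbs and T the reference prescription. Under that setup, P@K = |R_K ∩ T| / K, R@K = |R_K ∩ T| / |T|, and F1@K = 2 · P@K · R@K / (P@K + R@K). The following is a minimal Python sketch of these top-K metrics, not TCMLLM-PR's released evaluation code. It assumes each model output has already been parsed into a ranked list of herb names, that repeated herbs are counted once, and that scores are macro-averaged over test cases. The function names `prf_at_k` and `mean_prf_at_k` and the example herbs are hypothetical illustrations.

```python
from typing import List, Sequence, Tuple

Scores = Tuple[float, float, float]  # (precision@K, recall@K, F1@K)


def prf_at_k(recommended: Sequence[str], truth: Sequence[str], k: int) -> Scores:
    """Score one ranked herb list against one reference prescription.

    Assumptions (hypothetical, not confirmed by the paper): only the first k
    distinct herbs count, and precision is divided by k even when the model
    produced fewer than k herbs.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k: List[str] = []
    for herb in recommended:
        if herb not in top_k:  # an LLM may repeat a herb; count it once
            top_k.append(herb)
        if len(top_k) == k:
            break
    gold = set(truth)
    hits = len(set(top_k) & gold)  # |R_K ∩ T|
    precision = hits / k
    recall = hits / len(gold) if gold else 0.0
    # F1 is the harmonic mean of precision and recall; it is 0 with no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def mean_prf_at_k(
    pairs: Sequence[Tuple[Sequence[str], Sequence[str]]], k: int
) -> Scores:
    """Macro-average per-case scores over (recommended, truth) pairs."""
    if not pairs:
        raise ValueError("need at least one test case")
    totals = [0.0, 0.0, 0.0]
    for recommended, truth in pairs:
        for i, value in enumerate(prf_at_k(recommended, truth, k)):
            totals[i] += value
    n = len(pairs)
    return totals[0] / n, totals[1] / n, totals[2] / n


if __name__ == "__main__":
    # Hypothetical example: the four herbs of Mahuang Tang as the reference,
    # and a model output that adds two extra herbs.
    reference = ["mahuang", "guizhi", "xingren", "gancao"]
    output = ["mahuang", "guizhi", "xingren", "gancao", "shengjiang", "dazao"]
    p, r, f1 = prf_at_k(output, reference, k=10)
    print(f"P@10={p:.4f} R@10={r:.4f} F1@10={f1:.4f}")
    # prints P@10=0.4000 R@10=1.0000 F1@10=0.5714
```

Dividing precision by K rather than by the number of herbs actually generated means a six-herb output can score at most 0.6 on P@10. This is one reason reported numbers can shift depending on how LLM outputs are normalized before scoring.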

     
