Fine-Med-Mental-T&P: a dual-track approach for high-quality instructional datasets of mental disorders in traditional Chinese medicine

魏彦柏; 荆晓朔; 晏峻峰

doi:10.1016/j.dcmed.2026.02.004

Abstract:
Objective To investigate methods for constructing a high-quality instruction dataset for mental disorders in traditional Chinese medicine (TCM), and to validate the dataset’s effectiveness.
Methods We proposed the Fine-Med-Mental-T&P methodology for constructing high-quality instruction datasets for TCM mental disorders. The approach integrates theoretical knowledge and practical case studies through a dual-track strategy. (i) Theoretical track: textbooks and guidelines on TCM mental disorders were manually segmented. Initial responses were generated using DeepSeek-V3 and then refined by the Qwen3-32B model to align their expression with human preferences. A screening algorithm was then applied to select 16 000 high-quality instruction pairs. (ii) Practical track: starting from over 600 real clinical case seeds, diagnostic and therapeutic instruction pairs were generated using DeepSeek-V3 and then screened through manual evaluation, resulting in 4 000 high-quality practice-oriented instruction pairs. Integrating both tracks yielded the Med-Mental-Instruct-T&P dataset, comprising a total of 20 000 instruction pairs. To validate the dataset’s effectiveness, three experimental evaluations (both manual and automated) were conducted: (i) a comparative study of models fine-tuned on different datasets; (ii) a benchmark comparison against mainstream TCM-specific large language models (LLMs); and (iii) a data ablation study of the relationship between data volume and model performance.
Results The T&P-model, fine-tuned on the Med-Mental-Instruct-T&P dataset, performed best across the evaluations. In the comparative study, it significantly outperformed baseline models trained solely on automatically generated or purely human-curated data, in both automated metrics (ROUGE-L > 0.55) and expert manual evaluation (accuracy scores above 7/10). In the benchmark comparison, the T&P-model also outperformed existing mainstream TCM LLMs (e.g., HuatuoGPT and ZuoyiGPT). It showed particularly strong capabilities in handling diverse clinical presentations, including challenging disorders such as insomnia and coma, showcasing its robustness and versatility. In the data ablation study, T&P-model performance showed an overall upward trend with minor fluctuations as the training data increased from 10% to 50%; beyond 50%, improvement slowed markedly, with metrics plateauing and approaching a saturation point.
Conclusion This study constructed the specialized Med-Mental-Instruct-T&P instruction dataset for TCM mental disorders and proposed the systematic Fine-Med-Mental-T&P methodology for its development. Together, these address the critical scarcity of high-quality, domain-specific data in TCM and provide essential data support for developing intelligent TCM diagnostic and therapeutic systems.
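
For readers unfamiliar with the automated metric cited in the Results, the following is a minimal sketch of how a ROUGE-L F-score can be computed from the longest common subsequence (LCS) of a model answer and a reference answer. The abstract does not state the tokenization or the precision/recall weighting the authors used, so the function names (`lcs_length`, `rouge_l`), the pre-tokenized list inputs, and the default `beta = 1.0` are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: Sequence[str], reference: Sequence[str],
            beta: float = 1.0) -> float:
    """ROUGE-L F-score; beta = 1.0 (equal weighting) is an assumption."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    # Hypothetical tokens; for Chinese text, character-level tokens are one option.
    model_answer = ["a", "b", "c", "d"]
    reference_answer = ["a", "c", "d", "e"]
    print(rouge_l(model_answer, reference_answer))
```

Under this definition, a threshold such as ROUGE-L > 0.55 means that the longest in-order overlap between model output and reference covers a little over half of the text on balance between precision and recall.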