Objective To investigate methods for constructing a high-quality instruction dataset for mental disorders in traditional Chinese medicine (TCM) and to validate its efficacy.
Methods We proposed the Fine-Med-Mental-T&P methodology for constructing high-quality instruction datasets for TCM mental disorders. This approach integrates theoretical knowledge and practical case studies through a dual-track strategy. (i) Theoretical track: textbooks and guidelines on TCM mental disorders were manually segmented. Initial responses were generated using DeepSeek-V3 and then refined by the Qwen3-32B model to align their expression with human preferences. A screening algorithm was then applied to select 16 000 high-quality instruction pairs. (ii) Practical track: starting from over 600 real clinical case seeds, diagnostic and therapeutic instruction pairs were generated using DeepSeek-V3 and subsequently screened through manual evaluation, resulting in 4 000 high-quality practice-oriented instruction pairs. Integrating both tracks yielded the Med-Mental-Instruct-T&P dataset, comprising a total of 20 000 instruction pairs. To validate the dataset’s effectiveness, three experimental evaluations (both manual and automated) were conducted: (i) a comparative study of models fine-tuned on different datasets; (ii) benchmarking against mainstream TCM-specific large language models (LLMs); (iii) a data ablation study to investigate the relationship between data volume and model performance.
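The data ablation setup described above can be illustrated with a minimal sketch. It assumes nested random subsets taken at 10% steps with a fixed seed; the step size, the seed, and the function name `ablation_subsets` are hypothetical and are not specified in the study.

```python
import random


def ablation_subsets(pairs, percents=(10, 20, 30, 40, 50, 60, 70, 80, 90, 100), seed=0):
    """Return nested training subsets keyed by percentage of the full dataset.

    Assumption: subsets are prefixes of one fixed shuffle, so each larger
    subset contains every smaller one. The study may have sampled differently.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the subsets reproducible
    total = len(shuffled)
    return {p: shuffled[: total * p // 100] for p in percents}


if __name__ == "__main__":
    # Placeholder integers stand in for the 20 000 instruction pairs.
    subsets = ablation_subsets(range(20000))
    for percent, subset in subsets.items():
        print(percent, len(subset))
```

Using prefixes of a single shuffle means any change in performance between two fractions comes from added data rather than from a different random draw.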
Results The T&P-model, fine-tuned on the Med-Mental-Instruct-T&P dataset, showed superior performance across the experiments. In the comparative study, the T&P-model significantly outperformed baseline models trained solely on self-generated data or on purely human-curated data. This superiority was evident in both automated metrics (ROUGE-L > 0.55) and expert manual evaluations (accuracy scores above 7/10). In benchmark comparisons, the T&P-model also outperformed existing mainstream TCM LLMs (e.g., HuatuoGPT and ZuoyiGPT). It showed particularly strong capabilities in handling diverse clinical presentations, including challenging disorders such as insomnia and coma, indicating robustness and versatility. The data ablation study showed that T&P-model performance followed an overall upward trend, with minor fluctuations, as the training data increased from 10% to 50%; beyond 50%, improvement slowed markedly and the metrics plateaued, approaching saturation.
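For reference, the ROUGE-L metric reported above is an F-measure based on the longest common subsequence (LCS) between a generated answer and a reference answer. The sketch below is a minimal, generic implementation; the tokenization (plain token lists), the weighting `beta = 1.0`, and the function names are assumptions, since the abstract does not state which ROUGE-L variant or toolkit was used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-score; beta = 1.0 (an assumption) weights precision and recall equally."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    # Toy tokens only; for Chinese text, character-level tokens are one common choice.
    print(rouge_l(["a", "b", "c", "d"], ["a", "c", "d", "e"]))  # LCS is a, c, d -> 0.75
```

A score above 0.55 under this definition means that, on average, more than half of each answer overlaps in order with the reference once precision and recall are balanced.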
Conclusion This study constructed the specialized Med-Mental-Instruct-T&P instruction dataset for TCM mental disorders and proposed the systematic Fine-Med-Mental-T&P methodology for its development. Together, these address the critical scarcity of high-quality, domain-specific data in TCM and provide essential data support for developing intelligent TCM diagnostic and therapeutic systems.