Objective To address the dual challenges of long-tail distribution and feature sparsity in traditional Chinese medicine (TCM) syndrome differentiation in real-world clinical settings, we propose a data-efficient learning framework enhanced by knowledge graphs.
Methods We developed Agent-GNN, a three-stage decoupled learning framework, and validated it on the Traditional Chinese Medicine Syndrome Diagnosis (TCM-SD) dataset, which contains 54 152 clinical records across 148 syndrome categories. First, we constructed a comprehensive medical knowledge graph encoding the complete TCM reasoning system. Second, we proposed a Functional Patient Profiling (FPP) method that combines large language models (LLMs) with graph retrieval-augmented generation (Graph RAG) to extract structured symptom-etiology-pathogenesis subgraphs from medical records. Third, we employed heterogeneous graph neural networks to explicitly learn structured combination patterns from these subgraphs. We compared Agent-GNN with baselines including BERT, ZY-BERT, ZY-BERT + Know, GAT, and GPT-4 Few-shot, using the macro-F1 score as the primary evaluation metric. Ablation experiments were conducted to quantify the contribution of each key component to model performance.
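Macro-F1 averages the per-class F1 scores without weighting by class frequency, so a syndrome with only a handful of records counts as much as a common one. The following is a minimal pure-Python sketch of that computation, with an optional class subset for scoring only long-tail syndromes. The function names, and the reading of "fewer than 10 samples" as fewer than 10 training records, are assumptions for illustration rather than the authors' evaluation code; scikit-learn's `f1_score(y_true, y_pred, average="macro", labels=...)` should give matching values.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], c: Hashable) -> float:
    """One-vs-rest F1 for class c; 0.0 if c appears in neither labels nor predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == c and p == c)
    fp = sum(1 for t, p in pairs if t != c and p == c)
    fn = sum(1 for t, p in pairs if t == c and p != c)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes: Optional[Iterable[Hashable]] = None) -> float:
    """Unweighted mean of per-class F1: every syndrome class counts equally."""
    if classes is None:
        classes = set(y_true) | set(y_pred)
    classes = list(classes)
    if not classes:
        raise ValueError("no classes to score")
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def long_tail_classes(train_labels: Iterable[Hashable], threshold: int = 10) -> set:
    """Hypothetical long-tail definition: classes with fewer than `threshold` training records."""
    counts = Counter(train_labels)
    return {c for c, n in counts.items() if n < threshold}


if __name__ == "__main__":
    # Toy labels for illustration only, not TCM-SD data.
    y_true = ["A", "A", "A", "A", "B", "C"]
    y_pred = ["A", "A", "A", "B", "B", "A"]
    print(round(macro_f1(y_true, y_pred), 3))  # → 0.472 (plain accuracy would be 0.667)
    # Here y_true doubles as the "training" labels; threshold=2 marks B and C as long-tail.
    tail = long_tail_classes(y_true, threshold=2)
    print(round(macro_f1(y_true, y_pred, tail), 3))  # → 0.333
```

In the toy example the rare class C is never predicted correctly, which drags macro-F1 well below accuracy; this is why the metric suits a long-tail setting where a model could otherwise score well by favoring common syndromes.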
Results Agent-GNN achieved an overall macro-F1 score of 72.4%, an improvement of 8.7 percentage points over ZY-BERT + Know (63.7%), the strongest of the traditional baselines. For long-tail syndromes with fewer than 10 samples, Agent-GNN reached a macro-F1 score of 58.6%, compared with 39.3% for ZY-BERT + Know and 41.2% for GPT-4 Few-shot, corresponding to relative improvements of 49.1% and 42.2%, respectively. Ablation experiments confirmed that explicitly modeling etiology-pathogenesis nodes contributed 12.4 percentage points to the long-tail macro-F1.
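The results mix two ways of reporting gains: a percentage-point gain is the plain difference between two scores, whereas a relative improvement divides that difference by the baseline score. A minimal sketch recomputing both from the rounded scores reported in the Results (the helper names are hypothetical); from these rounded values the long-tail relative improvements come to roughly 49.1% and 42.2%, and a difference in the last digit can arise from rounding of the underlying scores.

```python
def pp_gain(new: float, base: float) -> float:
    """Absolute gain in percentage points (both scores given in %)."""
    return new - base


def relative_gain(new: float, base: float) -> float:
    """Relative improvement in %: the percentage-point gain as a share of the baseline."""
    return (new - base) / base * 100


if __name__ == "__main__":
    # Macro-F1 scores (%) as reported in the Results paragraph.
    print(round(pp_gain(72.4, 63.7), 1))        # overall vs ZY-BERT + Know → 8.7
    print(round(relative_gain(58.6, 39.3), 1))  # long-tail vs ZY-BERT + Know → 49.1
    print(round(relative_gain(58.6, 41.2), 1))  # long-tail vs GPT-4 Few-shot → 42.2
```

The same 19.3-point long-tail gain over ZY-BERT + Know reads as a much larger relative number because the baseline score is low, so stating both forms, as the Results do, avoids overstating either one.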
Conclusion This study proposes Agent-GNN, a knowledge graph-enhanced framework that effectively addresses the long-tail distribution challenge in TCM syndrome differentiation. By explicitly modeling manifestation-mechanism-essence patterns through structured knowledge graphs, our approach outperforms the compared baselines in data-scarce scenarios while providing interpretable reasoning paths for intelligent TCM diagnosis.