Knowledge graph-enhanced long-tail learning approach for traditional Chinese medicine syndrome differentiation

    Abstract:
    Objective To address the dual challenges of long-tail class distribution and feature sparsity that traditional Chinese medicine (TCM) syndrome differentiation faces in real-world clinical settings, we propose a data-efficient learning framework enhanced by knowledge graphs.
    Methods We developed Agent-GNN, a three-stage decoupled learning framework, and validated it on the Traditional Chinese Medicine Syndrome Diagnosis (TCM-SD) dataset, which contains 54,152 clinical records across 148 syndrome categories. First, we constructed a comprehensive medical knowledge graph that encodes the complete TCM reasoning system. Second, we proposed a Functional Patient Profiling (FPP) method that combines large language models (LLMs) with graph retrieval-augmented generation (Graph RAG) to extract structured symptom-etiology-pathogenesis subgraphs from medical records. Third, we employed a heterogeneous graph neural network to explicitly learn structured combination patterns. We compared Agent-GNN against multiple baselines, including BERT, ZY-BERT, ZY-BERT + Know, GAT, and GPT-4 Few-shot, using the macro-F1 score as the primary evaluation metric. Ablation experiments were also conducted to measure the contribution of each key component to model performance.
    Results Agent-GNN achieved an overall macro-F1 score of 72.4%, an improvement of 8.7 percentage points over ZY-BERT + Know (63.7%), the strongest baseline among traditional methods. For long-tail syndromes with fewer than 10 samples, Agent-GNN reached a macro-F1 score of 58.6%, compared with 39.3% for ZY-BERT + Know and 41.2% for GPT-4 Few-shot, corresponding to relative improvements of 49.2% and 42.2%, respectively. Ablation experiments confirmed that explicit modeling of etiology-pathogenesis nodes contributed 12.4 percentage points to the performance gain on long-tail syndromes.
    Conclusion The proposed knowledge graph-enhanced Agent-GNN framework effectively addresses the long-tail distribution challenge in TCM syndrome differentiation. By explicitly modeling manifestation-mechanism-essence patterns through structured knowledge graphs, the approach achieves superior performance in data-scarce scenarios while providing interpretable reasoning paths for intelligent TCM diagnosis.
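
The abstract reports macro-F1 and distinguishes percentage-point gains from relative gains. The following minimal Python sketch illustrates both ideas. It is not the authors' evaluation code: the function names (`macro_f1`, `accuracy`, `relative_gain`) and the toy labels are assumptions introduced here for illustration, and the macro average is taken over the classes that occur in the true labels, which may differ from the convention used in the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class receives a false positive
            fn[t] += 1  # true class receives a false negative
    scores = []
    for c in sorted(set(y_true)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


# Hypothetical toy data: a head class with 8 records and a tail class with 2.
# A model that always predicts the head class looks good on accuracy
# but is penalized by macro-F1, because every class counts equally.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")

# Percentage-point difference versus relative gain, using the rounded
# scores quoted in the abstract.
print(f"overall gain (points)    = {72.4 - 63.7:.1f}")
print(f"long-tail relative gain  = {relative_gain(58.6, 41.2):.1f}%")
```

Because each class contributes equally to the average, a syndrome with only a handful of records weighs as much as a common one, which is why macro-F1 suits a long-tail setting. Note that recomputing the relative gain over ZY-BERT + Know from the rounded scores (58.6 and 39.3) gives roughly 49.1%; the reported 49.2% presumably derives from unrounded values.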

     
