Heterogeneous graph construction and node representation learning method of Treatise on Febrile Diseases based on graph convolutional network

  • Abstract:
    Objective: To construct symptom-formula-herb heterogeneous graphs from the Treatise on Febrile Diseases (Shang Han Lun,《伤寒论》) and to explore the optimal method for learning node vector representations based on a graph convolutional network (GCN).
    Methods: Symptom, formula, and herb information was extracted from the clauses of the Treatise on Febrile Diseases that contain prescriptions and used to construct symptom-formula-herb heterogeneous graphs. Based on GCN, a node representation learning method for these graphs was proposed: the Traditional Chinese Medicine Graph Convolution Network (TCM-GCN). TCM-GCN was applied separately to the symptom-formula, symptom-herb, and formula-herb heterogeneous graphs, performing high-order propagation through message passing and neighbor aggregation to obtain node feature vectors. This yielded three sets of node representations (symptoms, formulas, and herbs), laying a foundation for downstream diagnosis prediction tasks.
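    The propagation idea described in the Methods can be illustrated with a minimal sketch. It uses the generic GCN update ReLU(Â H W) with symmetric normalization, which is an assumption here and not necessarily the exact TCM-GCN formulation. The toy formula-symptom edges, the function names, and the identity weight matrix are all hypothetical, chosen only to make the two-hop effect visible.

```python
import math

# Hypothetical toy edges: each formula node links to its symptom nodes.
formula_symptoms = {
    "Guizhi Tang": ["fever", "sweating", "aversion to wind"],
    "Mahuang Tang": ["fever", "no sweating", "body pain"],
}

def build_adjacency(edges_by_formula):
    """Return node names and a symmetric adjacency matrix with self-loops."""
    symptoms = sorted({s for ss in edges_by_formula.values() for s in ss})
    nodes = sorted(edges_by_formula) + symptoms
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for formula, ss in edges_by_formula.items():
        for s in ss:
            adj[index[formula]][index[s]] = 1.0
            adj[index[s]][index[formula]] = 1.0
    return nodes, adj

def normalize(adj):
    """Symmetric normalization: D^(-1/2) (A + I) D^(-1/2)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product, to keep the sketch dependency-free."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(a_hat, h, w):
    """One message-passing step: aggregate neighbors, transform, apply ReLU."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]

nodes, adj = build_adjacency(formula_symptoms)
a_hat = normalize(adj)
n = len(nodes)

# One-hot input features and identity weights (assumed, untrained) so that
# the output rows show directly which nodes contribute to each embedding.
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
h1 = gcn_layer(a_hat, identity, identity)  # one-hop neighborhood
h2 = gcn_layer(a_hat, h1, identity)        # two-hop (higher-order) neighborhood

g, m = nodes.index("Guizhi Tang"), nodes.index("Mahuang Tang")
print("1-hop weight Guizhi Tang <- Mahuang Tang:", h1[g][m])
print("2-hop weight Guizhi Tang <- Mahuang Tang:", h2[g][m])
```

    After one layer the two formulas do not influence each other because they share no direct edge; after the second layer they do, through the shared symptom "fever". In a trained model the identity weights would be replaced by learned parameters.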
    Results: Multi-hot encoding, non-fusion encoding, and fusion encoding were compared as node representations in model prediction experiments. Fusion encoding achieved relatively higher precision, recall, and F1-score; its Precision@10, Recall@10, and F1-score@10 were 9.77%, 6.65%, and 8.30% higher, respectively, than those of non-fusion encoding.
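    For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute Precision@K, Recall@K, and F1-score@K for a single ranked prediction. The abstract does not define the metrics, so the convention of dividing precision by K is an assumption, and the function name and the predicted ranking are hypothetical. The relevant set is the herb composition of Guizhi Tang.

```python
def precision_recall_f1_at_k(ranked, relevant, k=10):
    """Compute Precision@K, Recall@K and F1@K for one ranked prediction list.

    ranked   -- predicted items, best first (assumed to contain no duplicates)
    relevant -- set of ground-truth items
    """
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k  # assumed convention: denominator is K
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Ground truth: the five herbs of Guizhi Tang.
relevant_herbs = {"guizhi", "shaoyao", "shengjiang", "dazao", "gancao"}

# Hypothetical model output: ten herbs ranked by predicted score.
predicted_herbs = [
    "guizhi", "gancao", "mahuang", "shaoyao", "xingren",
    "shengjiang", "fuzi", "ganjiang", "huangqin", "renshen",
]

p, r, f1 = precision_recall_f1_at_k(predicted_herbs, relevant_herbs, k=10)
print(f"Precision@10={p:.4f}  Recall@10={r:.4f}  F1@10={f1:.4f}")
```

    In an evaluation, these per-sample values would typically be averaged over all test clauses before encodings are compared.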
    Conclusion: Node representations generated by fusion encoding performed well in the experiments, indicating that TCM-GCN is effective for learning node representations of the Treatise on Febrile Diseases heterogeneous graphs and can improve performance on downstream diagnosis prediction tasks.

     
