Abstract:
Objective To develop a model based on a graph convolutional network (GCN) to achieve efficient classification of the cold and hot medicinal properties of Chinese herbal medicines (CHMs).
Methods After screening the dataset provided in the published literature, this study included 495 CHMs and their 8 075 compounds. Three types of molecular descriptors were used to represent the compounds: the molecular access system (MACCS), the extended connectivity fingerprint (ECFP), and two-dimensional (2D) molecular descriptors computed with the RDKit open-source toolkit (RDKit_2D). A homogeneous graph with CHMs as nodes was constructed, and a GCN-based classification model for the cold and hot medicinal properties of CHMs was developed using the molecular descriptor information of the compounds as node features. Finally, the GCN model was compared experimentally with traditional machine learning approaches, namely decision tree (DT), random forest (RF), k-nearest neighbor (KNN), Naïve Bayes classifier (NBC), and support vector machine (SVM), with accuracy and F1 score as the performance metrics. The MACCS, ECFP, and RDKit_2D molecular descriptors were also adopted as features for comparison.
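As a rough illustration of the graph convolution step described above, the following minimal sketch applies one GCN layer with the commonly used symmetric normalization, D^(-1/2) (A + I) D^(-1/2) H W, followed by ReLU. It is written in plain Python to stay dependency-free. The toy adjacency matrix, the two-dimensional node features, the weight matrix, and the helper names (`normalize_adjacency`, `matmul`, `gcn_layer`) are all assumptions made for illustration; the abstract does not state how edges between CHMs are defined or which layer configuration was used.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution: ReLU(adj_norm @ features @ weights)."""
    propagated = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


# Hypothetical toy graph: 3 herb nodes, nodes 0 and 1 linked, node 2 isolated.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
# Hypothetical 2-dimensional node features (stand-ins for descriptor vectors).
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
# Hypothetical weight matrix; in practice this is learned during training.
weights = [[1.0, 0.0],
           [0.0, -1.0]]

hidden = gcn_layer(normalize_adjacency(adj), features, weights)
```

In this toy case the two linked nodes end up with identical hidden representations, because each averages its own features with its neighbor's; this neighborhood smoothing is what lets a GCN use graph structure in addition to the descriptor features.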
Results With MACCS as features, the GCN outperformed the traditional machine learning approaches, reaching an accuracy of 0.836 4 and an F1 score of 0.845 3. Compared with the lowest-performing feature combination, OMER (only the combination of MACCS, ECFP, and RDKit_2D), the accuracy and F1 score increased by 0.869 0 and 0.812 0, respectively. The accuracy and F1 score were 0.505 1 and 0.501 8 for DT, 0.616 2 and 0.601 5 for RF, 0.676 8 and 0.624 3 for KNN, 0.616 2 and 0.607 1 for NBC, and 0.636 4 and 0.622 5 for SVM.
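For reference, the two reported metrics can be computed as in the minimal sketch below. It assumes a binary labeling (for example, 1 for hot and 0 for cold) and the standard binary F1 score for the positive class; the abstract does not specify the label encoding or the averaging scheme, so both are assumptions, as are the helper names and the toy labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five herbs (1 = hot, 0 = cold).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]

acc = accuracy(y_true, y_pred)   # 3 of 5 predictions are correct
f1 = f1_binary(y_true, y_pred)   # precision and recall are both 2/3
```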
Conclusion By introducing molecular descriptors as features, this study verified that molecular descriptors and fingerprints play a key role in classifying the cold and hot medicinal properties of CHMs. The GCN model also achieved excellent classification performance, providing an important algorithmic basis for the in-depth study of the “structure-property” relationship of CHMs.