基于面部望诊图像特征的肺癌风险预警模型研究

石玉琳; 张疏逸; 刘嘉懿; 陈文连; 刘苓霜; 许玲; 许家佗

doi:10.1016/j.dcmed.2025.09.007

A lung cancer early-warning risk model based on facial diagnosis image features

摘要:
目的 本研究旨在探索基于面部图像特征构建肺癌风险预警模型的可行性，探讨肺癌早期筛查的新方法。
方法 研究纳入2019年11月1日至2024年12月31日在上海中医药大学附属曙光医院体检中心确诊的肺结节患者，以及同期在上海中医药大学附属岳阳中西医结合医院、龙华医院肿瘤科确诊的肺癌患者，采用 TFDA-1 型舌面诊仪收集肺结节与肺癌患者的面部图像信息，运用深度学习技术提取面诊特征。对两组人群的面部图像特征进行统计分析，比较其差异，并借助最小绝对收缩和选择算子（LASSO）回归对特征变量进行筛选。基于筛选后的特征变量，分别运用随机森林、逻辑回归、支持向量机（SVM）及梯度提升决策树（GBDT）这 4 种机器学习方法建立肺癌分类模型。同时以灵敏度、特异度、F1-score、精确率、准确率、受试者工作特征（ROC）曲线下面积（AUC）以及精确率-召回率（PR）曲线下面积（AP）等指标对模型性能进行评估。
结果 本研究共纳入1 275例肺结节患者及1 623例肺癌患者，经倾向性评分匹配（PSM）校正性别与年龄后，最终纳入肺结节组与肺癌组各535例。肺结节与肺癌两组人群在多个面部区域的颜色空间指标（如 R、G、B、V、L、a、b、Cr、H、Y、Cb）和纹理指标（如 GLCM-CON、GLCM-IDM）上均表现出显著差异（P < 0.05）。为构建分类模型，采用 LASSO 回归从初始的 136 个面部特征中筛选出 63 个关键特征。基于此特征集建模，经十折分层交叉验证，SVM 模型展现出最优性能。该模型在内部测试集上的平均 AUC 达 0.8729，平均准确率为 0.7990。在独立测试集上进一步验证，模型性能保持稳健（AUC = 0.8233，准确率 = 0.7290），证实其良好的泛化能力。特征重要性分析显示，颜色空间指标（其中额头 B 通道及口唇、整体 Cr 指标贡献最大）是模型分类决策的核心因素，而纹理指标（GLCM-ASM_2、GLCM-IDM_1、GLCM-CON_1、GLCM-ENT_2）起到重要的辅助作用。
结论 肺癌与肺结节患者的面部图像特征在多个区域的颜色及纹理特征上存在显著差异，基于面部特征构建的各模型均展现出良好的效能，表明面部图像特征可作为肺癌风险预警的潜在生物标志物，为肺癌早期筛查提供无创、可行的新途径。

Abstract:
Objective To explore the feasibility of constructing a lung cancer early-warning risk model based on facial image features, and to investigate a new approach to early lung cancer screening.
Methods This study included patients with pulmonary nodules diagnosed at the Physical Examination Center of Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine from November 1, 2019 to December 31, 2024, as well as patients with lung cancer diagnosed in the Oncology Departments of Yueyang Hospital of Integrated Traditional Chinese and Western Medicine and Longhua Hospital during the same period. Facial images of the patients with pulmonary nodules and lung cancer were collected using the TFDA-1 tongue and facial diagnosis instrument, and facial diagnosis features were extracted from the images using deep learning. The facial image features of the two groups were compared statistically, and least absolute shrinkage and selection operator (LASSO) regression was used to screen the feature variables. Based on the screened features, four machine learning methods, namely random forest, logistic regression, support vector machine (SVM), and gradient boosting decision tree (GBDT), were each used to build a lung cancer classification model. Model performance was evaluated by sensitivity, specificity, F1 score, precision, accuracy, the area under the receiver operating characteristic (ROC) curve (AUC), and the area under the precision-recall (PR) curve (AP).
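The evaluation indicators listed in the Methods can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the function names, the 0/1 label coding (1 = lung cancer, 0 = pulmonary nodule), and the 0.5 decision threshold in the demo are assumptions made here for illustration. AUC is computed as the fraction of positive/negative pairs ranked correctly, and AP as the mean of the precision values at each positive in the score ranking (ties between scores are not given special treatment).

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, precision, accuracy and F1 from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)  # also called recall
    precision = _ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "accuracy": _ratio(tp + tn, len(pairs)),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AP as the mean precision measured at the rank of each positive sample."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return _ratio(total, hits)


if __name__ == "__main__":
    # Toy labels and scores, invented for this demo only.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.5]
    hard = [1 if s >= 0.5 else 0 for s in probs]  # assumed threshold
    print(threshold_metrics(labels, hard))
    print(roc_auc(labels, probs), average_precision(labels, probs))
```

Sensitivity, specificity, precision, accuracy, and F1 depend on the chosen threshold, whereas AUC and AP summarize the ranking of scores over all thresholds, which is why both kinds of indicator are typically reported together.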
Results A total of 1 275 patients with pulmonary nodules and 1 623 patients with lung cancer were enrolled. After propensity score matching (PSM) on gender and age, 535 patients remained in each of the pulmonary nodule and lung cancer groups. The two groups differed significantly (P < 0.05) in multiple color space metrics across several facial regions (such as R, G, B, V, L, a, b, Cr, H, Y, and Cb) and in texture metrics such as gray-level co-occurrence matrix (GLCM)-contrast (CON) and GLCM-inverse difference moment (IDM). To construct the classification models, LASSO regression was used to select 63 key features from the initial 136 facial features. Based on this feature set, the SVM model performed best under 10-fold stratified cross-validation, achieving an average AUC of 0.8729 and an average accuracy of 0.7990 on the internal test set. On an independent test set, its performance remained robust (AUC = 0.8233, accuracy = 0.7290), indicating good generalization ability. Feature importance analysis showed that color space indicators, chiefly the forehead B channel and the whole-face and lip Cr components (color-B-0, wholecolor-Cr, and lipcolor-Cr), were the core factors in the model's classification decisions, while the texture indicators GLCM-angular second moment (ASM)_2, GLCM-IDM_1, GLCM-CON_1, and GLCM-entropy (ENT)_2 played an important auxiliary role.
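For readers unfamiliar with the texture indicators named in the Results, the following standard-library sketch shows the textbook definitions of GLCM contrast (CON), inverse difference moment (IDM), angular second moment (ASM), and entropy (ENT). The abstract does not state the gray-level quantization, pixel offsets, or logarithm base the authors used, nor what the `_1` and `_2` suffixes denote, so the one-pixel horizontal offset, the symmetric matrix, and the base-2 logarithm below are assumptions, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def glcm(image: Sequence[Sequence[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Symmetric, normalized gray-level co-occurrence matrix for the offset (dx, dy).

    `image` holds integer gray levels in range(levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # pair (i, j)
                counts[j][i] += 1  # mirrored pair makes the matrix symmetric
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return [[v / total for v in row] for row in counts]


def glcm_features(p: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Contrast, inverse difference moment, angular second moment and entropy."""
    con = idm = asm = ent = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            con += (i - j) ** 2 * v        # large when neighbors differ strongly
            idm += v / (1 + (i - j) ** 2)  # large when neighbors are similar
            asm += v * v                   # large when few gray-level pairs dominate
            if v > 0:
                ent -= v * math.log2(v)    # base-2 log is an assumption
    return {"CON": con, "IDM": idm, "ASM": asm, "ENT": ent}


if __name__ == "__main__":
    # A tiny 2-level patch with alternating columns, invented for this demo.
    patch = [[0, 1], [0, 1]]
    print(glcm_features(glcm(patch, levels=2)))
```

A perfectly uniform patch yields CON = 0, IDM = 1, ASM = 1, and ENT = 0, so higher contrast and entropy correspond to a rougher, less homogeneous skin texture.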
Conclusion The facial image features of patients with lung cancer and pulmonary nodules differ significantly in color and texture across multiple facial regions. All of the models built on facial image features performed well, indicating that facial image features can serve as potential biomarkers for lung cancer risk prediction and offer a non-invasive and feasible new approach to early lung cancer screening.
