口腔医学研究 (Journal of Oral Science Research), 2025, Vol. 41, Issue (1): 16-20. DOI: 10.13701/j.cnki.kqyxyj.2025.01.004

• Oral Mucosal Disease Research •

Research on Classification of Benign and Malignant Oral Lesions Using ViT-B Deep Learning Model

CUI Yuchen1, XIE Yuandong2, WU Yumiao1, NIU Lingxiao3, CHANG Luguangda1, ZHU Xianchun1*

  1. Department of Orthodontics, Hospital of Stomatology, Jilin University, Changchun 130021, China;
  2. Jilin Provincial Key Laboratory of Tooth Development and Bone Remodeling, Changchun 130021, China;
  3. Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, Jilin University, Changchun 130021, China
  • Received: 2024-08-13 Online: 2025-01-28 Published: 2025-01-24
  • Corresponding author: * ZHU Xianchun, E-mail: zhuxc@jlu.edu.cn
  • Author biography: CUI Yuchen (born 1999), female, from Shanxi, master's student; research interests: biomechanical mechanisms and clinical optimization of orthodontic treatment.
  • Funding:
    Natural Science Foundation project of the Jilin Provincial Department of Science and Technology (No. YDZJ202201ZYTS057)

Abstract: Objective: To analyze the performance of the ViT-B deep learning model in detecting images of benign and malignant oral lesions, with the aim of giving clinicians an effective tool for the early detection and accurate diagnosis of oral cancer. Methods: A public dataset containing images of benign and malignant oral lesions was preprocessed and augmented, then randomly divided into training, validation, and test sets in a 7:2:1 ratio. Five deep learning models (ViT-B, VGG16, ResNet101, DenseNet121, and EfficientNetV2) were trained and their performance was compared. The generalization ability of the ViT-B model was evaluated on external data, and the model was analyzed by visualizing its attention weights. Results: ViT-B achieved the best classification performance of the five models, with an area under the receiver operating characteristic curve (AUC) of 0.9715 and an accuracy of 91.00%. The model effectively distinguished between images of benign and malignant oral lesions and showed strong generalization ability and clinical applicability. Conclusion: The ViT-B model performs well in recognizing images of benign and malignant oral lesions and can support the early detection and accurate diagnosis of oral cancer.

Key words: oral cancer, oral lesions, deep learning, ViT-B
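
Illustrative note: the abstract describes a random 7:2:1 split and reports AUC and accuracy, but gives no implementation details. The following is a minimal sketch of those three steps using only the Python standard library. The function names, the fixed seed, the 0.5 decision threshold, and the convention that label 1 means "malignant" are assumptions made for illustration; they are not taken from the paper and may differ from what the authors did.

```python
import random


def split_7_2_1(items, seed=0):
    """Randomly split items into train/validation/test parts in a 7:2:1 ratio.

    The seed is a hypothetical choice so that the split is reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, roughly 10%
    return train, val, test


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive sample scores higher.
    Tied pairs count as half. Assumes label 1 = malignant, 0 = benign.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at an assumed threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    train, val, test = split_7_2_1(range(100))
    print(len(train), len(val), len(test))

    toy_labels = [0, 0, 1, 1]            # made-up labels, not study data
    toy_scores = [0.1, 0.4, 0.35, 0.8]   # made-up model scores
    print(roc_auc(toy_labels, toy_scores), accuracy(toy_labels, toy_scores))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a small test set; for larger sets a rank-based computation or a library routine would be the more usual choice.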