BioMedical Engineering OnLine (Jun 2025)

Research on automatic assessment of the severity of unilateral vocal cord paralysis based on Mel-spectrogram and convolutional neural networks

  • Shuaichi Ma,
  • Wenwen Liao,
  • Yi Zhang,
  • Fan Zhang,
  • Yimiao Wang,
  • Zhiyan Lu,
  • Chen Zhao,
  • Jianbo Yu,
  • Peijie He

DOI
https://doi.org/10.1186/s12938-025-01401-9
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 19

Abstract

Background: This study aims to develop an AI-powered platform that uses Mel-spectrogram analysis and convolutional neural networks (CNN) to automate the severity assessment of unilateral vocal cord paralysis (UVCP) through voice analysis, providing an objective basis for individualized clinical treatment plans.

Methods: To identify the severity of UVCP, this study developed a CNN model, TripleConvNet. Voice samples were collected from 131 healthy individuals and 292 confirmed UVCP patients at the Eye and ENT Hospital of Fudan University. Based on vocal fold compensation function, the patients were divided into three groups: decompensated (84 cases), partially compensated (98 cases), and fully compensated (110 cases). Taking Mel-spectrograms and their first- and second-order differential features as inputs, TripleConvNet classified patients by severity and was systematically evaluated on UVCP severity grading tasks.

Results: TripleConvNet achieved a classification accuracy of 74.3% on the four-class task of distinguishing healthy voices from the decompensated, partially compensated, and fully compensated UVCP groups.

Conclusion: This study demonstrates the potential of deep learning-based, non-invasive voice analysis for grading UVCP severity. The proposed method offers a promising clinical tool to assist physicians in disease assessment and personalized treatment planning.
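The input representation named in the Methods (a Mel-spectrogram plus its first- and second-order differential features) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `delta` and `stack_mel_with_deltas` are hypothetical, the spectrogram is synthetic random data of shape 64 Mel bands by 100 frames, and the differentials are plain finite differences along time, which may differ from the delta computation the authors used.

```python
import numpy as np


def delta(features: np.ndarray) -> np.ndarray:
    """Finite-difference derivative along the time axis (axis=1).

    Assumption: simple central differences via np.gradient; the paper
    does not state how its differential features were computed.
    """
    return np.gradient(features, axis=1)


def stack_mel_with_deltas(log_mel: np.ndarray) -> np.ndarray:
    """Stack a (n_mels, n_frames) spectrogram with its first- and
    second-order differences into a (3, n_mels, n_frames) array,
    i.e. a three-channel image-like input for a CNN."""
    d1 = delta(log_mel)   # first-order differential
    d2 = delta(d1)        # second-order differential
    return np.stack([log_mel, d1, d2], axis=0)


# Hypothetical log-Mel spectrogram: synthetic data, not real voice recordings.
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((64, 100))

cnn_input = stack_mel_with_deltas(log_mel)
print(cnn_input.shape)  # (3, 64, 100)
```

Stacking the three feature maps as channels lets a 2D CNN treat the static spectrum and its temporal dynamics like the color planes of an image; whether TripleConvNet consumes them this way is not stated in the abstract.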

Keywords