Research on automatic assessment of the severity of unilateral vocal cord paralysis based on Mel-spectrogram and convolutional neural networks

Shuaichi Ma; Wenwen Liao; Yi Zhang; Fan Zhang; Yimiao Wang; Zhiyan Lu; Chen Zhao; Jianbo Yu; Peijie He

doi:10.1186/s12938-025-01401-9

BioMedical Engineering OnLine (Jun 2025)

Affiliations

Shuaichi Ma: School of Health Science and Engineering, University of Shanghai for Science and Technology
Wenwen Liao: Academy for Engineering & Technology, Fudan University
Yi Zhang: ENT Institute and Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University
Fan Zhang: ENT Institute and Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University
Yimiao Wang: ENT Institute and Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University
Zhiyan Lu: ENT Institute and Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University
Chen Zhao: ENT Institute and Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University
Jianbo Yu: School of Microelectronics, Fudan University
Peijie He: ENT Institute and Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University

DOI: https://doi.org/10.1186/s12938-025-01401-9
Journal volume & issue: Vol. 24, no. 1
pp. 1–19

Abstract

Background: This study aims to develop an AI-powered platform that uses Mel-spectrogram analysis and convolutional neural networks (CNNs) to automate the severity assessment of unilateral vocal fold paralysis (UVCP) from voice recordings, providing an objective basis for individualized clinical treatment plans.

Methods: To identify UVCP severity accurately, this study developed a CNN model, TripleConvNet. Voice samples were collected at the Eye and ENT Hospital of Fudan University from 131 healthy individuals and 292 patients with confirmed UVCP. Based on vocal fold compensatory function, the patients were divided into three groups: decompensated (84 cases), partially compensated (98 cases), and fully compensated (110 cases). Using Mel-spectrograms and their first- and second-order differential features as inputs, TripleConvNet classified patients by severity, and its performance on the UVCP severity grading task was systematically evaluated.

Results: TripleConvNet achieved a classification accuracy of 74.3% on the four-class task of distinguishing healthy voices from the decompensated, partially compensated, and fully compensated UVCP groups.

Conclusion: This study demonstrates the potential of non-invasive, deep-learning-based voice analysis for precise grading of UVCP severity. The proposed method offers a promising clinical tool to assist physicians in disease assessment and personalized treatment planning.
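The input representation described in the Methods (a Mel-spectrogram plus its first- and second-order differences) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the HTK-style mel formula, the 16 kHz sample rate, the frame settings (n_fft=1024, hop=256, 64 mel bands), the regression-based delta of width 2, stacking the three maps as channels, and all function names are hypothetical choices, since the abstract does not report them.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumption: the paper's mel formula is not stated)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters evenly spaced on the mel scale from 0 Hz to Nyquist
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=64):
    # Hann-windowed frames -> power spectrum -> mel bands -> decibels
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (frames, bins)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T     # (n_mels, frames)
    return 10.0 * np.log10(mel + 1e-10)

def delta(features, width=2):
    # Regression delta along time: sum_k k*(c[t+k] - c[t-k]) / (2 * sum_k k^2),
    # with edge frames repeated as padding (hypothetical choice)
    ks = np.arange(1, width + 1)
    denom = 2.0 * np.sum(ks ** 2)
    padded = np.pad(features, ((0, 0), (width, width)), mode="edge")
    T = features.shape[1]
    out = np.zeros_like(features)
    for k in ks:
        out += k * (padded[:, width + k : width + k + T] - padded[:, width - k : width - k + T])
    return out / denom

def three_channel_input(signal, sr):
    # Static, first-order, and second-order maps stacked as a (3, n_mels, frames) tensor
    mel = log_mel_spectrogram(signal, sr)
    d1 = delta(mel)
    d2 = delta(d1)
    return np.stack([mel, d1, d2], axis=0)

if __name__ == "__main__":
    sr = 16000
    x = 0.5 * np.sin(2 * np.pi * 220.0 * np.arange(sr) / sr)  # 1 s synthetic tone
    print(three_channel_input(x, sr).shape)  # (3, 64, 59)
```

Stacking the three maps along a channel axis yields an image-like tensor that a 2-D CNN can consume much as it would an RGB image; whether TripleConvNet uses exactly this layout is not stated in the abstract.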

Published in BioMedical Engineering OnLine

ISSN: 1475-925X (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Medical technology
Website: https://biomedical-engineering-online.biomedcentral.com