工程科学与技术 (Jan 2025)
Depression detection based on dual path DCGAN data generation and classification-regression network
Abstract
Objective: Accurately evaluating the depression scores of patients with depression provides effective support for clinical auxiliary diagnosis and helps clinicians develop personalized diagnosis and treatment plans, which improves the accuracy of clinical care and, ultimately, patients' health. Existing research on voice-based depression detection suffers from several problems, including complicated feature extraction, reliance on a single data augmentation method, and uncontrollable bias in regression prediction. This paper proposes a dual path DCGAN (DP-DCGAN) for data generation and a classification-regression network for predicting depression scores, so as to support auxiliary diagnosis of the degree of depression.
Methods: First, based on the audio characteristics of depressed patients, six emotional features are selected from existing speech features, and a corresponding two-dimensional feature map is constructed for each audio sample. The Teager energy operator (TEO) is fused into the MFCC features to form MFCC-TEO features, which further highlight differences in energy distribution. Next, the proposed dual path deep convolutional generative adversarial network is used to augment the two-dimensional feature maps of each depression level, which expands the dataset, increases the diversity of the feature maps, and improves the robustness and generalization of the model. An evaluation index based on spatial and frequency-domain characteristics is also proposed to screen the generated feature maps and retain only the high-quality ones. Finally, a classification-regression network is proposed as the prediction network, which reduces prediction bias by narrowing the prediction confidence interval.
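As a brief illustration of the Teager energy operator mentioned above, the following minimal sketch computes the standard discrete form, psi[n] = x[n]^2 - x[n-1] * x[n+1], on synthetic sine waves. How the operator is fused into the MFCC pipeline is not specified in this abstract, so the sketch covers only the operator itself; the function names `teager_energy` and `sine` and the test signals are illustrative assumptions, not the authors' code.

```python
import math


def teager_energy(signal):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours, so the
    output is two samples shorter than the input.
    """
    return [
        signal[n] ** 2 - signal[n - 1] * signal[n + 1]
        for n in range(1, len(signal) - 1)
    ]


def sine(amplitude, omega, length):
    """Hypothetical test signal: amplitude * sin(omega * n)."""
    return [amplitude * math.sin(omega * n) for n in range(length)]


# For a pure sinusoid the operator is constant: amplitude^2 * sin(omega)^2.
# Doubling the amplitude therefore quadruples the Teager energy.
quiet = teager_energy(sine(1.0, 0.2, 64))
loud = teager_energy(sine(2.0, 0.2, 64))
```

Because the output depends on both amplitude and frequency, the operator responds to changes in how energy is distributed across a signal, which is consistent with the motivation given above for combining it with MFCC.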
In the classification network, multi-scale convolution is introduced into the residual network to enhance information interaction between features, so that the residual network can fully perceive the multi-level information contained in the feature maps.
Results and Discussions: A feature validity test was carried out on the six selected emotional features: MFCC, MFCC-TEO, LPCC, and Jitter features were added in turn to a baseline of short-term energy, zero-crossing rate, and sound intensity, and accuracy (Acc), root mean square error (RMSE), and mean absolute error (MAE) were calculated for each input configuration. With MFCC added, Acc, RMSE, and MAE were 89.76%, 6.17, and 2.08, respectively. With MFCC-TEO added, they were 92.07%, 5.49, and 1.58. With MFCC-TEO and LPCC added, they were 93.41%, 5.09, and 1.39. With MFCC-TEO, LPCC, and Jitter added, they were 94.73%, 4.55, and 1.11. Using MFCC-TEO rather than MFCC as the input feature raised Acc by 2.31 percentage points and lowered RMSE and MAE by 0.68 and 0.50, respectively, which indicates that combining MFCC with TEO accentuates the differences in energy distribution and that the combined MFCC-TEO coefficients characterize depression better than MFCC coefficients alone. Adding the LPCC and Jitter features further improved prediction accuracy. For data augmentation, predicting depression scores from the original dataset gave Acc, RMSE, and MAE of 80.51%, 8.47, and 3.94, respectively; after augmentation with the dual path deep convolutional generative adversarial network (DP-DCGAN), they were 94.73%, 4.55, and 1.11.
Compared with the original dataset, prediction accuracy improved markedly: Acc increased by 14.22 percentage points, and RMSE and MAE decreased by 3.92 and 2.83, respectively, showing that DP-DCGAN data augmentation effectively expands the dataset. In the prediction network, the classification accuracy of the original ResNet is 93.28%, while that of MSC-ResNet, which incorporates multi-scale convolution, is 94.73%, an increase of 1.45 percentage points. This indicates that multi-scale convolution extracts richer global information and broader context; the residual network then extracts detailed information, so that the network can fully perceive the multi-scale information in the input feature map, which improves model performance.
Conclusions: This paper proposes a depression diagnosis model based on a deep generative network and a classification-regression framework. MFCC-TEO features are obtained by introducing TEO into MFCC, and six features, including MFCC-TEO, are extracted to construct two-dimensional feature maps with time-frequency, linear, and nonlinear properties. A DP-DCGAN is constructed to augment the feature maps of each depression score in the original dataset and increase their diversity, and evaluation indicators are proposed to screen high-quality feature maps in terms of both spatial and frequency-domain characteristics. The high-quality, diverse feature maps significantly improve model performance. Finally, the proposed MRVN classification-regression network is used to predict depression scores.
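For readers unfamiliar with the error metrics quoted in the results, the sketch below shows the standard definitions of RMSE and MAE, plus a simple label accuracy. The toy scores and labels are invented purely for illustration and are not taken from the paper's data; treating Acc as the fraction of correctly predicted class labels is also an assumption, since the abstract does not define it.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length score lists."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length score lists."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def accuracy(labels_true, labels_pred):
    """Fraction of class labels predicted exactly (assumed meaning of Acc)."""
    hits = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return hits / len(labels_true)


# Illustrative toy values only, not data from the paper.
true_scores = [10, 20, 30, 40]
pred_scores = [12, 18, 30, 44]
toy_rmse = rmse(true_scores, pred_scores)  # sqrt((4 + 4 + 0 + 16) / 4)
toy_mae = mae(true_scores, pred_scores)    # (2 + 2 + 0 + 4) / 4
toy_acc = accuracy([0, 1, 2, 2], [0, 1, 1, 2])
```

RMSE is never smaller than MAE and grows faster when a few predictions are far off, which is one possible reading of why the reported RMSE values sit well above the corresponding MAE values.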
To suit the characteristics of the feature maps used in this paper, a multi-scale convolution module is added to the ResNet classification network to overcome the limitation of extracting features with a single-scale receptive field. In addition, by combining classification and regression, the input data can be regressed over a more uniform range, which mitigates the problem of large deviations in regression prediction.
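One way to picture the classification-regression idea is sketched below: a classifier first selects a depression level, and a regression output then places the score inside that level's range, so the final prediction cannot drift outside the chosen interval. This is only an assumed reading of the approach; the abstract does not give the actual MRVN architecture, and the function `predict_score`, the bin boundaries in `BINS`, and the example inputs are all hypothetical.

```python
def predict_score(class_probs, unit_regression, bins):
    """Pick the most likely level, then place the score inside its range.

    class_probs     -- one probability per level (classifier output)
    unit_regression -- regression output, expected to lie in [0, 1]
    bins            -- (low, high) score range for each level
    """
    if len(class_probs) != len(bins):
        raise ValueError("one probability per bin is required")
    best = max(range(len(bins)), key=lambda i: class_probs[i])
    low, high = bins[best]
    # Clamp so the regression can never leave the selected interval.
    r = min(1.0, max(0.0, unit_regression))
    return low + r * (high - low)


# Hypothetical score ranges, not the scale used in the paper.
BINS = [(0, 10), (10, 20), (20, 30), (30, 40)]

score = predict_score([0.10, 0.70, 0.15, 0.05], 0.25, BINS)
clamped = predict_score([0.0, 0.0, 1.0, 0.0], 3.0, BINS)
```

Under this reading, the regression error is bounded by the width of the selected interval as long as the classifier picks the right level, which matches the stated goal of limiting large prediction deviations.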