Modeling Sub-Band Information Through Discrete Wavelet Transform to Improve Intelligibility Assessment of Dysarthric Speech.
DOI: https://doi.org/10.9781/ijimai.2022.10.003

Keywords: Approximation Coefficient, Cepstral Coefficients, Detail Coefficient, Dysarthria, Signal, Discrete Wavelet Transforms

Abstract
The speech signal within a sub-band varies at a fine level depending on the type and severity of dysarthria. The Mel-frequency filterbank used in computing cepstral coefficients smooths out this fine-level information in the higher frequency regions because of the larger bandwidth of its filters. To capture the sub-band information, this paper first performs a four-level discrete wavelet transform (DWT) decomposition of the input speech signal into approximation and detail coefficients at each level. For a given input speech signal, five speech signals representing different sub-bands are then reconstructed using the inverse DWT (IDWT). The log filterbank energies are computed by analyzing the short-term discrete Fourier transform magnitude spectrum of each reconstructed signal with a 30-channel Mel filterbank. For each analysis frame, the log filterbank energies obtained across all reconstructed speech signals are pooled together, and the discrete cosine transform is applied to obtain the cepstral feature, termed here the discrete wavelet transform reconstructed (DWTR) Mel-frequency cepstral coefficient (MFCC). An i-vector based dysarthric severity-level assessment system developed on the Universal Access Speech corpus shows that the proposed DWTR-MFCC feature outperforms the conventional MFCC and several other cepstral features reported for a similar task. Using DWTR-MFCC improves the detection accuracy rate (DAR) of the assessment system in the text- and speaker-independent test case to 60.094% from the 56.646% MFCC baseline. Further analysis of the confusion matrices shows that the confusions among dysarthric classes differ considerably between the MFCC and DWTR-MFCC features. Motivated by this observation, a two-stage classification approach that exploits the discriminating power of both kinds of features is proposed to improve the overall performance of the developed dysarthric level assessment system. The two-stage classification scheme further improves the DAR to 65.813% in the text- and speaker-independent test case.
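The front end described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the wavelet, sampling rate, frame size, or number of retained cepstral coefficients, so a Haar wavelet, 16 kHz audio, 25 ms frames with 10 ms hop, and 13 coefficients are all assumptions here.

```python
import numpy as np

def haar_wavedec(x, levels=4):
    """Four-level Haar DWT (assumed wavelet): returns [cA4, cD4, cD3, cD2, cD1]."""
    s2, a, details = np.sqrt(2.0), np.asarray(x, float), []
    for _ in range(levels):
        if len(a) % 2:
            a = np.append(a, 0.0)
        details.append((a[::2] - a[1::2]) / s2)  # detail at this level
        a = (a[::2] + a[1::2]) / s2              # approximation passed down
    return [a] + details[::-1]

def haar_waverec(coeffs):
    """Inverse Haar DWT (perfect reconstruction for the pair above)."""
    s2, a = np.sqrt(2.0), coeffs[0]
    for d in coeffs[1:]:
        y = np.empty(2 * len(a))
        y[::2] = (a + d) / s2
        y[1::2] = (a - d) / s2
        a = y
    return a

def subband_signals(x):
    """IDWT with all but one coefficient set zeroed -> five sub-band signals."""
    coeffs = haar_wavedec(x)
    return [haar_waverec([c if i == k else np.zeros_like(c)
                          for i, c in enumerate(coeffs)])[: len(x)]
            for k in range(len(coeffs))]

def mel_filterbank(n_filters=30, n_fft=512, sr=16000):
    """30-channel triangular Mel filterbank over the rFFT bins."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(0.0, mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(n_filters):
        l, c, r = bins[j], bins[j + 1], bins[j + 2]
        fb[j, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[j, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def dct2_ortho(X, n_ceps):
    """Orthonormal DCT-II over the pooled log energies, keeping n_ceps terms."""
    N = X.shape[-1]
    k, n = np.arange(n_ceps)[:, None], np.arange(N)[None, :]
    basis = np.cos(np.pi * (n + 0.5) * k / N) * np.sqrt(2.0 / N)
    basis[0] /= np.sqrt(2.0)
    return X @ basis.T

def dwtr_mfcc(x, frame=400, hop=160, n_fft=512, n_ceps=13):
    """DWTR-MFCC: pool per-band log Mel energies, then DCT (assumed framing)."""
    fb, feats = mel_filterbank(n_fft=n_fft), []
    for band in subband_signals(x):
        frames = np.lib.stride_tricks.sliding_window_view(band, frame)[::hop]
        mag = np.abs(np.fft.rfft(frames * np.hamming(frame), n_fft))
        feats.append(np.log(mag @ fb.T + 1e-10))   # (n_frames, 30) per band
    pooled = np.concatenate(feats, axis=1)         # (n_frames, 5 * 30)
    return dct2_ortho(pooled, n_ceps)
```

For a one-second 16 kHz signal this yields a (98, 13) feature matrix: five reconstructed sub-band signals contribute 30 log energies each per frame, and the DCT decorrelates the pooled 150-dimensional vector down to the retained cepstral coefficients.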