Modeling Sub-Band Information Through Discrete Wavelet Transform to Improve Intelligibility Assessment of Dysarthric Speech

Laxmi Priya Sahu; Gayadhar Pradhan; Jyoti Prakash Singh

Author	Laxmi Priya Sahu Gayadhar Pradhan Jyoti Prakash Singh
Keywords	Approximation Coefficient Cepstral Coefficients Detail Coefficient Dysarthria Signal Discrete Wavelet Transforms
Abstract	The speech signal within a sub-band varies at a fine level depending on the type, and level of dysarthria. The Mel-frequency filterbank used in the computation process of cepstral coefficients smoothed out this fine level information in the higher frequency regions due to the larger bandwidth of filters. To capture the sub-band information, in this paper, four-level discrete wavelet transform (DWT) decomposition is firstly performed to decompose the input speech signal into approximation and detail coefficients, respectively, at each level. For a particular input speech signal, five speech signals representing different sub-bands are then reconstructed using inverse DWT (IDWT). The log filterbank energies are computed by analyzing the short-term discrete Fourier transform magnitude spectra of each reconstructed speech using a 30-channel Mel-filterbank. For each analysis frame, the log filterbank energies obtained across all reconstructed speech signals are pooled together, and discrete cosine transform is performed to represent the cepstral feature, here termed as discrete wavelet transform reconstructed (DWTR)- Mel frequency cepstral coefficient (MFCC). The i-vector based dysarthric level assessment system developed on the universal access speech corpus shows that the proposed DTWRMFCC feature outperforms the conventional MFCC and several other cepstral features reported for a similar task. The usages of DWTR- MFCC improve the detection accuracy rate (DAR) of the dysarthric level assessment system in the text and the speaker-independent test case to 60.094 % from 56.646 % MFCC baseline. Further analysis of the confusion matrices shows that confusion among different dysarthric classes is quite different for MFCC and DWTR-MFCC features. Motivated by this observation, a two-stage classification approach employing discriminating power of both kinds of features is proposed to improve the overall performance of the developed dysarthric level assessment system. The two-stage classification scheme further improves the DAR to 65.813 % in the text and speaker- independent test case.
Year of Publication	2022
Journal	International Journal of Interactive Multimedia and Artificial Intelligence
Volume	7
Issue	Regular Issue
Number	7
Number of Pages	56-64
Date Published	12/2022
ISSN Number	1989-1660
URL	https://www.ijimai.org/journal/sites/default/files/2022-11/ijimai7_7_6.pdf
DOI	10.9781/ijimai.2022.10.003
	DOI Google Scholar BibTeX EndNote X3 XML EndNote 7 XML Endnote tagged Marc RIS
Attachment	ijimai7_7_6.pdf342.08 KB