Longbiao WANG, Norihide KITAOKA, Seiichi NAKAGAWA, "Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm," IEICE TRANSACTIONS on Information, vol. E94-D, no. 3, pp. 659-667, March 2011, doi: 10.1587/transinf.E94.D.659.
Abstract: We propose a blind dereverberation method based on spectral subtraction using a multi-channel least mean squares (MCLMS) algorithm for distant-talking speech recognition. In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. By treating the late reverberation as additive noise, a noise reduction technique based on spectral subtraction was proposed to estimate the power spectrum of the clean speech using power spectra of the distorted speech and the unknown impulse responses. To estimate the power spectra of the impulse responses, a variable step-size unconstrained MCLMS (VSS-UMCLMS) algorithm for identifying the impulse responses in a time domain is extended to a frequency domain. To reduce the effect of the estimation error of the channel impulse response, we normalize the early reverberation by cepstral mean normalization (CMN) instead of spectral subtraction using the estimated impulse response. Furthermore, our proposed method is combined with conventional delay-and-sum beamforming. We conducted recognition experiments on a distorted speech signal simulated by convolving multi-channel impulse responses with clean speech. The proposed method achieved a relative error reduction rate of 22.4% in relation to conventional CMN. By combining the proposed method with beamforming, a relative error reduction rate of 24.5% in relation to the conventional CMN with beamforming was achieved using only an isolated word (with duration of about 0.6 s) to estimate the spectrum of the impulse response.
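As described in the abstract, late reverberation is treated as additive noise and suppressed by spectral subtraction in the power-spectral domain. The following is a minimal illustrative sketch of that idea, not the paper's exact formulation: the variable names and shapes (per-frame observed power spectra Y_power, impulse-response segment power spectra H_power) are assumptions, and in the paper the impulse-response spectra are estimated blindly with the frequency-domain VSS-UMCLMS algorithm while the remaining early reverberation is normalized by CMN at the feature level rather than divided out here.

import numpy as np

def dereverb_spectral_subtraction(Y_power, H_power, floor=0.01):
    """Suppress late reverberation by spectral subtraction (illustrative sketch).

    Y_power : (T, F) array of observed power spectra (T frames, F frequency bins).
    H_power : (D, F) array of power spectra of the channel impulse response,
              split into D analysis-window-sized segments; segment 0 covers the
              early part, segments 1..D-1 model the late reverberation.
    floor   : spectral floor that keeps the subtracted spectrum non-negative.
    """
    T, F = Y_power.shape
    D = H_power.shape[0]
    S_hat = np.empty_like(Y_power)
    for t in range(T):
        # Late reverberation is treated as additive noise built from earlier
        # frames, each weighted by the power spectrum of the corresponding
        # impulse-response segment.
        late = np.zeros(F)
        for d in range(1, min(D, t + 1)):
            late += H_power[d] * Y_power[t - d]
        # Subtract the late-reverberation estimate and apply a spectral floor.
        S_hat[t] = np.maximum(Y_power[t] - late, floor * Y_power[t])
    return S_hat

The abstract also combines the method with conventional delay-and-sum beamforming. Below is a minimal time-domain sketch, assuming the per-channel integer sample delays have already been estimated (for example by cross-correlation against a reference microphone); this is a generic outline of delay-and-sum, not the authors' specific implementation.

def delay_and_sum(channels, delays):
    """Conventional delay-and-sum beamforming in the time domain (sketch).

    channels : list of equal-rate 1-D numpy arrays, one signal per microphone.
    delays   : non-negative integer delays (in samples) per channel that
               time-align the direct-path speech across microphones.
    """
    n = min(len(x) for x in channels)
    out = np.zeros(n)
    for x, d in zip(channels, delays):
        d = int(d)
        shifted = np.zeros(n)
        # Advance the channel by d samples; the tail is zero-padded.
        shifted[:n - d] = x[d:n]
        out += shifted
    # Averaging reinforces the time-aligned speech and attenuates
    # uncorrelated reverberation and noise components.
    return out / len(channels)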
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E94.D.659/_p
@ARTICLE{e94-d_3_659,
author={Longbiao WANG and Norihide KITAOKA and Seiichi NAKAGAWA},
journal={IEICE TRANSACTIONS on Information},
title={Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm},
year={2011},
volume={E94-D},
number={3},
pages={659-667},
abstract={We propose a blind dereverberation method based on spectral subtraction using a multi-channel least mean squares (MCLMS) algorithm for distant-talking speech recognition. In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. By treating the late reverberation as additive noise, a noise reduction technique based on spectral subtraction was proposed to estimate the power spectrum of the clean speech using power spectra of the distorted speech and the unknown impulse responses. To estimate the power spectra of the impulse responses, a variable step-size unconstrained MCLMS (VSS-UMCLMS) algorithm for identifying the impulse responses in a time domain is extended to a frequency domain. To reduce the effect of the estimation error of the channel impulse response, we normalize the early reverberation by cepstral mean normalization (CMN) instead of spectral subtraction using the estimated impulse response. Furthermore, our proposed method is combined with conventional delay-and-sum beamforming. We conducted recognition experiments on a distorted speech signal simulated by convolving multi-channel impulse responses with clean speech. The proposed method achieved a relative error reduction rate of 22.4% in relation to conventional CMN. By combining the proposed method with beamforming, a relative error reduction rate of 24.5% in relation to the conventional CMN with beamforming was achieved using only an isolated word (with duration of about 0.6 s) to estimate the spectrum of the impulse response.},
keywords={},
doi={10.1587/transinf.E94.D.659},
ISSN={1745-1361},
month={March},}
TY - JOUR
TI - Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm
T2 - IEICE TRANSACTIONS on Information
SP - 659
EP - 667
AU - Longbiao WANG
AU - Norihide KITAOKA
AU - Seiichi NAKAGAWA
PY - 2011
DO - 10.1587/transinf.E94.D.659
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E94-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2011
AB - We propose a blind dereverberation method based on spectral subtraction using a multi-channel least mean squares (MCLMS) algorithm for distant-talking speech recognition. In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. By treating the late reverberation as additive noise, a noise reduction technique based on spectral subtraction was proposed to estimate the power spectrum of the clean speech using power spectra of the distorted speech and the unknown impulse responses. To estimate the power spectra of the impulse responses, a variable step-size unconstrained MCLMS (VSS-UMCLMS) algorithm for identifying the impulse responses in a time domain is extended to a frequency domain. To reduce the effect of the estimation error of the channel impulse response, we normalize the early reverberation by cepstral mean normalization (CMN) instead of spectral subtraction using the estimated impulse response. Furthermore, our proposed method is combined with conventional delay-and-sum beamforming. We conducted recognition experiments on a distorted speech signal simulated by convolving multi-channel impulse responses with clean speech. The proposed method achieved a relative error reduction rate of 22.4% in relation to conventional CMN. By combining the proposed method with beamforming, a relative error reduction rate of 24.5% in relation to the conventional CMN with beamforming was achieved using only an isolated word (with duration of about 0.6 s) to estimate the spectrum of the impulse response.
ER -