The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Kazuyuki TAKAGI, Rei OGURO, Kazuhiko OZEKI, "Effectiveness of Word String Language Models on Noisy Broadcast News Speech Recognition" in IEICE TRANSACTIONS on Information,
vol. E85-D, no. 7, pp. 1130-1137, July 2002.
Abstract: Experiments were conducted to examine an approach from the language modeling side to improving noisy speech recognition performance. By adopting appropriate word strings as new units of processing, speech recognition performance was improved by acoustic effects as well as by test-set perplexity reduction. Three kinds of word string language models were evaluated, whose additional lexical entries were selected based on combinations of part-of-speech information, word length, occurrence frequency, and the log likelihood ratio of hypotheses about bigram frequency. All three word string models reduced errors in broadcast news speech recognition and also lowered test-set perplexity. The word string model based on the log likelihood ratio exhibited the best improvement for noisy speech recognition: deletion errors were reduced by 26%, substitution errors by 9.3%, and insertion errors by 13% in the experiments using the speaker-dependent, noise-adapted triphone. The effectiveness of word string models on error reduction was more prominent for noisy speech than for studio-clean speech.
URL: https://global.ieice.org/en_transactions/information/10.1587/e85-d_7_1130/_p
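The abstract above notes that one of the three models selects its additional lexical entries using the log likelihood ratio of hypotheses about bigram frequency. As a rough illustration only, and not the authors' actual procedure (whose counts, thresholds, and combination with the part-of-speech, word-length, and frequency criteria are not detailed on this page), the Python sketch below scores adjacent word pairs with a Dunning-style bigram log-likelihood ratio and joins the top-scoring pairs into single lexical units; the helper names and the min_count and top_k parameters are illustrative assumptions.

import math
from collections import Counter

def _ll(k, n, x):
    # Binomial log likelihood k*log(x) + (n-k)*log(1-x), treating 0*log(0) as 0.
    t1 = k * math.log(x) if k > 0 else 0.0
    t2 = (n - k) * math.log(1.0 - x) if (n - k) > 0 else 0.0
    return t1 + t2

def bigram_llr(c12, c1, c2, n):
    # Log likelihood ratio comparing H0: P(w2|w1) = P(w2|not w1)
    # against H1: the two probabilities differ (Dunning-style test).
    p = c2 / n                      # pooled estimate under H0
    p1 = c12 / c1                   # P(w2 | w1) under H1
    p2 = (c2 - c12) / (n - c1)      # P(w2 | not w1) under H1
    return 2.0 * (_ll(c12, c1, p1) + _ll(c2 - c12, n - c1, p2)
                  - _ll(c12, c1, p) - _ll(c2 - c12, n - c1, p))

def select_word_strings(tokens, min_count=5, top_k=100):
    # Rank adjacent word pairs by LLR and return the top-k joined as
    # candidate word-string entries for the recognition lexicon.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens) - 1             # number of adjacent word-pair positions
    scored = []
    for (w1, w2), c12 in bigrams.items():
        if c12 < min_count:         # illustrative frequency cutoff
            continue
        scored.append((bigram_llr(c12, unigrams[w1], unigrams[w2], n), w1 + "_" + w2))
    scored.sort(reverse=True)
    return [pair for _, pair in scored[:top_k]]

Applied to a tokenized broadcast-news transcript, the high-scoring pairs would typically be frequent collocations (for example, proper-noun compounds), which is the kind of unit the paper promotes to new lexical entries.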
@ARTICLE{e85-d_7_1130,
author={Kazuyuki TAKAGI and Rei OGURO and Kazuhiko OZEKI},
journal={IEICE TRANSACTIONS on Information},
title={Effectiveness of Word String Language Models on Noisy Broadcast News Speech Recognition},
year={2002},
volume={E85-D},
number={7},
pages={1130-1137},
abstract={Experiments were conducted to examine an approach from the language modeling side to improving noisy speech recognition performance. By adopting appropriate word strings as new units of processing, speech recognition performance was improved by acoustic effects as well as by test-set perplexity reduction. Three kinds of word string language models were evaluated, whose additional lexical entries were selected based on combinations of part-of-speech information, word length, occurrence frequency, and the log likelihood ratio of hypotheses about bigram frequency. All three word string models reduced errors in broadcast news speech recognition and also lowered test-set perplexity. The word string model based on the log likelihood ratio exhibited the best improvement for noisy speech recognition: deletion errors were reduced by 26%, substitution errors by 9.3%, and insertion errors by 13% in the experiments using the speaker-dependent, noise-adapted triphone. The effectiveness of word string models on error reduction was more prominent for noisy speech than for studio-clean speech.},
keywords={},
doi={},
ISSN={},
month={July},}
TY - JOUR
TI - Effectiveness of Word String Language Models on Noisy Broadcast News Speech Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 1130
EP - 1137
AU - Kazuyuki TAKAGI
AU - Rei OGURO
AU - Kazuhiko OZEKI
PY - 2002
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E85-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2002
AB - Experiments were conducted to examine an approach from the language modeling side to improving noisy speech recognition performance. By adopting appropriate word strings as new units of processing, speech recognition performance was improved by acoustic effects as well as by test-set perplexity reduction. Three kinds of word string language models were evaluated, whose additional lexical entries were selected based on combinations of part-of-speech information, word length, occurrence frequency, and the log likelihood ratio of hypotheses about bigram frequency. All three word string models reduced errors in broadcast news speech recognition and also lowered test-set perplexity. The word string model based on the log likelihood ratio exhibited the best improvement for noisy speech recognition: deletion errors were reduced by 26%, substitution errors by 9.3%, and insertion errors by 13% in the experiments using the speaker-dependent, noise-adapted triphone. The effectiveness of word string models on error reduction was more prominent for noisy speech than for studio-clean speech.
ER -