The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
This paper describes a reading miscue detection system based on the conventional Large Vocabulary Continuous Speech Recognition (LVCSR) framework [1]. To incorporate knowledge of the reference (what the reader ought to read) and some error patterns into the decoding process, two methods are proposed: Dynamic Multiple Pronunciation Incorporation (DMPI) and Dynamic Interpolation of Language Model (DILM). DMPI dynamically adds some pronunciation variations to the search space to predict reading substitutions and insertions. To resolve the conflict between the coverage of error predictions and the perplexity of the search space, only pronunciation variants related to the reference are added. DILM dynamically interpolates the general language model based on an analysis of the reference, and so keeps the active decoding paths relatively close to the reference. This makes recognition more accurate, which in turn improves detection performance. At the final stage of detection, an improved dynamic programming (DP) procedure aligns the confusion network (CN) produced by speech recognition with the reference to generate the detection result. Experimental results show that the two proposed methods reduce the Equal Error Rate (EER) by 14% relative, from 46.4% to 39.8%.
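The abstract does not give the interpolation formula used by DILM, so the following is only a minimal sketch of the general idea, assuming plain linear interpolation between a general unigram model and a model estimated from the reference text. The function names, the toy probabilities, and the weight `lam` are all hypothetical and are not taken from the paper. The second helper simply checks the reported arithmetic: a drop in EER from 46.4% to 39.8% is a relative reduction of about 14%.

```python
def interpolate(general, reference, lam):
    """Linear interpolation of two unigram distributions (an assumed scheme).

    Returns lam * P_reference(w) + (1 - lam) * P_general(w) for every word
    that appears in either model. Words missing from a model get probability 0.
    """
    vocab = set(general) | set(reference)
    return {
        w: lam * reference.get(w, 0.0) + (1.0 - lam) * general.get(w, 0.0)
        for w in vocab
    }


def relative_reduction(before, after):
    """Relative reduction of an error rate, as a fraction of the starting value."""
    return (before - after) / before


# Toy distributions, purely illustrative.
general_lm = {"the": 0.5, "cat": 0.2, "dog": 0.3}
reference_lm = {"the": 0.5, "cat": 0.5}

# A larger lam pulls the mixed model closer to the reference text.
mixed = interpolate(general_lm, reference_lm, lam=0.4)

# EER figures quoted in the abstract.
eer_gain = relative_reduction(46.4, 39.8)
print(f"relative EER reduction: {eer_gain:.1%}")
```

Because both inputs sum to one and the weights sum to one, the mixed distribution remains a valid probability distribution; words favored by the reference (here "cat") gain mass while words absent from it (here "dog") lose mass.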
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Changliang LIU, Fuping PAN, Fengpei GE, Bin DONG, Hongbin SUO, Yonghong YAN, "An LVCSR Based Reading Miscue Detection System Using Knowledge of Reference and Error Patterns" in IEICE TRANSACTIONS on Information,
vol. E92-D, no. 9, pp. 1716-1724, September 2009, doi: 10.1587/transinf.E92.D.1716.
Abstract: This paper describes a reading miscue detection system based on the conventional Large Vocabulary Continuous Speech Recognition (LVCSR) framework [1]. In order to incorporate the knowledge of reference (what the reader ought to read) and some error patterns into the decoding process, two methods are proposed: Dynamic Multiple Pronunciation Incorporation (DMPI) and Dynamic Interpolation of Language Model (DILM). DMPI dynamically adds some pronunciation variations into the search space to predict reading substitutions and insertions. To resolve the conflict between the coverage of error predications and the perplexity of the search space, only the pronunciation variants related to the reference are added. DILM dynamically interpolates the general language model based on the analysis of the reference and so keeps the active paths of decoding relatively near the reference. It makes the recognition more accurate, which further improves the detection performance. At the final stage of detection, an improved dynamic program (DP) is used to align the confusion network (CN) from speech recognition and the reference to generate the detecting result. The experimental results show that the proposed two methods can decrease the Equal Error Rate (EER) by 14% relatively, from 46.4% to 39.8%.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.1716/_p
@ARTICLE{e92-d_9_1716,
author={Changliang LIU and Fuping PAN and Fengpei GE and Bin DONG and Hongbin SUO and Yonghong YAN},
journal={IEICE TRANSACTIONS on Information},
title={An LVCSR Based Reading Miscue Detection System Using Knowledge of Reference and Error Patterns},
year={2009},
volume={E92-D},
number={9},
pages={1716--1724},
abstract={This paper describes a reading miscue detection system based on the conventional Large Vocabulary Continuous Speech Recognition (LVCSR) framework [1]. In order to incorporate the knowledge of reference (what the reader ought to read) and some error patterns into the decoding process, two methods are proposed: Dynamic Multiple Pronunciation Incorporation (DMPI) and Dynamic Interpolation of Language Model (DILM). DMPI dynamically adds some pronunciation variations into the search space to predict reading substitutions and insertions. To resolve the conflict between the coverage of error predications and the perplexity of the search space, only the pronunciation variants related to the reference are added. DILM dynamically interpolates the general language model based on the analysis of the reference and so keeps the active paths of decoding relatively near the reference. It makes the recognition more accurate, which further improves the detection performance. At the final stage of detection, an improved dynamic program (DP) is used to align the confusion network (CN) from speech recognition and the reference to generate the detecting result. The experimental results show that the proposed two methods can decrease the Equal Error Rate (EER) by 14% relatively, from 46.4% to 39.8%.},
keywords={},
doi={10.1587/transinf.E92.D.1716},
ISSN={1745-1361},
month={September},}
TY  - JOUR
TI  - An LVCSR Based Reading Miscue Detection System Using Knowledge of Reference and Error Patterns
T2  - IEICE TRANSACTIONS on Information
SP  - 1716
EP  - 1724
AU  - Changliang LIU
AU  - Fuping PAN
AU  - Fengpei GE
AU  - Bin DONG
AU  - Hongbin SUO
AU  - Yonghong YAN
PY  - 2009
DO  - 10.1587/transinf.E92.D.1716
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E92-D
IS  - 9
JA  - IEICE TRANSACTIONS on Information
Y1  - September 2009
AB  - This paper describes a reading miscue detection system based on the conventional Large Vocabulary Continuous Speech Recognition (LVCSR) framework [1]. In order to incorporate the knowledge of reference (what the reader ought to read) and some error patterns into the decoding process, two methods are proposed: Dynamic Multiple Pronunciation Incorporation (DMPI) and Dynamic Interpolation of Language Model (DILM). DMPI dynamically adds some pronunciation variations into the search space to predict reading substitutions and insertions. To resolve the conflict between the coverage of error predications and the perplexity of the search space, only the pronunciation variants related to the reference are added. DILM dynamically interpolates the general language model based on the analysis of the reference and so keeps the active paths of decoding relatively near the reference. It makes the recognition more accurate, which further improves the detection performance. At the final stage of detection, an improved dynamic program (DP) is used to align the confusion network (CN) from speech recognition and the reference to generate the detecting result. The experimental results show that the proposed two methods can decrease the Equal Error Rate (EER) by 14% relatively, from 46.4% to 39.8%.
ER  - 