The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
This paper proposes an utterance verification system that uses a state-level log-likelihood ratio with frame and state selection. Hidden Markov models (HMMs) are used as the acoustic models and anti-phone models for speech recognition and utterance verification. Each HMM has three states, and each state represents different characteristics of a phone. We therefore propose an algorithm that computes the log-likelihood ratio at the state level and assigns weights to the states, yielding a more reliable confidence measure for recognized phones. We also propose a frame selection algorithm that computes the confidence measure only on the frames of the input that contain proper speech. In general, phone segmentation obtained from a speaker-independent speech recognition system is inaccurate, because triphone-based acoustic models are difficult to train effectively to cover diverse pronunciation and coarticulation effects. This makes it even harder to find the correctly matched states when obtaining state segmentation information, so a state selection algorithm is proposed to find valid states. The proposed method, using state-level log-likelihood ratios with frame and state selection, achieves an 18.1% relative reduction in equal error rate compared with a baseline system that uses simple phone-level log-likelihood ratios.
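The abstract does not give the exact formulas, so the following is only a minimal sketch, under stated assumptions, of how a weighted state-level log-likelihood ratio with frame and state selection might be computed for one recognized phone, next to the phone-level baseline. Everything named here is hypothetical: the `StateSegment` container, the per-frame log-likelihoods, the state weights, the frame-selection threshold `floor`, and the state-selection rule `min_frames` are illustrative placeholders, not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class StateSegment:
    """Frames aligned to one HMM state of a recognized phone (hypothetical container)."""
    target_ll: Sequence[float]  # per-frame log p(o_t | phone-model state)
    anti_ll: Sequence[float]    # per-frame log p(o_t | corresponding anti-phone state)


def phone_level_llr(segments: Sequence[StateSegment]) -> float:
    """Baseline: average frame LLR over the whole phone segment."""
    diffs = [t - a for seg in segments for t, a in zip(seg.target_ll, seg.anti_ll)]
    return sum(diffs) / len(diffs)


def select_frames(seg: StateSegment, floor: float) -> List[float]:
    """Assumed frame-selection rule: keep a frame's LLR only if its target
    log-likelihood is at least `floor` (a hypothetical threshold)."""
    return [t - a for t, a in zip(seg.target_ll, seg.anti_ll) if t >= floor]


def state_level_llr(
    segments: Sequence[StateSegment],
    weights: Sequence[float],
    floor: float = float("-inf"),
    min_frames: int = 1,
) -> Optional[float]:
    """Weighted average of per-state mean LLRs over the states that survive selection.

    Assumed state-selection rule: a state is valid only if at least
    `min_frames` frames remain after frame selection. Returns None if
    no state is valid.
    """
    num, den = 0.0, 0.0
    for seg, w in zip(segments, weights):
        llrs = select_frames(seg, floor)
        if len(llrs) < min_frames:
            continue  # state rejected as unreliably segmented (assumed rule)
        num += w * sum(llrs) / len(llrs)  # weighted per-state mean LLR
        den += w
    return num / den if den > 0 else None


# Toy log-likelihoods for the three states of one phone (illustrative, not from the paper).
segs = [
    StateSegment(target_ll=[-10.0, -11.0], anti_ll=[-12.0, -12.0]),
    StateSegment(target_ll=[-9.0, -30.0, -9.0], anti_ll=[-10.0, -20.0, -11.0]),
    StateSegment(target_ll=[-12.0], anti_ll=[-11.0]),
]
print(phone_level_llr(segs))  # → -0.8333333333333334
print(state_level_llr(segs, [1.0, 2.0, 1.0], floor=-20.0, min_frames=2))  # → 1.5
```

In this toy case the single outlier frame in the second state drags the phone-level average below zero, while the assumed frame- and state-selection rules discard it along with the one-frame third state. Note also that the 18.1% in the abstract is a relative figure: the proposed system's equal error rate is about 81.9% of the baseline's, i.e. EER_proposed = (1 - 0.181) x EER_baseline.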
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Suk-Bong KWON, Hoirin KIM, "Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection" in IEICE TRANSACTIONS on Information and Systems,
vol. E93-D, no. 3, pp. 647-650, March 2010, doi: 10.1587/transinf.E93.D.647.
Abstract: This paper suggests utterance verification system using state-level log-likelihood ratio with frame and state selection. We use hidden Markov models for speech recognition and utterance verification as acoustic models and anti-phone models. The hidden Markov models have three states and each state represents different characteristics of a phone. Thus we propose an algorithm to compute state-level log-likelihood ratio and give weights on states for obtaining more reliable confidence measure of recognized phones. Additionally, we propose a frame selection algorithm to compute confidence measure on frames including proper speech in the input speech. In general, phone segmentation information obtained from speaker-independent speech recognition system is not accurate because triphone-based acoustic models are difficult to effectively train for covering diverse pronunciation and coarticulation effect. So, it is more difficult to find the right matched states when obtaining state segmentation information. A state selection algorithm is suggested for finding valid states. The proposed method using state-level log-likelihood ratio with frame and state selection shows that the relative reduction in equal error rate is 18.1% compared to the baseline system using simple phone-level log-likelihood ratios.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.647/_p
@ARTICLE{e93-d_3_647,
author={Suk-Bong KWON and Hoirin KIM},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection},
year={2010},
volume={E93-D},
number={3},
pages={647--650},
abstract={This paper suggests utterance verification system using state-level log-likelihood ratio with frame and state selection. We use hidden Markov models for speech recognition and utterance verification as acoustic models and anti-phone models. The hidden Markov models have three states and each state represents different characteristics of a phone. Thus we propose an algorithm to compute state-level log-likelihood ratio and give weights on states for obtaining more reliable confidence measure of recognized phones. Additionally, we propose a frame selection algorithm to compute confidence measure on frames including proper speech in the input speech. In general, phone segmentation information obtained from speaker-independent speech recognition system is not accurate because triphone-based acoustic models are difficult to effectively train for covering diverse pronunciation and coarticulation effect. So, it is more difficult to find the right matched states when obtaining state segmentation information. A state selection algorithm is suggested for finding valid states. The proposed method using state-level log-likelihood ratio with frame and state selection shows that the relative reduction in equal error rate is 18.1% compared to the baseline system using simple phone-level log-likelihood ratios.},
keywords={},
doi={10.1587/transinf.E93.D.647},
ISSN={1745-1361},
month={March},}
TY  - JOUR
TI  - Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 647
EP  - 650
AU  - Suk-Bong KWON
AU  - Hoirin KIM
PY  - 2010
DO  - 10.1587/transinf.E93.D.647
JO  - IEICE TRANSACTIONS on Information and Systems
SN  - 1745-1361
VL  - E93-D
IS  - 3
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - 2010/03//
AB  - This paper suggests utterance verification system using state-level log-likelihood ratio with frame and state selection. We use hidden Markov models for speech recognition and utterance verification as acoustic models and anti-phone models. The hidden Markov models have three states and each state represents different characteristics of a phone. Thus we propose an algorithm to compute state-level log-likelihood ratio and give weights on states for obtaining more reliable confidence measure of recognized phones. Additionally, we propose a frame selection algorithm to compute confidence measure on frames including proper speech in the input speech. In general, phone segmentation information obtained from speaker-independent speech recognition system is not accurate because triphone-based acoustic models are difficult to effectively train for covering diverse pronunciation and coarticulation effect. So, it is more difficult to find the right matched states when obtaining state segmentation information. A state selection algorithm is suggested for finding valid states. The proposed method using state-level log-likelihood ratio with frame and state selection shows that the relative reduction in equal error rate is 18.1% compared to the baseline system using simple phone-level log-likelihood ratios.
ER  - 