The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Oscar VANEGAS, Keiichi TOKUDA, Tadashi KITAMURA, "Lip Location Normalized Training for Visual Speech Recognition," IEICE TRANSACTIONS on Information, vol. E83-D, no. 11, pp. 1969-1977, November 2000, doi: 10.1587/e83-d_11_1969.
Abstract: This paper describes a method to normalize the lip position for improving the performance of a visual-information-based speech recognition system. Basically, there are two types of information useful in speech recognition processes; the first one is the speech signal itself and the second one is the visual information from the lips in motion. This paper tries to solve some problems caused by using images from the lips in motion such as the effect produced by the variation of the lip location. The proposed lip location normalization method is based on a search algorithm of the lip position in which the location normalization is integrated into the model training. Experiments of speaker-independent isolated word recognition were carried out on the Tulips1 and M2VTS databases. Experiments showed a recognition rate of 74.5% and an error reduction rate of 35.7% for the ten digits word recognition M2VTS database.
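The abstract describes integrating lip-location normalization into model training via a search over candidate lip positions. As a rough illustration only (not the authors' actual algorithm, which scores candidates with the recognition model during training), the core idea of searching offsets for the best-matching lip window can be sketched like this, using a simple template-matching score as a stand-in for the model likelihood:

```python
# Hypothetical sketch: search candidate lip offsets and keep the one whose
# cropped window best matches a reference patch. A real system would score
# each candidate with the trained model's likelihood instead of SSD.
import numpy as np

def best_lip_offset(frame, template, offsets):
    """Return the (dy, dx) offset whose cropped window best matches `template`.

    frame:    2-D grayscale image (H x W)
    template: 2-D reference lip patch (h x w), with h <= H and w <= W
    offsets:  iterable of (dy, dx) candidate top-left positions
    The score is the negative sum of squared differences (higher is better).
    """
    h, w = template.shape
    best, best_score = None, -np.inf
    for dy, dx in offsets:
        window = frame[dy:dy + h, dx:dx + w]
        if window.shape != template.shape:
            continue  # candidate window falls outside the frame
        score = -float(np.sum((window - template) ** 2))
        if score > best_score:
            best, best_score = (dy, dx), score
    return best

# Toy usage: a 4x4 frame with a bright 2x2 patch at row 1, column 2.
frame = np.zeros((4, 4))
frame[1:3, 2:4] = 1.0
template = np.ones((2, 2))
candidates = [(dy, dx) for dy in range(3) for dx in range(3)]
print(best_lip_offset(frame, template, candidates))  # -> (1, 2)
```

In the paper's setting this per-frame search is folded into training itself, so the models and the location estimates are refined jointly rather than fixing the lip position in a separate preprocessing step.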
URL: https://global.ieice.org/en_transactions/information/10.1587/e83-d_11_1969/_p
@ARTICLE{e83-d_11_1969,
author={Oscar VANEGAS and Keiichi TOKUDA and Tadashi KITAMURA},
journal={IEICE TRANSACTIONS on Information},
title={Lip Location Normalized Training for Visual Speech Recognition},
year={2000},
volume={E83-D},
number={11},
pages={1969-1977},
abstract={This paper describes a method to normalize the lip position for improving the performance of a visual-information-based speech recognition system. Basically, there are two types of information useful in speech recognition processes; the first one is the speech signal itself and the second one is the visual information from the lips in motion. This paper tries to solve some problems caused by using images from the lips in motion such as the effect produced by the variation of the lip location. The proposed lip location normalization method is based on a search algorithm of the lip position in which the location normalization is integrated into the model training. Experiments of speaker-independent isolated word recognition were carried out on the Tulips1 and M2VTS databases. Experiments showed a recognition rate of 74.5% and an error reduction rate of 35.7% for the ten digits word recognition M2VTS database.},
keywords={},
doi={10.1587/e83-d_11_1969},
ISSN={},
month={November},}
TY - JOUR
TI - Lip Location Normalized Training for Visual Speech Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 1969
EP - 1977
AU - Oscar VANEGAS
AU - Keiichi TOKUDA
AU - Tadashi KITAMURA
PY - 2000
DO - 10.1587/e83-d_11_1969
JO - IEICE TRANSACTIONS on Information
SN -
VL - E83-D
IS - 11
JA - IEICE TRANSACTIONS on Information
Y1 - November 2000
AB - This paper describes a method to normalize the lip position for improving the performance of a visual-information-based speech recognition system. Basically, there are two types of information useful in speech recognition processes; the first one is the speech signal itself and the second one is the visual information from the lips in motion. This paper tries to solve some problems caused by using images from the lips in motion such as the effect produced by the variation of the lip location. The proposed lip location normalization method is based on a search algorithm of the lip position in which the location normalization is integrated into the model training. Experiments of speaker-independent isolated word recognition were carried out on the Tulips1 and M2VTS databases. Experiments showed a recognition rate of 74.5% and an error reduction rate of 35.7% for the ten digits word recognition M2VTS database.
ER -