This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). MSD-GMM allows continuous pitch values of voiced frames and discrete symbols for unvoiced frames to be modeled in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and a minimum classification error (MCE) training procedure for the MSD-GMM parameters. The MSD-GMM speaker models are evaluated on text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model the spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of MCE training of the MSD-GMM parameters and robustness to inter-session variability.
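The idea of handling voiced and unvoiced frames in one model can be illustrated with a minimal Python sketch of the pitch stream only. It is not the authors' implementation: the names `VoicedComponent`, `msd_pitch_likelihood`, and `log_likelihood`, the use of `None` to mark an unvoiced frame, and all numeric values are assumptions made for illustration. The sketch assumes one zero-dimensional "unvoiced" space that carries only a weight, plus one-dimensional Gaussian components for voiced pitch values, with all weights summing to one.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VoicedComponent:
    """One 1-D Gaussian mixture component in the voiced (continuous pitch) space."""
    weight: float
    mean: float
    var: float


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def msd_pitch_likelihood(
    f0: Optional[float],
    voiced: Sequence[VoicedComponent],
    unvoiced_weight: float,
) -> float:
    """Likelihood of one frame under a two-space pitch model (illustrative).

    f0 is None for an unvoiced frame (assumed convention); otherwise it is
    the continuous pitch value of a voiced frame.
    """
    if f0 is None:
        # Zero-dimensional space: the observation is a discrete symbol,
        # so only the space weight contributes.
        return unvoiced_weight
    # One-dimensional space: weighted sum of Gaussian densities.
    return sum(c.weight * gaussian_pdf(f0, c.mean, c.var) for c in voiced)


def log_likelihood(
    frames: Sequence[Optional[float]],
    voiced: Sequence[VoicedComponent],
    unvoiced_weight: float,
) -> float:
    """Sum of per-frame log-likelihoods over an utterance."""
    return sum(
        math.log(msd_pitch_likelihood(f, voiced, unvoiced_weight)) for f in frames
    )


# Hypothetical speaker model: weights 0.35 + 0.25 + 0.40 sum to one.
speaker_voiced = [
    VoicedComponent(weight=0.35, mean=110.0, var=100.0),
    VoicedComponent(weight=0.25, mean=130.0, var=225.0),
]
speaker_unvoiced_weight = 0.40

# A short utterance mixing voiced frames (pitch values) and unvoiced frames (None).
utterance = [112.0, None, 125.0, None, 108.5]
score = log_likelihood(utterance, speaker_voiced, speaker_unvoiced_weight)
```

Under these assumptions, identification would amount to computing `score` for each enrolled speaker's model and choosing the speaker with the highest value. A two-stream model would additionally combine this pitch score with a spectral-stream GMM score for the same frames.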
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Chiyomi MIYAJIMA, Yosuke HATTORI, Keiichi TOKUDA, Takashi MASUKO, Takao KOBAYASHI, Tadashi KITAMURA, "Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution," in IEICE TRANSACTIONS on Information, vol. E84-D, no. 7, pp. 847-855, July 2001.
Abstract: This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). MSD-GMM allows us to model continuous pitch values of voiced frames and discrete symbols for unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and minimum classification error (MCE) training procedure for MSD-GMM parameters. The MSD-GMM speaker models are evaluated for text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of the MCE training of the MSD-GMM parameters and the robustness for the inter-session variability.
URL: https://global.ieice.org/en_transactions/information/10.1587/e84-d_7_847/_p
@ARTICLE{e84-d_7_847,
author={Chiyomi MIYAJIMA and Yosuke HATTORI and Keiichi TOKUDA and Takashi MASUKO and Takao KOBAYASHI and Tadashi KITAMURA},
journal={IEICE TRANSACTIONS on Information},
title={Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution},
year={2001},
volume={E84-D},
number={7},
pages={847--855},
abstract={This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). MSD-GMM allows us to model continuous pitch values of voiced frames and discrete symbols for unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and minimum classification error (MCE) training procedure for MSD-GMM parameters. The MSD-GMM speaker models are evaluated for text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of the MCE training of the MSD-GMM parameters and the robustness for the inter-session variability.},
keywords={},
doi={},
ISSN={},
month={July},}
TY - JOUR
TI - Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution
T2 - IEICE TRANSACTIONS on Information
SP - 847
EP - 855
AU - Chiyomi MIYAJIMA
AU - Yosuke HATTORI
AU - Keiichi TOKUDA
AU - Takashi MASUKO
AU - Takao KOBAYASHI
AU - Tadashi KITAMURA
PY - 2001
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E84-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2001
AB - This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). MSD-GMM allows us to model continuous pitch values of voiced frames and discrete symbols for unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and minimum classification error (MCE) training procedure for MSD-GMM parameters. The MSD-GMM speaker models are evaluated for text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of the MCE training of the MSD-GMM parameters and the robustness for the inter-session variability.
ER -