The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it does not require any external devices, the generated voices usually sound unnatural compared with normal speech. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. A spectral parameter and excitation parameters of the target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement to make it possible to flexibly control the voice quality of the enhanced speech.
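As a rough illustration of what estimating a target parameter from a source parameter with a Gaussian mixture model can look like, the sketch below implements a frame-wise minimum mean-square-error mapping under a joint-density GMM. It is a simplified stand-in, not the procedure of the paper: the abstract does not give the conversion details, features are reduced to scalars, and the function names, the `components` layout, and all parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert_frame(x, components):
    """Map one source feature x to an estimate of the target feature.

    components is a list of dicts (hypothetical layout), one per mixture
    component of a joint GMM over (source, target):
      weight  - mixture weight
      mean_x  - mean of the source feature
      mean_y  - mean of the target feature
      var_x   - variance of the source feature
      cov_xy  - covariance between source and target
    """
    # Weighted likelihood of x under each component's source marginal.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)

    estimate = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # P(component | x)
        # Conditional mean of the target given x for this component.
        cond_mean = c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"])
        estimate += posterior * cond_mean
    return estimate


# Toy parameters, made up for illustration only.
toy_gmm = [
    {"weight": 0.5, "mean_x": -1.0, "mean_y": -5.0, "var_x": 1.0, "cov_xy": 0.0},
    {"weight": 0.5, "mean_x": 1.0, "mean_y": 5.0, "var_x": 1.0, "cov_xy": 0.0},
]
converted = [convert_frame(x, toy_gmm) for x in (-2.0, 0.0, 2.0)]
```

In a setup like the one the abstract describes, one such mapping would be used for the spectral parameter and separate ones for the excitation parameters, each taking the esophageal spectral parameter as input.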
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hironori DOI, Keigo NAKAMURA, Tomoki TODA, Hiroshi SARUWATARI, Kiyohiro SHIKANO, "Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models," in IEICE TRANSACTIONS on Information, vol. E93-D, no. 9, pp. 2472-2482, September 2010, doi: 10.1587/transinf.E93.D.2472.
Abstract: This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices usually sound unnatural compared with normal speech. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. A spectral parameter and excitation parameters of target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement to make it possible to flexibly control the voice quality of enhanced speech.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.2472/_p
@ARTICLE{e93-d_9_2472,
author={Hironori DOI and Keigo NAKAMURA and Tomoki TODA and Hiroshi SARUWATARI and Kiyohiro SHIKANO},
journal={IEICE TRANSACTIONS on Information},
title={Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models},
year={2010},
volume={E93-D},
number={9},
pages={2472--2482},
abstract={This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices usually sound unnatural compared with normal speech. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. A spectral parameter and excitation parameters of target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement to make it possible to flexibly control the voice quality of enhanced speech.},
keywords={},
doi={10.1587/transinf.E93.D.2472},
ISSN={1745-1361},
month={September},}
TY  - JOUR
TI  - Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models
T2  - IEICE TRANSACTIONS on Information
SP  - 2472
EP  - 2482
AU  - Hironori DOI
AU  - Keigo NAKAMURA
AU  - Tomoki TODA
AU  - Hiroshi SARUWATARI
AU  - Kiyohiro SHIKANO
PY  - 2010
DO  - 10.1587/transinf.E93.D.2472
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E93-D
IS  - 9
JA  - IEICE TRANSACTIONS on Information
Y1  - September 2010
AB  - This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices usually sound unnatural compared with normal speech. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. A spectral parameter and excitation parameters of target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement to make it possible to flexibly control the voice quality of enhanced speech.
ER  -