Takashi NOSE, Yuhei OTA, Takao KOBAYASHI, "HMM-Based Voice Conversion Using Quantized F0 Context" in IEICE TRANSACTIONS on Information and Systems,
vol. E93-D, no. 9, pp. 2483-2490, September 2010, doi: 10.1587/transinf.E93.D.2483.
Abstract: We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phoneme and the F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In the model training, the models of the source and target speakers can be trained separately, hence there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that the segment-based voice conversion with phonetic and prosodic contexts works effectively even if the parallel speech data is not available.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.2483/_p
@ARTICLE{e93-d_9_2483,
author={Takashi NOSE and Yuhei OTA and Takao KOBAYASHI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={HMM-Based Voice Conversion Using Quantized F0 Context},
year={2010},
volume={E93-D},
number={9},
pages={2483-2490},
abstract={We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phoneme and the F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In the model training, the models of the source and target speakers can be trained separately, hence there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that the segment-based voice conversion with phonetic and prosodic contexts works effectively even if the parallel speech data is not available.},
doi={10.1587/transinf.E93.D.2483},
ISSN={1745-1361},
month={September},}
TY - JOUR
TI - HMM-Based Voice Conversion Using Quantized F0 Context
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 2483
EP - 2490
AU - Takashi NOSE
AU - Yuhei OTA
AU - Takao KOBAYASHI
PY - 2010
DO - 10.1587/transinf.E93.D.2483
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E93-D
IS - 9
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - September 2010
AB - We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phoneme and the F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In the model training, the models of the source and target speakers can be trained separately, hence there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that the segment-based voice conversion with phonetic and prosodic contexts works effectively even if the parallel speech data is not available.
ER -