The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
The spectral envelope parameter is an important speech parameter for vocoder quality. The Vector Quantized Variational AutoEncoder (VQ-VAE) is a recent state-of-the-art end-to-end quantization method based on deep learning. This paper proposes a new technique, called VQ-VAE-EMGAN, that improves the embedding space learning of VQ-VAE with a Generative Adversarial Network in order to quantize the spectral envelope parameter. In experiments, we designed the quantizer for the spectral envelope parameters of the WORLD vocoder extracted from 16 kHz speech waveforms. The results show that the proposed technique reduced the Log Spectral Distortion (LSD) by around 0.5 dB and increased the PESQ by around 0.17 on average for four target bit operations compared to the conventional VQ-VAE.
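For readers unfamiliar with the LSD metric mentioned above, the following is a minimal sketch of one common definition: the root-mean-square difference, in dB, between the log power spectra of a reference frame and its quantized version, averaged over frames. The abstract does not state the exact formula used in the paper, so the function name `log_spectral_distortion`, the `eps` floor, and the choice of power spectra as input are assumptions made for illustration only.

```python
import math


def log_spectral_distortion(reference, quantized, eps=1e-12):
    """Mean per-frame LSD in dB between two sequences of power spectra.

    Assumption: each frame is a list of non-negative power values, one per
    frequency bin. This is a common textbook definition, not necessarily
    the one used in the paper.
    """
    if not reference or len(reference) != len(quantized):
        raise ValueError("inputs must be non-empty with equal frame counts")

    total = 0.0
    for ref_frame, quant_frame in zip(reference, quantized):
        if not ref_frame or len(ref_frame) != len(quant_frame):
            raise ValueError("frames must be non-empty with equal bin counts")
        # Squared difference of the two spectra on a dB scale, per bin.
        squared_db_diff = [
            (10.0 * math.log10(max(r, eps)) - 10.0 * math.log10(max(q, eps))) ** 2
            for r, q in zip(ref_frame, quant_frame)
        ]
        # RMS over frequency bins gives the distortion of this frame.
        total += math.sqrt(sum(squared_db_diff) / len(squared_db_diff))

    # Average the per-frame distortion over all frames.
    return total / len(reference)
```

Under this definition, a quantized spectrum that is uniformly ten times the reference power yields an LSD of 10 dB, and identical spectra yield 0 dB.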
Tanasan SRIKOTR
Shibaura Institute of Technology
Kazunori MANO
Shibaura Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Tanasan SRIKOTR, Kazunori MANO, "Vector Quantization of Speech Spectrum Based on the VQ-VAE Embedding Space Learning by GAN Technique" in IEICE TRANSACTIONS on Fundamentals,
vol. E105-A, no. 4, pp. 647-654, April 2022, doi: 10.1587/transfun.2021SMP0018.
Abstract: The spectral envelope parameter is a significant speech parameter in the vocoder's quality. Recently, the Vector Quantized Variational AutoEncoder (VQ-VAE) is a state-of-the-art end-to-end quantization method based on the deep learning model. This paper proposed a new technique for improving the embedding space learning of VQ-VAE with the Generative Adversarial Network for quantizing the spectral envelope parameter, called VQ-VAE-EMGAN. In experiments, we designed the quantizer for the spectral envelope parameters of the WORLD vocoder extracted from the 16kHz speech waveform. As the results shown, the proposed technique reduced the Log Spectral Distortion (LSD) around 0.5dB and increased the PESQ by around 0.17 on average for four target bit operations compared to the conventional VQ-VAE.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2021SMP0018/_p
@ARTICLE{e105-a_4_647,
author={Tanasan SRIKOTR and Kazunori MANO},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Vector Quantization of Speech Spectrum Based on the VQ-VAE Embedding Space Learning by GAN Technique},
year={2022},
volume={E105-A},
number={4},
pages={647-654},
abstract={The spectral envelope parameter is a significant speech parameter in the vocoder's quality. Recently, the Vector Quantized Variational AutoEncoder (VQ-VAE) is a state-of-the-art end-to-end quantization method based on the deep learning model. This paper proposed a new technique for improving the embedding space learning of VQ-VAE with the Generative Adversarial Network for quantizing the spectral envelope parameter, called VQ-VAE-EMGAN. In experiments, we designed the quantizer for the spectral envelope parameters of the WORLD vocoder extracted from the 16kHz speech waveform. As the results shown, the proposed technique reduced the Log Spectral Distortion (LSD) around 0.5dB and increased the PESQ by around 0.17 on average for four target bit operations compared to the conventional VQ-VAE.},
keywords={},
doi={10.1587/transfun.2021SMP0018},
ISSN={1745-1337},
month={April},}
TY - JOUR
TI - Vector Quantization of Speech Spectrum Based on the VQ-VAE Embedding Space Learning by GAN Technique
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 647
EP - 654
AU - Tanasan SRIKOTR
AU - Kazunori MANO
PY - 2022
DO - 10.1587/transfun.2021SMP0018
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E105-A
IS - 4
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - April 2022
AB - The spectral envelope parameter is a significant speech parameter in the vocoder's quality. Recently, the Vector Quantized Variational AutoEncoder (VQ-VAE) is a state-of-the-art end-to-end quantization method based on the deep learning model. This paper proposed a new technique for improving the embedding space learning of VQ-VAE with the Generative Adversarial Network for quantizing the spectral envelope parameter, called VQ-VAE-EMGAN. In experiments, we designed the quantizer for the spectral envelope parameters of the WORLD vocoder extracted from the 16kHz speech waveform. As the results shown, the proposed technique reduced the Log Spectral Distortion (LSD) around 0.5dB and increased the PESQ by around 0.17 on average for four target bit operations compared to the conventional VQ-VAE.
ER -