Yue XIE
Nanjing Institute of Technology
Ruiyu LIANG
Nanjing Institute of Technology
Xiaoyan ZHAO
Nanjing Institute of Technology
Zhenlin LIANG
Southeast University
Jing DU
Southeast University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yue XIE, Ruiyu LIANG, Xiaoyan ZHAO, Zhenlin LIANG, Jing DU, "Weighted Gradient Pretrain for Low-Resource Speech Emotion Recognition" in IEICE TRANSACTIONS on Information and Systems,
vol. E105-D, no. 7, pp. 1352-1355, July 2022, doi: 10.1587/transinf.2022EDL8014.
Abstract: To alleviate the dependency on the quantity of training data in speech emotion recognition, a weighted gradient pre-training algorithm for low-resource speech emotion recognition is proposed. Multiple public emotion corpora are used for pre-training to generate shared hidden layer (SHL) parameters with generalization ability. These parameters are used to initialize the downstream recognition network for the low-resource dataset, thereby improving recognition performance on low-resource emotion corpora. However, the emotion categories differ among the public corpora and the numbers of samples vary greatly, which increases the difficulty of joint training on multiple emotion datasets. To this end, a weighted gradient (WG) algorithm is proposed to enable the shared layers to learn a generalized representation of the different datasets without affecting the priority of emotion recognition on each corpus. Experiments show that accuracy is improved by using CASIA, IEMOCAP, and eNTERFACE as the known datasets to pre-train the emotion models for GEMEP, and that performance can be improved further by combining WG with a gradient reversal layer.
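The abstract describes the approach only at a high level, and the paper's exact WG formulation is not reproduced on this page. The following is a minimal sketch, not the authors' implementation: it assumes, hypothetically, that each corpus's loss is scaled by a weight inversely proportional to its sample count, so that large corpora do not dominate the gradients flowing into the shared hidden layers, and it pairs this with a gradient reversal layer (GRL) and a corpus discriminator to mirror the WG + GRL combination mentioned above. All class counts, corpus sizes, and layer dimensions are illustrative placeholders.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; negates and scales the gradient backward.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class SharedEncoder(nn.Module):
    # Shared hidden layers (SHL); their weights would later initialize
    # the downstream network for the low-resource dataset.
    def __init__(self, in_dim=40, hid_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, hid_dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)

# Per-corpus heads: the emotion label sets differ across corpora,
# so each corpus gets its own classifier on the shared encoder.
n_classes = {"CASIA": 6, "IEMOCAP": 4, "eNTERFACE": 6}           # illustrative
n_samples = {"CASIA": 1200, "IEMOCAP": 5500, "eNTERFACE": 1300}  # illustrative

encoder = SharedEncoder()
heads = nn.ModuleDict({c: nn.Linear(128, k) for c, k in n_classes.items()})
domain_clf = nn.Linear(128, len(n_classes))  # corpus discriminator behind the GRL

# Assumed weighting scheme: inverse corpus size, normalized to sum to one.
inv = {c: 1.0 / n for c, n in n_samples.items()}
w = {c: v / sum(inv.values()) for c, v in inv.items()}

opt = torch.optim.Adam(
    list(encoder.parameters()) + list(heads.parameters()) + list(domain_clf.parameters()),
    lr=1e-3)
ce = nn.CrossEntropyLoss()

def train_step(batches, lam=0.1):
    # batches maps corpus name -> (features, emotion labels), one mini-batch per corpus.
    opt.zero_grad()
    loss = torch.zeros(())
    for idx, (name, (x, y)) in enumerate(batches.items()):
        h = encoder(x)
        # Scaling the per-corpus loss scales the gradient that corpus
        # contributes to the shared layers ("weighted gradient").
        loss = loss + w[name] * ce(heads[name](h), y)
        # Adversarial corpus classification through the GRL pushes the
        # shared representation toward corpus-invariant features.
        dom_logits = domain_clf(GradReverse.apply(h, lam))
        dom_target = torch.full((x.size(0),), idx, dtype=torch.long)
        loss = loss + ce(dom_logits, dom_target)
    loss.backward()
    opt.step()
    return loss.item()

After pre-training, the per-corpus heads would be discarded and the encoder's SHL weights kept to initialize the downstream network, with a fresh head trained for the low-resource GEMEP label set, matching the abstract's description of the transfer step.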
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDL8014/_p
@ARTICLE{e105-d_7_1352,
author={Yue XIE and Ruiyu LIANG and Xiaoyan ZHAO and Zhenlin LIANG and Jing DU},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Weighted Gradient Pretrain for Low-Resource Speech Emotion Recognition},
year={2022},
volume={E105-D},
number={7},
pages={1352-1355},
abstract={To alleviate the dependency on the quantity of training data in speech emotion recognition, a weighted gradient pre-training algorithm for low-resource speech emotion recognition is proposed. Multiple public emotion corpora are used for pre-training to generate shared hidden layer (SHL) parameters with generalization ability. These parameters are used to initialize the downstream recognition network for the low-resource dataset, thereby improving recognition performance on low-resource emotion corpora. However, the emotion categories differ among the public corpora and the numbers of samples vary greatly, which increases the difficulty of joint training on multiple emotion datasets. To this end, a weighted gradient (WG) algorithm is proposed to enable the shared layers to learn a generalized representation of the different datasets without affecting the priority of emotion recognition on each corpus. Experiments show that accuracy is improved by using CASIA, IEMOCAP, and eNTERFACE as the known datasets to pre-train the emotion models for GEMEP, and that performance can be improved further by combining WG with a gradient reversal layer.},
keywords={},
doi={10.1587/transinf.2022EDL8014},
ISSN={1745-1361},
month={July},
}
TY - JOUR
TI - Weighted Gradient Pretrain for Low-Resource Speech Emotion Recognition
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1352
EP - 1355
AU - Yue XIE
AU - Ruiyu LIANG
AU - Xiaoyan ZHAO
AU - Zhenlin LIANG
AU - Jing DU
PY - 2022
DO - 10.1587/transinf.2022EDL8014
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E105-D
IS - 7
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - July 2022
AB - To alleviate the dependency on the quantity of training data in speech emotion recognition, a weighted gradient pre-training algorithm for low-resource speech emotion recognition is proposed. Multiple public emotion corpora are used for pre-training to generate shared hidden layer (SHL) parameters with generalization ability. These parameters are used to initialize the downstream recognition network for the low-resource dataset, thereby improving recognition performance on low-resource emotion corpora. However, the emotion categories differ among the public corpora and the numbers of samples vary greatly, which increases the difficulty of joint training on multiple emotion datasets. To this end, a weighted gradient (WG) algorithm is proposed to enable the shared layers to learn a generalized representation of the different datasets without affecting the priority of emotion recognition on each corpus. Experiments show that accuracy is improved by using CASIA, IEMOCAP, and eNTERFACE as the known datasets to pre-train the emotion models for GEMEP, and that performance can be improved further by combining WG with a gradient reversal layer.
ER -