In recent years, deep learning has achieved significant results in various areas of machine learning. Deep learning requires a huge amount of data to train a model, and data collection techniques such as web crawling have been developed. However, there is a risk that these data collection techniques generate incorrect labels. If a deep learning model for image classification is trained on a dataset with noisy labels, its generalization performance decreases significantly. This problem is called Learning with Noisy Labels (LNL). One recent work on LNL, called DivideMix [1], successfully divides the dataset into samples with clean labels and samples with noisy labels by modeling the loss distribution of all training samples with a two-component Gaussian Mixture Model (GMM). It then treats the divided dataset as labeled and unlabeled samples and trains the classification model in a semi-supervised manner. Since the selected samples have lower loss values and are easy to classify, the model is at risk of overfitting to these simple patterns during training. To train the classification model without overfitting to the simple patterns, we propose to introduce consistency regularization on the samples selected by the GMM. The consistency regularization perturbs the input images and encourages the model to output the same values for the perturbed images and the original images. The classification model simultaneously receives the samples selected as clean and their perturbed versions, and it achieves higher generalization performance with less overfitting to the selected samples. We evaluated our method with synthetically generated noisy labels on CIFAR-10 and CIFAR-100 and obtained results that are comparable to or better than the state-of-the-art method.
Yuichiro NOMURA
Hiroshima University
Takio KURITA
Hiroshima University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yuichiro NOMURA, Takio KURITA, "Consistency Regularization on Clean Samples for Learning with Noisy Labels" in IEICE TRANSACTIONS on Information,
vol. E105-D, no. 2, pp. 387-395, February 2022, doi: 10.1587/transinf.2021EDP7127.
Abstract: In recent years, deep learning has achieved significant results in various areas of machine learning. Deep learning requires a huge amount of data to train a model, and data collection techniques such as web crawling have been developed. However, there is a risk that these data collection techniques generate incorrect labels. If a deep learning model for image classification is trained on a dataset with noisy labels, its generalization performance decreases significantly. This problem is called Learning with Noisy Labels (LNL). One recent work on LNL, called DivideMix [1], successfully divides the dataset into samples with clean labels and samples with noisy labels by modeling the loss distribution of all training samples with a two-component Gaussian Mixture Model (GMM). It then treats the divided dataset as labeled and unlabeled samples and trains the classification model in a semi-supervised manner. Since the selected samples have lower loss values and are easy to classify, the model is at risk of overfitting to these simple patterns during training. To train the classification model without overfitting to the simple patterns, we propose to introduce consistency regularization on the samples selected by the GMM. The consistency regularization perturbs the input images and encourages the model to output the same values for the perturbed images and the original images. The classification model simultaneously receives the samples selected as clean and their perturbed versions, and it achieves higher generalization performance with less overfitting to the selected samples. We evaluated our method with synthetically generated noisy labels on CIFAR-10 and CIFAR-100 and obtained results that are comparable to or better than the state-of-the-art method.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDP7127/_p
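The abstract outlines two mechanisms that a short example can make concrete: the DivideMix-style split, which fits a two-component Gaussian Mixture Model to the per-sample training losses and treats the low-loss component as the clean set, and consistency regularization, which penalizes disagreement between the model's predictions on an image and on a perturbed copy of it. The following is a minimal sketch of both ideas in Python using scikit-learn and PyTorch; it is not the authors' implementation, and the function names, the 0.5 posterior threshold, the MSE form of the consistency term, and the weighting factor lambda_c are assumptions made only for illustration.

# Minimal sketch, not the authors' code; names and hyper-parameters are assumed.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_sample_losses, threshold=0.5):
    # Fit a two-component GMM to the per-sample losses and return a boolean
    # mask selecting samples whose posterior probability of belonging to the
    # low-loss (presumed clean) component exceeds the threshold.
    losses = np.asarray(per_sample_losses, dtype=np.float64).reshape(-1, 1)
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses)
    clean_component = int(np.argmin(gmm.means_.ravel()))  # lower mean = clean
    prob_clean = gmm.predict_proba(losses)[:, clean_component]
    return prob_clean > threshold

def consistency_loss(model, images, perturbed_images):
    # One common form of consistency regularization: mean-squared error
    # between the softmax outputs for the original and the perturbed images.
    p_orig = F.softmax(model(images), dim=1)
    p_pert = F.softmax(model(perturbed_images), dim=1)
    return F.mse_loss(p_pert, p_orig.detach())

# Combining the terms on a batch of samples selected as clean
# (lambda_c is an assumed weighting hyper-parameter):
#   loss = F.cross_entropy(model(images), labels) \
#          + lambda_c * consistency_loss(model, images, perturbed_images)

Detaching the prediction on the original image makes the perturbed branch match the unperturbed one rather than the reverse; symmetric or KL-divergence variants of the consistency term are equally common choices.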
@ARTICLE{e105-d_2_387,
author={Yuichiro NOMURA and Takio KURITA},
journal={IEICE TRANSACTIONS on Information},
title={Consistency Regularization on Clean Samples for Learning with Noisy Labels},
year={2022},
volume={E105-D},
number={2},
pages={387-395},
abstract={In recent years, deep learning has achieved significant results in various areas of machine learning. Deep learning requires a huge amount of data to train a model, and data collection techniques such as web crawling have been developed. However, there is a risk that these data collection techniques generate incorrect labels. If a deep learning model for image classification is trained on a dataset with noisy labels, its generalization performance decreases significantly. This problem is called Learning with Noisy Labels (LNL). One recent work on LNL, called DivideMix [1], successfully divides the dataset into samples with clean labels and samples with noisy labels by modeling the loss distribution of all training samples with a two-component Gaussian Mixture Model (GMM). It then treats the divided dataset as labeled and unlabeled samples and trains the classification model in a semi-supervised manner. Since the selected samples have lower loss values and are easy to classify, the model is at risk of overfitting to these simple patterns during training. To train the classification model without overfitting to the simple patterns, we propose to introduce consistency regularization on the samples selected by the GMM. The consistency regularization perturbs the input images and encourages the model to output the same values for the perturbed images and the original images. The classification model simultaneously receives the samples selected as clean and their perturbed versions, and it achieves higher generalization performance with less overfitting to the selected samples. We evaluated our method with synthetically generated noisy labels on CIFAR-10 and CIFAR-100 and obtained results that are comparable to or better than the state-of-the-art method.},
keywords={},
doi={10.1587/transinf.2021EDP7127},
ISSN={1745-1361},
month={February},}
TY - JOUR
TI - Consistency Regularization on Clean Samples for Learning with Noisy Labels
T2 - IEICE TRANSACTIONS on Information
SP - 387
EP - 395
AU - Yuichiro NOMURA
AU - Takio KURITA
PY - 2022
DO - 10.1587/transinf.2021EDP7127
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E105-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2022
AB - In recent years, deep learning has achieved significant results in various areas of machine learning. Deep learning requires a huge amount of data to train a model, and data collection techniques such as web crawling have been developed. However, there is a risk that these data collection techniques generate incorrect labels. If a deep learning model for image classification is trained on a dataset with noisy labels, its generalization performance decreases significantly. This problem is called Learning with Noisy Labels (LNL). One recent work on LNL, called DivideMix [1], successfully divides the dataset into samples with clean labels and samples with noisy labels by modeling the loss distribution of all training samples with a two-component Gaussian Mixture Model (GMM). It then treats the divided dataset as labeled and unlabeled samples and trains the classification model in a semi-supervised manner. Since the selected samples have lower loss values and are easy to classify, the model is at risk of overfitting to these simple patterns during training. To train the classification model without overfitting to the simple patterns, we propose to introduce consistency regularization on the samples selected by the GMM. The consistency regularization perturbs the input images and encourages the model to output the same values for the perturbed images and the original images. The classification model simultaneously receives the samples selected as clean and their perturbed versions, and it achieves higher generalization performance with less overfitting to the selected samples. We evaluated our method with synthetically generated noisy labels on CIFAR-10 and CIFAR-100 and obtained results that are comparable to or better than the state-of-the-art method.
ER -