Data parallelism is the dominant method used to train deep learning (DL) models on High-Performance Computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes a bottleneck because of its relatively higher latency and lower link bandwidth compared with intra-node communication. Although several communication techniques have been proposed to cope with this problem, all of these approaches target the large-message-size issue while mitigating the effect of the limited inter-node network. In this study, we investigate the benefit of increasing inter-node link bandwidth by using hybrid switching systems, i.e., Electrical Packet Switching and Optical Circuit Switching. We find that the typical data transfers of synchronous data-parallel training are long-lived and rarely change, and can therefore be accelerated with optical switching. Simulation results on the SimGrid simulator show that our approach speeds up the training time of deep learning applications, especially at large scale.
Thao-Nguyen TRUONG
National Institute of Advanced Industrial Science and Technology (AIST)
Ryousei TAKANO
National Institute of Advanced Industrial Science and Technology (AIST)
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Thao-Nguyen TRUONG, Ryousei TAKANO, "Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale" in IEICE TRANSACTIONS on Information,
vol. E104-D, no. 8, pp. 1332-1339, August 2021, doi: 10.1587/transinf.2020EDP7201.
Abstract: Data parallelism is the dominant method used to train deep learning (DL) models on High-Performance Computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes a bottleneck due to its relatively higher latency and lower link bandwidth (than intra-node communication). Although some communication techniques have been proposed to cope with this problem, all of these approaches target the large-message-size issue while mitigating the effect of the limited inter-node network. In this study, we investigate the benefit of increasing inter-node link bandwidth by using hybrid switching systems, i.e., Electrical Packet Switching and Optical Circuit Switching. We find that the typical data transfers of synchronous data-parallel training are long-lived and rarely change, and can therefore be sped up with optical switching. Simulation results on the SimGrid simulator show that our approach speeds up the training time of deep learning applications, especially at large scale.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDP7201/_p
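The key observation in the abstract, that the inter-node traffic of synchronous data-parallel training is long-lived and rarely changes, can be illustrated with a minimal sketch (not taken from the paper). The gradient buffer size, iteration count, and use of mpi4py are illustrative assumptions; the point is that every iteration performs the same fixed-size gradient Allreduce between the same set of ranks, which is the stable traffic pattern an Optical Circuit Switch can serve after a one-time circuit setup.

# Minimal illustrative sketch (assumes mpi4py and NumPy are installed); not the paper's code.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

GRAD_SIZE = 25_000_000                      # assumed gradient size: ~100 MB of fp32 per iteration
grads = np.zeros(GRAD_SIZE, dtype=np.float32)
summed = np.empty_like(grads)

for step in range(1000):                    # arbitrary number of training iterations
    # (1) forward/backward pass on the local mini-batch would fill `grads` (omitted here)
    # (2) inter-node communication: the same buffer size between the same ranks,
    #     every iteration -- a long-lived, rarely changing flow that suits
    #     Optical Circuit Switching despite its slow reconfiguration time
    comm.Allreduce(grads, summed, op=MPI.SUM)
    # (3) local weight update using the averaged gradient summed / size (omitted here)

Run with, e.g., mpirun -np N python sketch.py (hypothetical file name): whichever Allreduce algorithm the MPI library chooses, the resulting inter-node flows repeat essentially identically every iteration, so the circuit-setup cost of optical switching is amortized over the whole training run.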
@ARTICLE{e104-d_8_1332,
author={Thao-Nguyen TRUONG and Ryousei TAKANO},
journal={IEICE TRANSACTIONS on Information},
title={Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale},
year={2021},
volume={E104-D},
number={8},
pages={1332-1339},
abstract={Data parallelism is the dominant method used to train deep learning (DL) models on High-Performance Computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes a bottleneck due to its relatively higher latency and lower link bandwidth (than intra-node communication). Although some communication techniques have been proposed to cope with this problem, all of these approaches target the large-message-size issue while mitigating the effect of the limited inter-node network. In this study, we investigate the benefit of increasing inter-node link bandwidth by using hybrid switching systems, i.e., Electrical Packet Switching and Optical Circuit Switching. We find that the typical data transfers of synchronous data-parallel training are long-lived and rarely change, and can therefore be sped up with optical switching. Simulation results on the SimGrid simulator show that our approach speeds up the training time of deep learning applications, especially at large scale.},
keywords={},
doi={10.1587/transinf.2020EDP7201},
ISSN={1745-1361},
month={August},}
TY - JOUR
TI - Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale
T2 - IEICE TRANSACTIONS on Information
SP - 1332
EP - 1339
AU - Thao-Nguyen TRUONG
AU - Ryousei TAKANO
PY - 2021
DO - 10.1587/transinf.2020EDP7201
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2021
AB - Data parallelism is the dominant method used to train deep learning (DL) models on High-Performance Computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes a bottleneck due to its relatively higher latency and lower link bandwidth (than intra-node communication). Although some communication techniques have been proposed to cope with this problem, all of these approaches target the large-message-size issue while mitigating the effect of the limited inter-node network. In this study, we investigate the benefit of increasing inter-node link bandwidth by using hybrid switching systems, i.e., Electrical Packet Switching and Optical Circuit Switching. We find that the typical data transfers of synchronous data-parallel training are long-lived and rarely change, and can therefore be sped up with optical switching. Simulation results on the SimGrid simulator show that our approach speeds up the training time of deep learning applications, especially at large scale.
ER -