Yuxi SUN
Keio University
Hideharu AMANO
Keio University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yuxi SUN and Hideharu AMANO, "FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks," in IEICE TRANSACTIONS on Information and Systems,
vol. E103-D, no. 12, pp. 2457-2462, December 2020, doi: 10.1587/transinf.2020PAP0003.
Abstract: Recurrent neural networks (RNNs) have been proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be exploited to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to the increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves speedups of 31x and 61x compared with an Intel CPU.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020PAP0003/_p
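The abstract above claims that inference latency stays close to constant as the RNN depth grows. The following minimal Python sketch (not the authors' FiC-RNN implementation) illustrates the latency model behind layer-wise pipelining, assuming one RNN layer is mapped to each FPGA and each cell evaluation takes a fixed latency; the function names and numbers are hypothetical.

# Illustrative latency model for pipelining deep-RNN layers across devices.
# Assumption (not from the paper): one layer per FPGA, fixed per-cell latency.

def sequential_latency(num_layers: int, num_steps: int, t_cell: float) -> float:
    """Single device: every layer of every time step runs back to back."""
    return num_layers * num_steps * t_cell

def pipelined_latency(num_layers: int, num_steps: int, t_cell: float) -> float:
    """One layer per device: after a pipeline fill of (num_layers - 1) stages,
    one new time step completes every t_cell."""
    return (num_steps + num_layers - 1) * t_cell

if __name__ == "__main__":
    T, t_cell = 100, 1.0              # 100 time steps, unit cell latency
    for L in (1, 2, 4, 8):            # depth of the stacked RNN
        print(f"{L} layers: sequential {sequential_latency(L, T, t_cell):.0f}, "
              f"pipelined {pipelined_latency(L, T, t_cell):.0f}")

Under these assumptions, going from one to eight layers over 100 time steps multiplies the sequential latency by eight, while the pipelined latency grows only from 100 to 107 cell-latencies, which is the sense in which latency stays close to constant with depth.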
@ARTICLE{e103-d_12_2457,
author={Yuxi SUN and Hideharu AMANO},
journal={IEICE TRANSACTIONS on Information and Systems},
title={FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks},
year={2020},
volume={E103-D},
number={12},
pages={2457-2462},
abstract={Recurrent neural networks (RNNs) have been proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be taken advantage of to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves 31x and 61x speedup compared with an Intel CPU.},
keywords={},
doi={10.1587/transinf.2020PAP0003},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 2457
EP - 2462
AU - Yuxi SUN
AU - Hideharu AMANO
PY - 2020
DO - 10.1587/transinf.2020PAP0003
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E103-D
IS - 12
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - December 2020
AB - Recurrent neural networks (RNNs) have been proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be taken advantage of to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves 31x and 61x speedup compared with an Intel CPU.
ER -