With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.
Cheng LUO
Fudan University
Wei CAO
Fudan University
Lingli WANG
Fudan University
Philip H. W. LEONG
University of Sydney
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Cheng LUO, Wei CAO, Lingli WANG, Philip H. W. LEONG, "RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks" in IEICE TRANSACTIONS on Information and Systems,
vol. E102-D, no. 5, pp. 1037-1045, May 2019, doi: 10.1587/transinf.2018RCP0008.
Abstract: With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.
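The QR-DNN step of training with batch normalization and later removing it corresponds to the standard BN-folding identity: since BN applies a per-channel affine transform at inference time, it can be absorbed into the preceding convolution's weights and bias. A minimal NumPy sketch of that fold (the function name and tensor shapes are illustrative assumptions, not taken from the paper):

import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mu, var, eps=1e-5):
    # W: conv weights (out_ch, in_ch, kh, kw); b: conv bias (out_ch,)
    # gamma, beta, mu, var: per-channel BN scale, shift, running mean, running variance
    scale = gamma / np.sqrt(var + eps)           # per-output-channel BN scale
    W_folded = W * scale[:, None, None, None]    # scale each output filter
    b_folded = (b - mu) * scale + beta           # absorb the BN shift into the bias
    return W_folded, b_folded

After folding, conv(x, W_folded) + b_folded equals BN(conv(x, W) + b), so the BN layer can be dropped at inference time without changing the network's output.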
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018RCP0008/_p
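RNA stores weights in a logarithmic number system, so each multiplication in the systolic array reduces to a shift-and-accumulate. A hedged sketch of the idea (the exponent range and rounding rule are assumptions, not the paper's exact format):

import math

def quantize_log(w, e_min=-7, e_max=0):
    # Quantize a weight to a signed power of two: w is approximated as sign * 2**e.
    sign = 1 if w >= 0 else -1
    if w == 0:
        return sign, e_min                       # map zero to the smallest magnitude
    e = int(round(math.log2(abs(w))))            # nearest exponent in the log domain
    return sign, max(e_min, min(e_max, e))       # clamp to the representable range

def shift_accumulate(acc, x, sign, e):
    # One MAC step: acc += x * sign * 2**e, using a shift instead of a multiply.
    shifted = x << e if e >= 0 else x >> -e      # arithmetic shift on an integer activation
    return acc + sign * shifted

In hardware, each shift maps to a barrel shifter rather than a multiplier, which is where an accelerator of this kind saves area and energy.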
@ARTICLE{e102-d_5_1037,
author={Cheng LUO and Wei CAO and Lingli WANG and Philip H. W. LEONG},
journal={IEICE TRANSACTIONS on Information and Systems},
title={RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks},
year={2019},
volume={E102-D},
number={5},
pages={1037-1045},
abstract={With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.},
keywords={},
doi={10.1587/transinf.2018RCP0008},
ISSN={1745-1361},
month={May},}
TY - JOUR
TI - RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1037
EP - 1045
AU - Cheng LUO
AU - Wei CAO
AU - Lingli WANG
AU - Philip H. W. LEONG
PY - 2019
DO - 10.1587/transinf.2018RCP0008
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E102-D
IS - 5
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - May 2019
AB - With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.
ER -