Jiabao GAO
Fudan University
Yuchen YAO
Fudan University
Zhengjie LI
Fudan University
Jinmei LAI
Fudan University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Jiabao GAO, Yuchen YAO, Zhengjie LI, Jinmei LAI, "FCA-BNN: Flexible and Configurable Accelerator for Binarized Neural Networks on FPGA," IEICE TRANSACTIONS on Information and Systems,
vol. E104-D, no. 8, pp. 1367-1377, August 2021, doi: 10.1587/transinf.2021EDP7054.
Abstract: A series of Binarized Neural Networks (BNNs) achieve acceptable accuracy in image classification tasks and excellent performance on field-programmable gate arrays (FPGAs). Nevertheless, we observe that existing BNN designs are quite time-consuming when switching the target BNN and accelerating a new BNN. Therefore, this paper presents FCA-BNN, a flexible and configurable accelerator that employs a layer-level configurable technique to seamlessly execute each layer of the target BNN. First, to save resources and improve energy efficiency, hardware-oriented optimal formulas are introduced to design an energy-efficient computing array for different sizes of padded-convolution and fully-connected layers. Moreover, to accelerate target BNNs efficiently, we exploit an analytical model to explore the optimal design parameters for FCA-BNN. Finally, the proposed mapping flow switches the target network by entering its order, and accelerates a new network by compiling and loading the corresponding instructions, without generating or loading a bitstream. Evaluations on three major BNN structures show that the differences between the inference accuracy of FCA-BNN and that of a GPU are only 0.07%, 0.31% and 0.4% for LFC, VGG-like and Cifar-10 AlexNet, respectively. Furthermore, our energy-efficiency results reach 0.8× those of existing customized FPGA accelerators for LFC and 2.6× for VGG-like. For Cifar-10 AlexNet, FCA-BNN is 188.2× and 60.6× more energy-efficient than a CPU and a GPU, respectively. To the best of our knowledge, FCA-BNN is the most efficient design for switching the target BNN and accelerating a new BNN, while maintaining competitive performance.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDP7054/_p
@ARTICLE{e104-d_8_1367,
author={Jiabao GAO and Yuchen YAO and Zhengjie LI and Jinmei LAI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={FCA-BNN: Flexible and Configurable Accelerator for Binarized Neural Networks on FPGA},
year={2021},
volume={E104-D},
number={8},
pages={1367-1377},
abstract={A series of Binarized Neural Networks (BNNs) achieve acceptable accuracy in image classification tasks and excellent performance on field-programmable gate arrays (FPGAs). Nevertheless, we observe that existing BNN designs are quite time-consuming when switching the target BNN and accelerating a new BNN. Therefore, this paper presents FCA-BNN, a flexible and configurable accelerator that employs a layer-level configurable technique to seamlessly execute each layer of the target BNN. First, to save resources and improve energy efficiency, hardware-oriented optimal formulas are introduced to design an energy-efficient computing array for different sizes of padded-convolution and fully-connected layers. Moreover, to accelerate target BNNs efficiently, we exploit an analytical model to explore the optimal design parameters for FCA-BNN. Finally, the proposed mapping flow switches the target network by entering its order, and accelerates a new network by compiling and loading the corresponding instructions, without generating or loading a bitstream. Evaluations on three major BNN structures show that the differences between the inference accuracy of FCA-BNN and that of a GPU are only 0.07%, 0.31% and 0.4% for LFC, VGG-like and Cifar-10 AlexNet, respectively. Furthermore, our energy-efficiency results reach 0.8× those of existing customized FPGA accelerators for LFC and 2.6× for VGG-like. For Cifar-10 AlexNet, FCA-BNN is 188.2× and 60.6× more energy-efficient than a CPU and a GPU, respectively. To the best of our knowledge, FCA-BNN is the most efficient design for switching the target BNN and accelerating a new BNN, while maintaining competitive performance.},
keywords={},
doi={10.1587/transinf.2021EDP7054},
ISSN={1745-1361},
month={August},}
TY - JOUR
TI - FCA-BNN: Flexible and Configurable Accelerator for Binarized Neural Networks on FPGA
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1367
EP - 1377
AU - Jiabao GAO
AU - Yuchen YAO
AU - Zhengjie LI
AU - Jinmei LAI
PY - 2021
DO - 10.1587/transinf.2021EDP7054
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E104-D
IS - 8
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - August 2021
AB - A series of Binarized Neural Networks (BNNs) achieve acceptable accuracy in image classification tasks and excellent performance on field-programmable gate arrays (FPGAs). Nevertheless, we observe that existing BNN designs are quite time-consuming when switching the target BNN and accelerating a new BNN. Therefore, this paper presents FCA-BNN, a flexible and configurable accelerator that employs a layer-level configurable technique to seamlessly execute each layer of the target BNN. First, to save resources and improve energy efficiency, hardware-oriented optimal formulas are introduced to design an energy-efficient computing array for different sizes of padded-convolution and fully-connected layers. Moreover, to accelerate target BNNs efficiently, we exploit an analytical model to explore the optimal design parameters for FCA-BNN. Finally, the proposed mapping flow switches the target network by entering its order, and accelerates a new network by compiling and loading the corresponding instructions, without generating or loading a bitstream. Evaluations on three major BNN structures show that the differences between the inference accuracy of FCA-BNN and that of a GPU are only 0.07%, 0.31% and 0.4% for LFC, VGG-like and Cifar-10 AlexNet, respectively. Furthermore, our energy-efficiency results reach 0.8× those of existing customized FPGA accelerators for LFC and 2.6× for VGG-like. For Cifar-10 AlexNet, FCA-BNN is 188.2× and 60.6× more energy-efficient than a CPU and a GPU, respectively. To the best of our knowledge, FCA-BNN is the most efficient design for switching the target BNN and accelerating a new BNN, while maintaining competitive performance.
ER -