Siyi HU
University of Tokyo
Makiko ITO
Fujitsu Ltd.
Takahide YOSHIKAWA
Fujitsu Ltd.
Yuan HE
Keio University
Hiroshi NAKAMURA
University of Tokyo
Masaaki KONDO
Keio University, RIKEN
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Siyi HU, Makiko ITO, Takahide YOSHIKAWA, Yuan HE, Hiroshi NAKAMURA, Masaaki KONDO, "Adaptive Lossy Data Compression Extended Architecture for Memory Bandwidth Conservation in SpMV" in IEICE TRANSACTIONS on Information, vol. E106-D, no. 12, pp. 2015-2025, December 2023, doi: 10.1587/transinf.2023PAP0008.
Abstract: Widely adopted by machine learning and graph processing applications, sparse matrix-vector multiplication (SpMV) is a very popular algorithm in linear algebra. This is especially the case for fully-connected MLP layers, which dominate many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite efficient sparsity-aware storage formats (such as CSR or CSC), SpMV kernels still suffer from limited memory bandwidth during data transfer because of the memory hierarchy of modern computing systems. In more detail, we find that both the integer and floating-point data used in SpMV kernels are handled as-is, without any pre-processing. Therefore, we believe bandwidth conservation techniques, such as data compression, may dramatically help SpMV kernels when data is transferred between main memory and the Last Level Cache (LLC). Furthermore, we also observe that convergence in some typical scientific computation benchmarks (based on SpMV kernels) is not degraded when adopting lower-precision floating-point data. Based on these findings, we propose in this work a simple yet effective data compression scheme that can readily be extended to general-purpose computing architectures or HPC systems. When adopted, it achieves a best-case speedup of 1.92x. In addition, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both convergence speed and the accuracy of final results.
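To make the abstract's setting concrete, the following is a minimal sketch of an SpMV kernel over a CSR-format matrix, together with the precision-lowering idea mentioned above (running the same kernel on float32 values to halve the bandwidth spent on matrix values). The function `spmv_csr` and the toy 3x3 matrix are illustrative assumptions, not the paper's actual kernel or its compression scheme:

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """Multiply a CSR-format sparse matrix by a dense vector x."""
    y = np.zeros(len(row_ptr) - 1, dtype=values.dtype)
    for i in range(len(y)):                      # for each row i
        for k in range(row_ptr[i], row_ptr[i + 1]):  # its nonzeros
            y[i] += values[k] * x[col_idx[k]]
    return y

# Toy 3x3 sparse matrix (illustrative):
# [10  0  2]
# [ 0  3  0]
# [ 1  0  4]
values  = np.array([10.0, 2.0, 3.0, 1.0, 4.0])   # nonzero values
col_idx = np.array([0, 2, 1, 0, 2])              # their column indices
row_ptr = np.array([0, 2, 3, 5])                 # row start offsets
x = np.array([1.0, 2.0, 3.0])

y64 = spmv_csr(values, col_idx, row_ptr, x)      # full double precision
y32 = spmv_csr(values.astype(np.float32),        # lower-precision values:
               col_idx, row_ptr,                 # half the bytes moved for
               x.astype(np.float32))             # the floating-point stream

print(y64)                         # exact result: [16, 6, 13]
print(np.max(np.abs(y64 - y32)))   # tiny precision loss
```

Note that the integer index arrays (`col_idx`, `row_ptr`) form a second data stream alongside the floating-point values, which is why the abstract discusses compressing both integer and floating-point data.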
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023PAP0008/_p
@ARTICLE{e106-d_12_2015,
author={Siyi HU and Makiko ITO and Takahide YOSHIKAWA and Yuan HE and Hiroshi NAKAMURA and Masaaki KONDO},
journal={IEICE TRANSACTIONS on Information},
title={Adaptive Lossy Data Compression Extended Architecture for Memory Bandwidth Conservation in SpMV},
year={2023},
volume={E106-D},
number={12},
pages={2015-2025},
abstract={Widely adopted by machine learning and graph processing applications, sparse matrix-vector multiplication (SpMV) is a very popular algorithm in linear algebra. This is especially the case for fully-connected MLP layers, which dominate many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite efficient sparsity-aware storage formats (such as CSR or CSC), SpMV kernels still suffer from limited memory bandwidth during data transfer because of the memory hierarchy of modern computing systems. In more detail, we find that both the integer and floating-point data used in SpMV kernels are handled as-is, without any pre-processing. Therefore, we believe bandwidth conservation techniques, such as data compression, may dramatically help SpMV kernels when data is transferred between main memory and the Last Level Cache (LLC). Furthermore, we also observe that convergence in some typical scientific computation benchmarks (based on SpMV kernels) is not degraded when adopting lower-precision floating-point data. Based on these findings, we propose in this work a simple yet effective data compression scheme that can readily be extended to general-purpose computing architectures or HPC systems. When adopted, it achieves a best-case speedup of 1.92x. In addition, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both convergence speed and the accuracy of final results.},
keywords={},
doi={10.1587/transinf.2023PAP0008},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - Adaptive Lossy Data Compression Extended Architecture for Memory Bandwidth Conservation in SpMV
T2 - IEICE TRANSACTIONS on Information
SP - 2015
EP - 2025
AU - Siyi HU
AU - Makiko ITO
AU - Takahide YOSHIKAWA
AU - Yuan HE
AU - Hiroshi NAKAMURA
AU - Masaaki KONDO
PY - 2023
DO - 10.1587/transinf.2023PAP0008
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2023
AB - Widely adopted by machine learning and graph processing applications, sparse matrix-vector multiplication (SpMV) is a very popular algorithm in linear algebra. This is especially the case for fully-connected MLP layers, which dominate many SpMV computations and play a substantial role in diverse services. As a consequence, a large fraction of data center cycles is spent on SpMV kernels. Meanwhile, despite efficient sparsity-aware storage formats (such as CSR or CSC), SpMV kernels still suffer from limited memory bandwidth during data transfer because of the memory hierarchy of modern computing systems. In more detail, we find that both the integer and floating-point data used in SpMV kernels are handled as-is, without any pre-processing. Therefore, we believe bandwidth conservation techniques, such as data compression, may dramatically help SpMV kernels when data is transferred between main memory and the Last Level Cache (LLC). Furthermore, we also observe that convergence in some typical scientific computation benchmarks (based on SpMV kernels) is not degraded when adopting lower-precision floating-point data. Based on these findings, we propose in this work a simple yet effective data compression scheme that can readily be extended to general-purpose computing architectures or HPC systems. When adopted, it achieves a best-case speedup of 1.92x. In addition, evaluations with both the CG kernel and the PageRank algorithm indicate that our proposal introduces negligible overhead on both convergence speed and the accuracy of final results.
ER -