Copyright notice
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yuma MUNEKAWA, Fumihiko INO, Kenichi HAGIHARA, "Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs" in IEICE TRANSACTIONS on Information, vol. E93-D, no. 6, pp. 1479-1488, June 2010, doi: 10.1587/transinf.E93.D.1479.
Abstract: This paper presents a fast method capable of accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using compute unified device architecture (CUDA), which is available on the nVIDIA GPU. As compared with previous methods, our method has four major contributions. (1) The method efficiently uses on-chip shared memory to reduce the data amount being transferred between off-chip video memory and processing elements in the GPU. (2) It also reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) A pipelined method is also implemented to overlap GPU execution with database access. (4) Finally, a master/worker paradigm is employed to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the amount of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.1479/_p
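For readers unfamiliar with the algorithm, the Smith-Waterman score follows the recurrence H(i,j) = max{0, H(i-1,j-1) + s(d_i, q_j), H(i-1,j) - g, H(i,j-1) - g}, where s is the substitution score and g the gap penalty. The CUDA kernel below is a minimal sketch of that cell-update loop, not the authors' implementation: it assumes a linear gap penalty, assigns one thread to each database sequence, and stages the query in on-chip shared memory to illustrate the data-reuse idea the abstract describes. All identifiers (sw_kernel, MAX_QUERY_LEN, the match/mismatch/gap scores) are illustrative assumptions, and the pipelined database access and master/worker cluster layer reported in the paper are omitted.

// Minimal sketch only, not the paper's kernel: Smith-Waterman with a linear
// gap penalty, one thread per database sequence, query cached in shared memory.
#include <cuda_runtime.h>

#define MAX_QUERY_LEN 256   // assumed bound so the query and one DP row fit per thread
#define GAP_PENALTY   2     // illustrative linear gap penalty
#define MATCH_SCORE   3     // illustrative substitution scores
#define MISMATCH     -1

__global__ void sw_kernel(const char *query, int qlen,
                          const char *db, const int *offsets, const int *lengths,
                          int numSeqs, int *scores)
{
    // Stage the query once per block in on-chip shared memory so every thread
    // reuses it instead of re-reading it from off-chip video memory.
    __shared__ char sQuery[MAX_QUERY_LEN];
    for (int i = threadIdx.x; i < qlen; i += blockDim.x)
        sQuery[i] = query[i];
    __syncthreads();

    int seq = blockIdx.x * blockDim.x + threadIdx.x;
    if (seq >= numSeqs) return;

    const char *subject = db + offsets[seq];
    int slen = lengths[seq];

    // One rolling DP row: H[j] holds H(i-1, j) before the update, H(i, j) after.
    int H[MAX_QUERY_LEN + 1];
    for (int j = 0; j <= qlen; ++j) H[j] = 0;

    int best = 0;
    for (int i = 1; i <= slen; ++i) {
        int diag = 0;                                   // H(i-1, 0) = 0
        for (int j = 1; j <= qlen; ++j) {
            int up   = H[j];                            // H(i-1, j)
            int left = H[j - 1];                        // H(i,   j-1), already updated
            int s = (subject[i - 1] == sQuery[j - 1]) ? MATCH_SCORE : MISMATCH;
            int h = max(0, max(diag + s, max(up, left) - GAP_PENALTY));
            H[j] = h;
            diag = up;                                  // becomes H(i-1, j-1) for column j+1
            best = max(best, h);
        }
    }
    scores[seq] = best;   // best local alignment score for this database sequence
}

A launch such as sw_kernel<<<(numSeqs + 127) / 128, 128>>>(dQuery, qlen, dDb, dOffsets, dLengths, numSeqs, dScores) would score every database sequence against the query; the per-sequence results can then be ranked on the host.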
@ARTICLE{e93-d_6_1479,
author={Yuma MUNEKAWA and Fumihiko INO and Kenichi HAGIHARA},
journal={IEICE TRANSACTIONS on Information},
title={Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs},
year={2010},
volume={E93-D},
number={6},
pages={1479-1488},
abstract={This paper presents a fast method capable of accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using compute unified device architecture (CUDA), which is available on the nVIDIA GPU. As compared with previous methods, our method has four major contributions. (1) The method efficiently uses on-chip shared memory to reduce the data amount being transferred between off-chip video memory and processing elements in the GPU. (2) It also reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) A pipelined method is also implemented to overlap GPU execution with database access. (4) Finally, a master/worker paradigm is employed to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the amount of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.},
keywords={},
doi={10.1587/transinf.E93.D.1479},
ISSN={1745-1361},
month={June},
}
TY - JOUR
TI - Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs
T2 - IEICE TRANSACTIONS on Information
SP - 1479
EP - 1488
AU - Yuma MUNEKAWA
AU - Fumihiko INO
AU - Kenichi HAGIHARA
PY - 2010
DO - 10.1587/transinf.E93.D.1479
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E93-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2010
AB - This paper presents a fast method capable of accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using compute unified device architecture (CUDA), which is available on the nVIDIA GPU. As compared with previous methods, our method has four major contributions. (1) The method efficiently uses on-chip shared memory to reduce the data amount being transferred between off-chip video memory and processing elements in the GPU. (2) It also reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) A pipelined method is also implemented to overlap GPU execution with database access. (4) Finally, a master/worker paradigm is employed to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the amount of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.
ER -