The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
As energy efficiency has become a major design constraint or objective, heterogeneous manycore architectures have emerged as mainstream target platforms, not only in server systems but also in embedded systems. Manycore accelerators such as GPUs are also becoming popular in embedded domains, alongside heterogeneous CPU cores. However, because an embedded GPU has far fewer cores than a server GPU, it is important to use both heterogeneous multi-core CPUs and GPUs to achieve the desired throughput with minimal energy consumption. In this paper, we present a case study of mapping LBP-based face detection onto a recent CPU-GPU heterogeneous embedded platform, exploiting both task parallelism and data parallelism to maximize energy efficiency under a real-time constraint. We first present the parallelization technique of each task for GPU execution, and then propose performance and energy models for both task-parallel and data-parallel execution on heterogeneous processors, which are used in design space exploration to find the optimal mapping. The design space is huge, since it covers not only processor heterogeneity (CPU-GPU and big.LITTLE) but also various data partitioning ratios for data-parallel execution across these heterogeneous processors. In our case study of LBP face detection on the Exynos 5422, the estimation errors of the proposed performance and energy models were on average -2.19% and -3.67%, respectively. By systematically finding the optimal mappings with the proposed models, we achieved 28.6% lower energy consumption than the manual mapping while still meeting the real-time constraint.
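The abstract does not give the paper's actual performance and energy models, so the following is only a minimal sketch of what a model-driven search over data partitioning ratios could look like. Everything in it is an assumption made for illustration: the `Processor` fields, the linear scaling of execution time with the data share, the omission of idle power and transfer overhead, and the function names `evaluate` and `explore`. It is not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    """Hypothetical processor description (not taken from the paper)."""
    name: str
    full_time_ms: float    # assumed time to process a whole frame alone
    active_power_w: float  # assumed average power while busy


def evaluate(cpu, gpu, cpu_share):
    """Estimate (latency_ms, energy_mj) for one data-parallel split.

    Assumes time scales linearly with the share of data, that both
    processors run concurrently, and that idle power is negligible.
    """
    t_cpu = cpu.full_time_ms * cpu_share
    t_gpu = gpu.full_time_ms * (1.0 - cpu_share)
    latency = max(t_cpu, t_gpu)  # both parts must finish
    energy = cpu.active_power_w * t_cpu + gpu.active_power_w * t_gpu  # W * ms = mJ
    return latency, energy


def explore(cpus, gpu, deadline_ms, steps=10):
    """Exhaustively search CPU choice and partitioning ratio.

    Returns the (energy_mj, latency_ms, cpu_name, cpu_share) tuple with the
    lowest estimated energy that meets the deadline, or None if no
    candidate is feasible.
    """
    best = None
    for cpu in cpus:
        for i in range(steps + 1):
            share = i / steps
            latency, energy = evaluate(cpu, gpu, share)
            if latency > deadline_ms:
                continue  # violates the real-time constraint
            candidate = (energy, latency, cpu.name, share)
            if best is None or candidate < best:
                best = candidate
    return best
```

Calling `explore` with a list of candidate CPU clusters (for example a "big" and a "LITTLE" entry with made-up timings), one GPU entry, and a frame deadline returns the lowest-energy feasible split under these simplified assumptions. A real exploration would also have to cover task-parallel mappings and use measured model parameters.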
Chanyoung OH
University of Seoul
Saehanseul YI
University of Seoul
Youngmin YI
University of Seoul
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Chanyoung OH, Saehanseul YI, Youngmin YI, "Real-Time and Energy-Efficient Face Detection on CPU-GPU Heterogeneous Embedded Platforms" in IEICE TRANSACTIONS on Information,
vol. E101-D, no. 12, pp. 2878-2888, December 2018, doi: 10.1587/transinf.2018PAP0004.
Abstract: As energy efficiency has become a major design constraint or objective, heterogeneous manycore architectures have emerged as mainstream target platforms not only in server systems but also in embedded systems. Manycore accelerators such as GPUs are getting also popular in embedded domains, as well as the heterogeneous CPU cores. However, as the number of cores in an embedded GPU is far less than that of a server GPU, it is important to utilize both heterogeneous multi-core CPUs and GPUs to achieve the desired throughput with the minimal energy consumption. In this paper, we present a case study of mapping LBP-based face detection onto a recent CPU-GPU heterogeneous embedded platform, which exploits both task parallelism and data parallelism to achieve maximal energy efficiency with a real-time constraint. We first present the parallelization technique of each task for the GPU execution, then we propose performance and energy models for both task-parallel and data-parallel executions on heterogeneous processors, which are used in design space exploration for the optimal mapping. The design space is huge since not only processor heterogeneity such as CPU-GPU and big.LITTLE, but also various data partitioning ratios for the data-parallel execution on these heterogeneous processors are considered. In our case study of LBP face detection on Exynos 5422, the estimation error of the proposed performance and energy models were on average -2.19% and -3.67% respectively. By systematically finding the optimal mappings with the proposed models, we could achieve 28.6% less energy consumption compared to the manual mapping, while still meeting the real-time constraint.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018PAP0004/_p
@ARTICLE{e101-d_12_2878,
author={Chanyoung OH and Saehanseul YI and Youngmin YI},
journal={IEICE TRANSACTIONS on Information},
title={Real-Time and Energy-Efficient Face Detection on CPU-GPU Heterogeneous Embedded Platforms},
year={2018},
volume={E101-D},
number={12},
pages={2878-2888},
abstract={As energy efficiency has become a major design constraint or objective, heterogeneous manycore architectures have emerged as mainstream target platforms not only in server systems but also in embedded systems. Manycore accelerators such as GPUs are getting also popular in embedded domains, as well as the heterogeneous CPU cores. However, as the number of cores in an embedded GPU is far less than that of a server GPU, it is important to utilize both heterogeneous multi-core CPUs and GPUs to achieve the desired throughput with the minimal energy consumption. In this paper, we present a case study of mapping LBP-based face detection onto a recent CPU-GPU heterogeneous embedded platform, which exploits both task parallelism and data parallelism to achieve maximal energy efficiency with a real-time constraint. We first present the parallelization technique of each task for the GPU execution, then we propose performance and energy models for both task-parallel and data-parallel executions on heterogeneous processors, which are used in design space exploration for the optimal mapping. The design space is huge since not only processor heterogeneity such as CPU-GPU and big.LITTLE, but also various data partitioning ratios for the data-parallel execution on these heterogeneous processors are considered. In our case study of LBP face detection on Exynos 5422, the estimation error of the proposed performance and energy models were on average -2.19% and -3.67% respectively. By systematically finding the optimal mappings with the proposed models, we could achieve 28.6% less energy consumption compared to the manual mapping, while still meeting the real-time constraint.},
keywords={},
doi={10.1587/transinf.2018PAP0004},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - Real-Time and Energy-Efficient Face Detection on CPU-GPU Heterogeneous Embedded Platforms
T2 - IEICE TRANSACTIONS on Information
SP - 2878
EP - 2888
AU - Chanyoung OH
AU - Saehanseul YI
AU - Youngmin YI
PY - 2018
DO - 10.1587/transinf.2018PAP0004
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2018
AB - As energy efficiency has become a major design constraint or objective, heterogeneous manycore architectures have emerged as mainstream target platforms not only in server systems but also in embedded systems. Manycore accelerators such as GPUs are getting also popular in embedded domains, as well as the heterogeneous CPU cores. However, as the number of cores in an embedded GPU is far less than that of a server GPU, it is important to utilize both heterogeneous multi-core CPUs and GPUs to achieve the desired throughput with the minimal energy consumption. In this paper, we present a case study of mapping LBP-based face detection onto a recent CPU-GPU heterogeneous embedded platform, which exploits both task parallelism and data parallelism to achieve maximal energy efficiency with a real-time constraint. We first present the parallelization technique of each task for the GPU execution, then we propose performance and energy models for both task-parallel and data-parallel executions on heterogeneous processors, which are used in design space exploration for the optimal mapping. The design space is huge since not only processor heterogeneity such as CPU-GPU and big.LITTLE, but also various data partitioning ratios for the data-parallel execution on these heterogeneous processors are considered. In our case study of LBP face detection on Exynos 5422, the estimation error of the proposed performance and energy models were on average -2.19% and -3.67% respectively. By systematically finding the optimal mappings with the proposed models, we could achieve 28.6% less energy consumption compared to the manual mapping, while still meeting the real-time constraint.
ER -