The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Pose estimation is an active research topic in computer vision and is key to computer perception of human activities. The core concept of human pose estimation is to describe the motion of the human body through its major joint points. Large receptive fields and rich spatial information facilitate keypoint localization, and how to capture features at a larger scale and reintegrate them into the feature space is a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. The structure of MSCNet is based on an hourglass network that captures information at different scales to present a consistent understanding of the whole body. The multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by the Squeeze-Excitation (SE) attention mechanism to flexibly perform the pose estimation task. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement compared to the mainstream CMUPose method. Compared to the advanced CPN, MSCNet has 68.2% of the computational complexity and only 55.4% of the number of parameters.
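As a rough illustration of how a Squeeze-Excitation step can selectively enhance or suppress feature channels, the sketch below applies the generic SE recipe (global average pooling, a small bottleneck, sigmoid gates, channel-wise scaling) to a toy feature map. It is not the authors' implementation: the function name `se_recalibrate`, the weight values, and the omission of bias terms are assumptions made for the example, and this page does not specify how SE is wired into the MSRF units.

```python
import math


def se_recalibrate(feature_maps, w_reduce, w_expand):
    """Squeeze-and-Excitation style channel reweighting (illustrative sketch).

    feature_maps: list of C channels, each an H x W list of lists of floats.
    w_reduce: R x C weight matrix (hypothetical values) for the bottleneck layer.
    w_expand: C x R weight matrix (hypothetical values) restoring C channels.
    Bias terms are omitted to keep the example short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expand back to C channels; sigmoid yields gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates


# Two 2x2 channels; the weights are made up so that the first channel is
# kept almost unchanged and the second is strongly suppressed.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
w_reduce = [[1.0, 0.0]]      # R = 1, C = 2
w_expand = [[4.0], [-4.0]]   # C = 2, R = 1
scaled, gates = se_recalibrate(features, w_reduce, w_expand)
print(gates)
```

With these made-up weights the first gate is close to 1 and the second close to 0, which is the "enhance or suppress" behaviour the abstract refers to. In a trained network the two weight matrices are learned rather than chosen by hand.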
Wenkai LIU
North China University of Technology
Cuizhu QIN
North China University of Technology
Menglong WU
North China University of Technology
Wenle BAI
North China University of Technology
Hongxia DONG
North China University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Wenkai LIU, Cuizhu QIN, Menglong WU, Wenle BAI, Hongxia DONG, "Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network" in IEICE TRANSACTIONS on Information,
vol. E106-D, no. 5, pp. 1081-1084, May 2023, doi: 10.1587/transinf.2022EDL8093.
Abstract: Pose estimation is a research hot spot in computer vision tasks and the key to computer perception of human activities. The core concept of human pose estimation involves describing the motion of the human body through major joint points. Large receptive fields and rich spatial information facilitate the keypoint localization task, and how to capture features on a larger scale and reintegrate them into the feature space is a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. The structure of the MSCNet is based on an hourglass network that captures information at different scales to present a consistent understanding of the whole body. The multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by the Squeeze-Excitation (SE) attention mechanism to flexibly perform the pose estimation task. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement compared to the mainstream CMUPose method. Compared to the advanced CPN, the MSCNet has 68.2% of the computational complexity and only 55.4% of the number of parameters.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDL8093/_p
@ARTICLE{e106-d_5_1081,
author={Wenkai LIU and Cuizhu QIN and Menglong WU and Wenle BAI and Hongxia DONG},
journal={IEICE TRANSACTIONS on Information},
title={Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network},
year={2023},
volume={E106-D},
number={5},
pages={1081-1084},
abstract={Pose estimation is a research hot spot in computer vision tasks and the key to computer perception of human activities. The core concept of human pose estimation involves describing the motion of the human body through major joint points. Large receptive fields and rich spatial information facilitate the keypoint localization task, and how to capture features on a larger scale and reintegrate them into the feature space is a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. The structure of the MSCNet is based on an hourglass network that captures information at different scales to present a consistent understanding of the whole body. The multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by the Squeeze-Excitation (SE) attention mechanism to flexibly perform the pose estimation task. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement compared to the mainstream CMUPose method. Compared to the advanced CPN, the MSCNet has 68.2% of the computational complexity and only 55.4% of the number of parameters.},
keywords={},
doi={10.1587/transinf.2022EDL8093},
ISSN={1745-1361},
month={May},
}
TY - JOUR
TI - Selective Learning of Human Pose Estimation Based on Multi-Scale Convergence Network
T2 - IEICE TRANSACTIONS on Information
SP - 1081
EP - 1084
AU - Wenkai LIU
AU - Cuizhu QIN
AU - Menglong WU
AU - Wenle BAI
AU - Hongxia DONG
PY - 2023
DO - 10.1587/transinf.2022EDL8093
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2023
AB - Pose estimation is a research hot spot in computer vision tasks and the key to computer perception of human activities. The core concept of human pose estimation involves describing the motion of the human body through major joint points. Large receptive fields and rich spatial information facilitate the keypoint localization task, and how to capture features on a larger scale and reintegrate them into the feature space is a challenge for pose estimation. To address this problem, we propose a multi-scale convergence network (MSCNet) with a large receptive field and rich spatial information. The structure of the MSCNet is based on an hourglass network that captures information at different scales to present a consistent understanding of the whole body. The multi-scale receptive field (MSRF) units provide a large receptive field to obtain rich contextual information, which is then selectively enhanced or suppressed by the Squeeze-Excitation (SE) attention mechanism to flexibly perform the pose estimation task. Experimental results show that MSCNet scores 73.1% AP on the COCO dataset, an 8.8% improvement compared to the mainstream CMUPose method. Compared to the advanced CPN, the MSCNet has 68.2% of the computational complexity and only 55.4% of the number of parameters.
ER -