The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations.
We propose a framework for integrating heterogeneous networks in human pose estimation (HPE), with the aim of balancing accuracy and computational complexity. Although many existing methods improve the accuracy of HPE by using multiple frames in videos, they also increase the computational complexity. The key difference is that the proposed heterogeneous framework uses different networks for different types of frames, whereas existing methods use the same networks for all frames. Specifically, we divide the video frames into two types, key frames and non-key frames, and adopt three kinds of networks in our heterogeneous framework: slow networks, fast networks, and transfer networks. For key frames, we use a slow network, which has high accuracy but high computational complexity. For non-key frames that follow a key frame, we warp the heatmap produced by the slow network on the key frame via a transfer network and fuse it with the output of a fast network, which has low accuracy but low computational complexity. Furthermore, when extending to long-term usage, where a large number of non-key frames follow a key frame, the temporal correlation decreases. Therefore, when necessary, we use an additional transfer network that warps the heatmap from a neighboring non-key frame. Experimental results on the PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed FSPose achieves a better balance between accuracy and computational complexity than the competitor method. Our source code is available at https://github.com/Fenax79/fspose.
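The frame-dispatch logic described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a fixed key-frame spacing (`key_interval`) and a gap threshold (`neighbor_after`) for enabling the extra neighbor warp; both are hypothetical parameters, since the abstract does not state how key frames are chosen or when the additional transfer network is used. All networks are passed in as plain callables, and a heatmap is reduced to a list of floats.

```python
from typing import Callable, List, Optional, Sequence

Heatmap = List[float]  # toy stand-in for a real per-joint heatmap tensor
Net = Callable[[float], Heatmap]
Transfer = Callable[[Heatmap, float, float], Heatmap]  # (heatmap, src, dst)


def estimate_poses(
    frames: Sequence[float],
    slow_net: Net,
    fast_net: Net,
    transfer_net: Transfer,
    fuse: Callable[[List[Heatmap]], Heatmap],
    key_interval: int = 3,  # assumption: every key_interval-th frame is a key frame
    neighbor_transfer_net: Optional[Transfer] = None,
    neighbor_after: int = 2,  # assumption: gap at which the neighbor warp is added
) -> List[Heatmap]:
    """Run the slow network on key frames and fast + transfer on the rest."""
    if key_interval < 1:
        raise ValueError("key_interval must be at least 1")
    if neighbor_after < 2:
        # With a gap of 1 the previous frame is the key frame itself.
        raise ValueError("neighbor_after must be at least 2")

    heatmaps: List[Heatmap] = []
    key_index = 0
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            # Key frame: accurate but expensive network.
            key_index = i
            heatmaps.append(slow_net(frame))
            continue

        # Non-key frame: cheap estimate plus the key-frame heatmap warped
        # to the current frame.
        candidates = [
            fast_net(frame),
            transfer_net(heatmaps[key_index], frames[key_index], frame),
        ]
        # Far from the key frame, also warp the previous non-key frame's result.
        if neighbor_transfer_net is not None and i - key_index >= neighbor_after:
            candidates.append(
                neighbor_transfer_net(heatmaps[i - 1], frames[i - 1], frame)
            )
        heatmaps.append(fuse(candidates))
    return heatmaps


# Toy stand-ins: a "frame" is a single number and the true pose equals it.
def toy_slow(frame: float) -> Heatmap:
    return [frame]  # exact


def toy_fast(frame: float) -> Heatmap:
    return [frame + 0.5]  # biased, mimicking a less accurate network


def toy_transfer(heatmap: Heatmap, src: float, dst: float) -> Heatmap:
    return [v + (dst - src) for v in heatmap]  # shift by the frame motion


def toy_fuse(heatmaps: List[Heatmap]) -> Heatmap:
    return [sum(vals) / len(vals) for vals in zip(*heatmaps)]  # plain average


result = estimate_poses(
    [0.0, 1.0, 2.0, 3.0], toy_slow, toy_fast, toy_transfer, toy_fuse
)
print(result)  # [[0.0], [1.25], [2.25], [3.0]]
```

In this toy run, frames 0 and 3 are key frames and are reproduced exactly, while frames 1 and 2 average the biased fast estimate with the warped key-frame heatmap. The averaging `toy_fuse` is only a placeholder for whatever fusion the paper actually uses.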
Jianfeng XU
KDDI Research, Inc.
Satoshi KOMORITA
KDDI Research, Inc.
Kei KAWAMURA
KDDI Research, Inc.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Jianfeng XU, Satoshi KOMORITA, Kei KAWAMURA, "FSPose: A Heterogeneous Framework with Fast and Slow Networks for Human Pose Estimation in Videos," in IEICE TRANSACTIONS on Information, vol. E106-D, no. 6, pp. 1165-1174, June 2023, doi: 10.1587/transinf.2022EDP7182.
Abstract: We propose a framework for the integration of heterogeneous networks in human pose estimation (HPE) with the aim of balancing accuracy and computational complexity. Although many existing methods can improve the accuracy of HPE using multiple frames in videos, they also increase the computational complexity. The key difference here is that the proposed heterogeneous framework has various networks for different types of frames, while existing methods use the same networks for all frames. In particular, we propose to divide the video frames into two types, including key frames and non-key frames, and adopt three networks including slow networks, fast networks, and transfer networks in our heterogeneous framework. For key frames, a slow network is used that has high accuracy but high computational complexity. For non-key frames that follow a key frame, we propose to warp the heatmap of a slow network from a key frame via a transfer network and fuse it with a fast network that has low accuracy but low computational complexity. Furthermore, when extending to the usage of long-term frames where a large number of non-key frames follow a key frame, the temporal correlation decreases. Therefore, when necessary, we use an additional transfer network that warps the heatmap from a neighboring non-key frame. The experimental results on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed FSPose achieves a better balance between accuracy and computational complexity than the competitor method. Our source code is available at https://github.com/Fenax79/fspose.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDP7182/_p
@ARTICLE{e106-d_6_1165,
author={Jianfeng XU and Satoshi KOMORITA and Kei KAWAMURA},
journal={IEICE TRANSACTIONS on Information},
title={FSPose: A Heterogeneous Framework with Fast and Slow Networks for Human Pose Estimation in Videos},
year={2023},
volume={E106-D},
number={6},
pages={1165-1174},
abstract={We propose a framework for the integration of heterogeneous networks in human pose estimation (HPE) with the aim of balancing accuracy and computational complexity. Although many existing methods can improve the accuracy of HPE using multiple frames in videos, they also increase the computational complexity. The key difference here is that the proposed heterogeneous framework has various networks for different types of frames, while existing methods use the same networks for all frames. In particular, we propose to divide the video frames into two types, including key frames and non-key frames, and adopt three networks including slow networks, fast networks, and transfer networks in our heterogeneous framework. For key frames, a slow network is used that has high accuracy but high computational complexity. For non-key frames that follow a key frame, we propose to warp the heatmap of a slow network from a key frame via a transfer network and fuse it with a fast network that has low accuracy but low computational complexity. Furthermore, when extending to the usage of long-term frames where a large number of non-key frames follow a key frame, the temporal correlation decreases. Therefore, when necessary, we use an additional transfer network that warps the heatmap from a neighboring non-key frame. The experimental results on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed FSPose achieves a better balance between accuracy and computational complexity than the competitor method. Our source code is available at https://github.com/Fenax79/fspose.},
keywords={},
doi={10.1587/transinf.2022EDP7182},
ISSN={1745-1361},
month={June},}
TY - JOUR
TI - FSPose: A Heterogeneous Framework with Fast and Slow Networks for Human Pose Estimation in Videos
T2 - IEICE TRANSACTIONS on Information
SP - 1165
EP - 1174
AU - Jianfeng XU
AU - Satoshi KOMORITA
AU - Kei KAWAMURA
PY - 2023
DO - 10.1587/transinf.2022EDP7182
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2023
AB - We propose a framework for the integration of heterogeneous networks in human pose estimation (HPE) with the aim of balancing accuracy and computational complexity. Although many existing methods can improve the accuracy of HPE using multiple frames in videos, they also increase the computational complexity. The key difference here is that the proposed heterogeneous framework has various networks for different types of frames, while existing methods use the same networks for all frames. In particular, we propose to divide the video frames into two types, including key frames and non-key frames, and adopt three networks including slow networks, fast networks, and transfer networks in our heterogeneous framework. For key frames, a slow network is used that has high accuracy but high computational complexity. For non-key frames that follow a key frame, we propose to warp the heatmap of a slow network from a key frame via a transfer network and fuse it with a fast network that has low accuracy but low computational complexity. Furthermore, when extending to the usage of long-term frames where a large number of non-key frames follow a key frame, the temporal correlation decreases. Therefore, when necessary, we use an additional transfer network that warps the heatmap from a neighboring non-key frame. The experimental results on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed FSPose achieves a better balance between accuracy and computational complexity than the competitor method. Our source code is available at https://github.com/Fenax79/fspose.
ER -