Rui SUN
Hefei University of Technology
Qili LIANG
Hefei University of Technology
Zi YANG
Hefei University of Technology
Zhenghui ZHAO
Hefei University of Technology
Xudong ZHANG
Hefei University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Rui SUN, Qili LIANG, Zi YANG, Zhenghui ZHAO, Xudong ZHANG, "Triplet Attention Network for Video-Based Person Re-Identification" in IEICE TRANSACTIONS on Information and Systems,
vol. E104-D, no. 10, pp. 1775-1779, October 2021, doi: 10.1587/transinf.2021EDL8037.
Abstract: Video-based person re-identification (re-ID) aims at retrieving a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, the problems of background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triplet attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts: the first part introduces a residual attention subnetwork, which contains a channel attention module to capture cross-dimension dependencies by using rotation and transformation, and a spatial attention module to focus on pedestrian features. In the second part, a temporal attention module is designed to judge the quality score of each pedestrian image and to reduce the weight of incomplete pedestrian images to alleviate the occlusion problem. We evaluate our proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.
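The temporal attention idea in the abstract — scoring each frame's quality and down-weighting incomplete (e.g. occluded) frames before pooling a clip-level descriptor — can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the linear scoring vector `w` and the softmax pooling are assumptions standing in for the paper's learned temporal attention module.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def temporal_attention_pool(frame_features, w):
    """Pool per-frame features into one clip descriptor.

    frame_features: (T, D) array, one D-dim descriptor per frame.
    w:              (D,) hypothetical learned scoring vector.
    """
    scores = frame_features @ w        # per-frame quality scores
    weights = softmax(scores)          # normalize to attention weights
    return weights @ frame_features    # weighted average over frames
```

Frames whose features score low (standing in for occluded or incomplete detections) receive small weights, so they contribute less to the final clip descriptor.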
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDL8037/_p
@ARTICLE{e104-d_10_1775,
author={Rui SUN and Qili LIANG and Zi YANG and Zhenghui ZHAO and Xudong ZHANG},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Triplet Attention Network for Video-Based Person Re-Identification},
year={2021},
volume={E104-D},
number={10},
pages={1775-1779},
abstract={Video-based person re-identification (re-ID) aims at retrieving a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, the problems of background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triplet attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts: the first part introduces a residual attention subnetwork, which contains a channel attention module to capture cross-dimension dependencies by using rotation and transformation, and a spatial attention module to focus on pedestrian features. In the second part, a temporal attention module is designed to judge the quality score of each pedestrian image and to reduce the weight of incomplete pedestrian images to alleviate the occlusion problem. We evaluate our proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.},
keywords={},
doi={10.1587/transinf.2021EDL8037},
ISSN={1745-1361},
month={October},}
TY - JOUR
TI - Triplet Attention Network for Video-Based Person Re-Identification
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1775
EP - 1779
AU - SUN, Rui
AU - LIANG, Qili
AU - YANG, Zi
AU - ZHAO, Zhenghui
AU - ZHANG, Xudong
PY - 2021
DO - 10.1587/transinf.2021EDL8037
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E104-D
IS - 10
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - October 2021
AB - Video-based person re-identification (re-ID) aims at retrieving a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, the problems of background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triplet attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts: the first part introduces a residual attention subnetwork, which contains a channel attention module to capture cross-dimension dependencies by using rotation and transformation, and a spatial attention module to focus on pedestrian features. In the second part, a temporal attention module is designed to judge the quality score of each pedestrian image and to reduce the weight of incomplete pedestrian images to alleviate the occlusion problem. We evaluate our proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.
ER -