Hilfe beim Zugang
MTPose: Human Pose Estimation with High-Resolution Multi-scale Transformers
Abstract HRNet (High-Resolution Networks) as reported by Sun et al. (in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), 2019) has been the state-of-the-art human pose estimation method, benefitting from its parallel high-resolution designed network structur...
Ausführliche Beschreibung
Abstract HRNet (High-Resolution Networks) as reported by Sun et al. (in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), 2019) has been the state-of-the-art human pose estimation method, benefitting from its parallel high-resolution designed network structures. However, HRNet is still a typical CNN (Convolutional Neural Networks) architecture, with local convolution operations. Recently, Transformers have been successfully applied in many computer vision areas. The main mechanism in Transformers is self-attention, which can learn global or long-range dependencies among different parts. In this paper, we propose a human pose estimation framework built upon High-Resolution Multi-scale Transformers, termed MTPose. We combine the two advantages of high-resolution and Transformers together to improve the performance. Specifically, we design a sub-network, MTNet (Multi-scale Transformers-based high-resolution Networks), which consists of two parallel branches. One is high-resolution with convolutional local operations, named as local branch. The other is the global branch utilizing multi-scale Transformer encoders to learn long-range dependencies of the whole body keypoints. At the end of the networks, the two branches are integrated together to predict the final keypoint heatmaps. Experiments on two benchmark datasets, the MSCOCO keypoint detection dataset and MPII human pose dataset, demonstrate that our method can significantly improve the state-of-the-art human pose estimation methods. Code will be available at: https://github.com/fudiGeng/MTPose. Ausführliche Beschreibung