

Awesome Visual-Transformer

A collection of papers on Transformers in computer vision (CV).


Transformer original paper

Technical blog

  • [Chinese Blog] A 30,000-character long read for an easy introduction to visual Transformers [Link]


Survey

  • Transformers in Vision: A Survey [paper] - 2021.01.04
  • A Survey on Visual Transformer [paper] - 2020.12.24

arXiv papers

  • Training Vision Transformers for Image Retrieval [paper]
  • [TransReID] TransReID: Transformer-based Object Re-Identification [paper]
  • [VTN] Video Transformer Network [paper]
  • [T2T-ViT] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [paper] [code]
  • [BoTNet] Bottleneck Transformers for Visual Recognition [paper]
  • [CPTR] CPTR: Full Transformer Network for Image Captioning [paper]
  • Learn to Dance with AIST++: Music Conditioned 3D Dance Generation [paper] [code]
  • [Trans2Seg] Segmenting Transparent Object in the Wild with Transformer [paper] [code]
  • [SMCA] Fast Convergence of DETR with Spatially Modulated Co-Attention [paper]
  • Investigating the Vision Transformer Model for Image Retrieval Tasks [paper]
  • [Trear] Trear: Transformer-based RGB-D Egocentric Action Recognition [paper]
  • [VisTR] End-to-End Video Instance Segmentation with Transformers [paper]
  • [VisualSparta] VisualSparta: Sparse Transformer Fragment-level Matching for Large-scale Text-to-Image Search [paper]
  • [TrackFormer] TrackFormer: Multi-Object Tracking with Transformers [paper]
  • [LETR] Line Segment Detection Using Transformers without Edges [paper]
  • [TAPE] Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry [paper]
  • [TRIQ] Transformer for Image Quality Assessment [paper] [code]
  • [TransTrack] TransTrack: Multiple-Object Tracking with Transformer [paper] [code]
  • [SETR] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [paper] [code]
  • [TransPose] TransPose: Towards Explainable Human Pose Estimation by Transformer [paper]
  • [DeiT] Training data-efficient image transformers & distillation through attention [paper]
  • [Pointformer] 3D Object Detection with Pointformer [paper]
  • [ViT-FRCNN] Toward Transformer-Based Object Detection [paper]
  • [Taming-transformers] Taming Transformers for High-Resolution Image Synthesis [paper] [code]
  • [SceneFormer] SceneFormer: Indoor Scene Generation with Transformers [paper]
  • [PCT] PCT: Point Cloud Transformer [paper]
  • Transformer Interpretability Beyond Attention Visualization [paper] [code]
  • [METRO] End-to-End Human Pose and Mesh Reconstruction with Transformers [paper]
  • [PointTransformer] Point Transformer [paper]
  • [PED] DETR for Pedestrian Detection [paper]
  • [UP-DETR] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [paper]
  • [C-Tran] General Multi-label Image Classification with Transformers [paper]
  • [TSP-FCOS] Rethinking Transformer-based Set Prediction for Object Detection [paper]
  • [IPT] Pre-Trained Image Processing Transformer [paper]
  • [ACT] End-to-End Object Detection with Adaptive Clustering Transformer [paper]
  • [VTs] Visual Transformers: Token-based Image Representation and Processing for Computer Vision [paper]


2021

  • [Vision Transformer] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ICLR) [paper] [code]
  • [Deformable DETR] Deformable DETR: Deformable Transformers for End-to-End Object Detection (ICLR) [paper] [code]
  • [LSTR] End-to-end Lane Shape Prediction with Transformers (WACV) [paper] [code]


2020

  • [DETR] End-to-End Object Detection with Transformers (ECCV) [paper] [code]
  • [FPT] Feature Pyramid Transformer (CVPR) [paper] [code]
  • [TTSR] Learning Texture Transformer Network for Image Super-Resolution (CVPR) [paper] [code]


Thanks to Awesome-Crowd-Counting for the template.
