Awesome-Attention-Mechanism-in-cv

Awesome List of Attention Modules and Plug&Play Modules in Computer Vision




Introduction

This is a curated list of awesome attention mechanisms used in computer vision, together with a collection of plug-and-play modules. Given limited time and energy, some modules may be missing; if you have suggestions or additions, feel free to open an issue or PR.

Attention Mechanism

Paper Publish Link Blog
Squeeze and Excitation Network CVPR18 SENet zhihu
Global Second-order Pooling Convolutional Networks CVPR19 GSoPNet
Neural Architecture Search for Lightweight Non-Local Networks CVPR20 AutoNL
Selective Kernel Network CVPR19 SKNet zhihu
Convolutional Block Attention Module ECCV18 CBAM zhihu
BottleNeck Attention Module BMVC18 BAM zhihu
Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks MICCAI18 scSE zhihu
Non-local Neural Networks CVPR18 Non-Local(NL) zhihu
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond ICCVW19 GCNet zhihu
CCNet: Criss-Cross Attention for Semantic Segmentation ICCV19 CCNet
SA-Net: Shuffle Attention for Deep Convolutional Neural Networks ICASSP21 SANet zhihu
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks CVPR20 ECANet
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks CoRR19 SGENet
FcaNet: Frequency Channel Attention Networks ICCV21 FcaNet
A^2-Nets: Double Attention Networks NeurIPS18 DANet
Asymmetric Non-local Neural Networks for Semantic Segmentation ICCV19 APNB
Efficient Attention: Attention with Linear Complexities CoRR18 EfficientAttention
Image Restoration via Residual Non-local Attention Networks ICLR19 RNAN
Exploring Self-attention for Image Recognition CVPR20 SAN
An Empirical Study of Spatial Attention Mechanisms in Deep Networks ICCV19 None
Object-Contextual Representations for Semantic Segmentation ECCV20 OCRNet
IAUnet: Global Context-Aware Feature Learning for Person Re-Identification TNNLS20 IAUNet
ResNeSt: Split-Attention Networks CoRR20 ResNeSt
Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks NeurIPS18 GENet
Improving Convolutional Networks with Self-calibrated Convolutions CVPR20 SCNet
Rotate to Attend: Convolutional Triplet Attention Module WACV21 TripletAttention
Dual Attention Network for Scene Segmentation CVPR19 DANet
Relation-Aware Global Attention for Person Re-identification CVPR20 RGANet
Attentional Feature Fusion WACV21 AFF
An Attentive Survey of Attention Models CoRR19 None
Stand-Alone Self-Attention in Vision Models NeurIPS19 FullAttention
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation ECCV18 BiSeNet zhihu
DCANet: Learning Connected Attentions for Convolutional Neural Networks CoRR20 DCANet
Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition CVPR17 Oral RA-CNN
Guided Attention Network for Object Detection and Counting on Drones ACM MM20 GANet
Attention Augmented Convolutional Networks ICCV19 AANet
Global Self-Attention Networks for Image Recognition ICLR21 GSA
Attention-Guided Hierarchical Structure Aggregation for Image Matting CVPR20 HAttMatting
Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks ECCV20 None
Expectation-Maximization Attention Networks for Semantic Segmentation ICCV19 Oral EMANet
Dense-and-Implicit Attention Network AAAI20 DIANet
Coordinate Attention for Efficient Mobile Network Design CVPR21 CoordAttention
Cross-channel Communication Networks NeurIPS19 C3Net
Gated Convolutional Networks with Hybrid Connectivity for Image Classification AAAI20 HCGNet
Weighted Channel Dropout for Regularization of Deep Convolutional Neural Network AAAI19 None
BA^2M: A Batch Aware Attention Module for Image Classification CVPR21 None
EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network CoRR21 EPSANet
Stand-Alone Self-Attention in Vision Models NeurIPS19 SASA
ResT: An Efficient Transformer for Visual Recognition CoRR21 ResT
SPANet: Spatial Pyramid Attention Network for Enhanced Image Recognition ICME20 SPANet
Space-time Mixing Attention for Video Transformer CoRR21 None
DMSANet: Dual Multi Scale Attention Network CoRR21 None
CompConv: A Compact Convolution Module for Efficient Feature Learning CoRR21 None
VOLO: Vision Outlooker for Visual Recognition CoRR21 VOLO
Interflow: Aggregating Multi-layer Feature Mappings with Attention Mechanism CoRR21 None
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning CoRR21 None
Polarized Self-Attention: Towards High-quality Pixel-wise Regression CoRR21 PSA
CA-Net: Comprehensive Attention Convolutional Neural Networks for Explainable Medical Image Segmentation TMI21 CA-Net
BAM: A Lightweight and Efficient Balanced Attention Mechanism for Single Image Super Resolution CoRR21 BAM
Attention as Activation CoRR21 ATAC
Region-based Non-local Operation for Video Classification CoRR21 RNL
MSAF: Multimodal Split Attention Fusion CoRR21 MSAF
All-Attention Layer CoRR19 None
Compact Global Descriptor CoRR20 CGD
SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks ICML21 SimAM
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution ICCV19 OctConv
Contextual Transformer Networks for Visual Recognition ICCV21 CoTNet
Residual Attention: A Simple but Effective Method for Multi-Label Recognition ICCV21 CSRA
Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation CVPR20 SEAM
An Attention Module for Convolutional Neural Networks ICCV21 AW-Conv
Attentive Normalization CoRR20 None
Person Re-identification via Attention Pyramid TIP21 APNet
Unifying Nonlocal Blocks for Neural Networks ICCV21 SNL
Tiled Squeeze-and-Excite: Channel Attention With Local Spatial Context ICCVW21 None
PP-NAS: Searching for Plug-and-Play Blocks on Convolutional Neural Network ICCVW21 PP-NAS
Distilling Knowledge via Knowledge Review CVPR21 ReviewKD
Dynamic Region-Aware Convolution CVPR21 None
Encoder Fusion Network With Co-Attention Embedding for Referring Image Segmentation CVPR21 None
Introvert: Human Trajectory Prediction via Conditional 3D Attention CVPR21 None
SSAN: Separable Self-Attention Network for Video Representation Learning CVPR21 None
Delving Deep into Many-to-many Attention for Few-shot Video Object Segmentation CVPR21 DANet
A2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation CVPR21 None
Image Super-Resolution with Non-Local Sparse Attention CVPR21 None
Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection CVPR21 LaneATT
NAM: Normalization-based Attention Module CoRR21 NAM
NAS-SCAM: Neural Architecture Search-Based Spatial and Channel Joint Attention Module for Nuclei Semantic Segmentation and Classification MICCAI20 NAS-SCAM
NASABN: A Neural Architecture Search Framework for Attention-Based Networks IJCNN20 None
Att-DARTS: Differentiable Neural Architecture Search for Attention IJCNN20 Att-Darts
On the Integration of Self-Attention and Convolution CoRR21 ACMix
BoxeR: Box-Attention for 2D and 3D Transformers CoRR21 None
CoAtNet: Marrying Convolution and Attention for All Data Sizes NeurIPS21 coatnet
Pay Attention to MLPs NeurIPS21 gmlp
IC-Conv: Inception Convolution With Efficient Dilation Search CVPR21 Oral IC-Conv
SRM : A Style-based Recalibration Module for Convolutional Neural Networks ICCV19 SRM
Competitive Inner-Imaging Squeeze and Excitation for Residual Network CoRR18 Competitive-SENet
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks WACV20 ULSAM
Augmenting Convolutional networks with attention-based aggregation CoRR21 None
Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification AAAI21 CAP
Instance Enhancement Batch Normalization: An Adaptive Regulator of Batch Noise AAAI20 IEBN
ASR: Attention-alike Structural Re-parameterization CoRR23 None
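
Many of the channel-attention modules above follow the Squeeze-and-Excitation pattern: pool global context, pass it through a small bottleneck MLP, and use a sigmoid gate to reweight channels. Below is a minimal NumPy sketch of that idea; the weight names `w1`/`w2` and the reduction ratio `r` are illustrative, not taken from any particular implementation:

```python
import numpy as np

def se_block(x, w1, w2):
    """Minimal Squeeze-and-Excitation sketch (NCHW tensor).

    x:  (N, C, H, W) feature map
    w1: (C, C//r) squeeze weights, w2: (C//r, C) excite weights
    """
    z = x.mean(axis=(2, 3))                      # squeeze: global average pool -> (N, C)
    s = np.maximum(z @ w1, 0.0)                  # FC + ReLU -> (N, C//r)
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))          # FC + sigmoid -> (N, C) channel gates
    return x * s[:, :, None, None]               # recalibrate each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 4, 4))
r = 2
w1 = rng.standard_normal((8, 8 // r)) * 0.1
w2 = rng.standard_normal((8 // r, 8)) * 0.1
y = se_block(x, w1, w2)
assert y.shape == x.shape
```

Variants in the table mostly change where the context comes from (spatial, frequency, second-order statistics) and how the gate is computed, while keeping this gate-and-rescale skeleton.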

Dynamic Networks

Title Publish Github
Dynamic Neural Networks: A Survey CoRR21 None
CondConv: Conditionally Parameterized Convolutions for Efficient Inference NeurIPS19 CondConv
DyNet: Dynamic Convolution for Accelerating Convolutional Neural Networks CoRR20 None
Dynamic Convolution: Attention over Convolution Kernels CVPR20 Dynamic-convolution-Pytorch
WeightNet: Revisiting the Design Space of Weight Network ECCV20 weightNet
Dynamic Filter Networks NeurIPS16 None
Dynamic deep neural networks: Optimizing accuracy-efficiency trade-offs by selective execution AAAI17 None
SkipNet: Learning Dynamic Routing in Convolutional Networks ECCV18 SkipNet
Pay Less Attention with Lightweight and Dynamic Convolutions ICLR19 fairseq
Unified Dynamic Convolutional Network for Super-Resolution with Variational Degradations CVPR20 None
Dynamic Group Convolution for Accelerating Convolutional Neural Networks ECCV20 dgc
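
The common thread in these papers is conditioning the convolution kernel on the input. As a toy illustration of the CondConv/Dynamic Convolution idea, the sketch below mixes K candidate 1x1 kernels with input-dependent softmax weights; all shapes and names are invented for the example (real implementations use full k×k kernels and a learned routing layer):

```python
import numpy as np

def dynamic_conv1x1(x, kernels, router_w):
    """Toy dynamic 1x1 convolution: attention over K candidate kernels.

    x:        (N, Cin, H, W) input feature map
    kernels:  (K, Cout, Cin) candidate 1x1 conv kernels
    router_w: (Cin, K) weights of the routing function
    """
    ctx = x.mean(axis=(2, 3))                        # (N, Cin) global context
    logits = ctx @ router_w                          # (N, K) routing logits
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)                # softmax attention over kernels
    w_mix = np.einsum('nk,koc->noc', a, kernels)     # per-example mixed kernel
    return np.einsum('noc,nchw->nohw', w_mix, x)     # apply as a 1x1 conv

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 5, 5))
kernels = rng.standard_normal((3, 6, 4))             # K=3 kernels, Cin=4 -> Cout=6
router_w = rng.standard_normal((4, 3))
y = dynamic_conv1x1(x, kernels, router_w)
assert y.shape == (2, 6, 5, 5)
```

Because the K kernels are mixed before the convolution is applied, the per-example cost stays close to a single convolution, which is why this family is attractive for efficient inference.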

Plug and Play Module

Title Publish Github
ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks ICCV19 ACNet
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs TPAMI18 ASPP
MixConv: Mixed Depthwise Convolutional Kernels BMVC19 MixedConv
Pyramid Scene Parsing Network CVPR17 PSP
Receptive Field Block Net for Accurate and Fast Object Detection ECCV18 RFB
Strip Pooling: Rethinking Spatial Pooling for Scene Parsing CVPR20 SPNet
SSH: Single Stage Headless Face Detector ICCV17 SSH
GhostNet: More Features from Cheap Operations CVPR20 GhostNet
SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Weights Flipping TIP21 SlimConv
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks ICML19 EfficientNet
CondConv: Conditionally Parameterized Convolutions for Efficient Inference NeurIPS19 CondConv
PP-NAS: Searching for Plug-and-Play Blocks on Convolutional Neural Network ICCVW21 PPNAS
Dynamic Convolution: Attention over Convolution Kernels CVPR20 DynamicConv
PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer ECCV20 PSConv
DCANet: Dense Context-Aware Network for Semantic Segmentation ECCV20 DCANet
Enhancing feature fusion for human pose estimation MVA20 SEB
Object-Contextual Representations for Semantic Segmentation ECCV20 HRNet-OCR
DO-Conv: Depthwise Over-parameterized Convolutional Layer CoRR20 DO-Conv
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition CoRR20 PyConv
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks WACV20 ULSAM
Dynamic Group Convolution for Accelerating Convolutional Neural Networks ECCV20 DGC
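
Several of these plug-and-play blocks trade expensive convolutions for cheaper operations. As one example, the GhostNet idea is to compute only a few "intrinsic" feature maps with a real convolution and derive the rest with cheap per-channel linear operations. The sketch below is a toy NumPy version (a scalar per channel stands in for the cheap depthwise convolution used in the paper):

```python
import numpy as np

def ghost_module(x, w_primary, cheap_scale):
    """Toy GhostNet-style module.

    x:           (N, Cin, H, W) input feature map
    w_primary:   (Cmid, Cin) 1x1 conv producing the intrinsic features
    cheap_scale: (Cmid,) stand-in for the cheap depthwise op per channel
    """
    prim = np.einsum('oc,nchw->nohw', w_primary, x)       # primary 1x1 conv
    ghost = prim * cheap_scale[None, :, None, None]       # cheap "ghost" features
    return np.concatenate([prim, ghost], axis=1)          # (N, 2*Cmid, H, W)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8, 6, 6))
out = ghost_module(x, rng.standard_normal((4, 8)), rng.standard_normal(4))
assert out.shape == (1, 8, 6, 6)   # same channel count, roughly half the conv cost
```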

Vision Transformer

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021, ViT

[paper] [Github]
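
ViT's core preprocessing step is to turn an image into a sequence of flattened patches (the "16x16 words") that a standard Transformer then attends over. A minimal sketch of that patch-embedding step, assuming HWC layout and a free patch size `p`:

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W, C) image into non-overlapping p x p patches and
    flatten each into a token, as in ViT's patch embedding (before the
    learned linear projection). H and W must be divisible by p."""
    h, w, c = img.shape
    x = img.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)               # group pixels by patch
    return x.reshape(-1, p * p * c)              # (num_patches, p*p*C) tokens

img = np.arange(16 * 16 * 3, dtype=float).reshape(16, 16, 3)
tokens = patchify(img, 8)                        # (16/8)^2 = 4 "words"
assert tokens.shape == (4, 8 * 8 * 3)
```

Most entries in the table below keep this tokenization and instead vary the positional encoding, the attention pattern, or how convolutional inductive biases are mixed back in.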

Title Publish Github Main Idea
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows ICCV21 SwinT
CPVT: Conditional Positional Encodings for Vision Transformer CoRR21 CPVT
GLiT: Neural Architecture Search for Global and Local Image Transformer CoRR21 GLiT NAS
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases CoRR21 ConViT GPSA
CeiT: Incorporating Convolution Designs into Visual Transformers CoRR21 CeiT LCA, LeFF
BoTNet: Bottleneck Transformers for Visual Recognition CVPR21 BoTNet NonBlock-like
CvT: Introducing Convolutions to Vision Transformers ICCV21 CvT projection
TransCNN: Transformer in Convolutional Neural Networks CoRR21 TransCNN
ResT: An Efficient Transformer for Visual Recognition CoRR21 ResT
CoaT: Co-Scale Conv-Attentional Image Transformers CoRR21 CoaT
ConTNet: Why not use convolution and transformer at the same time? CoRR21 ConTNet
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification NeurIPS21 DynamicViT
DVT: Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition NeurIPS21 DVT
CoAtNet: Marrying Convolution and Attention for All Data Sizes CoRR21 CoAtNet
Early Convolutions Help Transformers See Better CoRR21 None
Compact Transformers: Escaping the Big Data Paradigm with Compact Transformers CoRR21 CCT
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer CoRR21 MobileViT
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference CoRR21 LeViT
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer CoRR21 ShuffleTransformer
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias CoRR21 ViTAE
LocalViT: Bringing Locality to Vision Transformers CoRR21 LocalViT
DeiT: Training data-efficient image transformers & distillation through attention ICML21 DeiT
CaiT: Going deeper with Image Transformers ICCV21 CaiT
Efficient Training of Visual Transformers with Small-Size Datasets NeurIPS21 None
Vision Transformer with Deformable Attention CoRR22 DAT DeformConv+SA
MaxViT: Multi-Axis Vision Transformer CoRR22 None dilated attention
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition CoRR22 Conv2Former
Rethinking Mobile Block for Efficient Neural Models CoRR23 EMO
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning ECCV22 Wave-ViT
Dual Vision Transformer CoRR23 Dual-ViT
CoTNet: Contextual Transformer Networks for Visual Recognition TPAMI22 CoTNet
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders CoRR23 ConvNeXt-V2
A Close Look at Spatial Modeling: From Attention to Convolution CoRR22 FCViT
Scalable Diffusion Models with Transformers CVPR22 DiT
Dynamic Grained Encoder for Vision Transformers NeurIPS21 vtpack
Segment Anything CoRR23 SAM
Improved robustness of vision transformers via prelayernorm in patch embedding PR23 None
Demystify Transformers & Convolutions in Modern Image Deep Networks CoRR22 STM-Evaluation

Contributing

If you know of other awesome attention mechanisms or plug-and-play modules in computer vision, please add them via a pull request or an issue.

Links to additional papers and their corresponding code are also welcome in the issues.

Thanks to @dedekinds for pointing out the problem in the DIANet description.