Awesome-Attention-Mechanism-in-cv

Awesome List of Attention Modules and Plug&Play Modules in Computer Vision




Introduction

This is a curated list of awesome attention mechanisms used in computer vision, together with a collection of plug-and-play modules. Given limited time and energy, some modules may be missing; if you have suggestions or additions, feel free to open an issue or PR.

Attention Mechanism

Paper Publish Link Blog
Squeeze and Excitation Network CVPR18 SENet zhihu
Global Second-order Pooling Convolutional Networks CVPR19 GSoPNet
Neural Architecture Search for Lightweight Non-Local Networks CVPR20 AutoNL
Selective Kernel Network CVPR19 SKNet zhihu
Convolutional Block Attention Module ECCV18 CBAM zhihu
BottleNeck Attention Module BMVC18 BAM zhihu
Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks MICCAI18 scSE zhihu
Non-local Neural Networks CVPR18 Non-Local(NL) zhihu
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond ICCVW19 GCNet zhihu
CCNet: Criss-Cross Attention for Semantic Segmentation ICCV19 CCNet
SA-Net: Shuffle Attention for Deep Convolutional Neural Networks ICASSP21 SANet zhihu
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks CVPR20 ECANet
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks CoRR19 SGENet
FcaNet: Frequency Channel Attention Networks ICCV21 FcaNet
A^2-Nets: Double Attention Networks NeurIPS18 DANet
Asymmetric Non-local Neural Networks for Semantic Segmentation ICCV19 APNB
Efficient Attention: Attention with Linear Complexities CoRR18 EfficientAttention
Image Restoration via Residual Non-local Attention Networks ICLR19 RNAN
Exploring Self-attention for Image Recognition CVPR20 SAN
An Empirical Study of Spatial Attention Mechanisms in Deep Networks ICCV19 None
Object-Contextual Representations for Semantic Segmentation ECCV20 OCRNet
IAUnet: Global Context-Aware Feature Learning for Person Re-Identification TNNLS20 IAUNet
ResNeSt: Split-Attention Networks CoRR20 ResNeSt
Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks NeurIPS18 GENet
Improving Convolutional Networks with Self-calibrated Convolutions CVPR20 SCNet
Rotate to Attend: Convolutional Triplet Attention Module WACV21 TripletAttention
Dual Attention Network for Scene Segmentation CVPR19 DANet
Relation-Aware Global Attention for Person Re-identification CVPR20 RGANet
Attentional Feature Fusion WACV21 AFF
An Attentive Survey of Attention Models CoRR19 None
Stand-Alone Self-Attention in Vision Models NeurIPS19 FullAttention
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation ECCV18 BiSeNet zhihu
DCANet: Learning Connected Attentions for Convolutional Neural Networks CoRR20 DCANet
Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition CVPR17 Oral RA-CNN
Guided Attention Network for Object Detection and Counting on Drones ACM MM20 GANet
Attention Augmented Convolutional Networks ICCV19 AANet
Global Self-Attention Networks for Image Recognition ICLR21 GSA
Attention-Guided Hierarchical Structure Aggregation for Image Matting CVPR20 HAttMatting
Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks ECCV20 None
Expectation-Maximization Attention Networks for Semantic Segmentation ICCV19 Oral EMANet
Dense-and-Implicit Attention Network AAAI20 DIANet
Coordinate Attention for Efficient Mobile Network Design CVPR21 CoordAttention
Cross-channel Communication Networks NeurIPS19 C3Net
Gated Convolutional Networks with Hybrid Connectivity for Image Classification AAAI20 HCGNet
Weighted Channel Dropout for Regularization of Deep Convolutional Neural Network AAAI19 None
BA^2M: A Batch Aware Attention Module for Image Classification CVPR21 None
EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network CoRR21 EPSANet
Stand-Alone Self-Attention in Vision Models NeurIPS19 SASA
ResT: An Efficient Transformer for Visual Recognition CoRR21 ResT
SPANet: Spatial Pyramid Attention Network for Enhanced Image Recognition ICME20 SPANet
Space-time Mixing Attention for Video Transformer CoRR21 None
DMSANet: Dual Multi Scale Attention Network CoRR21 None
CompConv: A Compact Convolution Module for Efficient Feature Learning CoRR21 None
VOLO: Vision Outlooker for Visual Recognition CoRR21 VOLO
Interflow: Aggregating Multi-layer Feature Mappings with Attention Mechanism CoRR21 None
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning CoRR21 None
Polarized Self-Attention: Towards High-quality Pixel-wise Regression CoRR21 PSA
CA-Net: Comprehensive Attention Convolutional Neural Networks for Explainable Medical Image Segmentation TMI21 CA-Net
BAM: A Lightweight and Efficient Balanced Attention Mechanism for Single Image Super Resolution CoRR21 BAM
Attention as Activation CoRR21 ATAC
Region-based Non-local Operation for Video Classification CoRR21 RNL
MSAF: Multimodal Split Attention Fusion CoRR21 MSAF
All-Attention Layer CoRR19 None
Compact Global Descriptor CoRR20 CGD
SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks ICML21 SimAM
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution ICCV19 OctConv
Contextual Transformer Networks for Visual Recognition ICCV21 CoTNet
Residual Attention: A Simple but Effective Method for Multi-Label Recognition ICCV21 CSRA
Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation CVPR20 SEAM
An Attention Module for Convolutional Neural Networks ICCV21 AW-Conv
Attentive Normalization CoRR20 None
Person Re-identification via Attention Pyramid TIP21 APNet
Unifying Nonlocal Blocks for Neural Networks ICCV21 SNL
Tiled Squeeze-and-Excite: Channel Attention With Local Spatial Context ICCVW21 None
PP-NAS: Searching for Plug-and-Play Blocks on Convolutional Neural Network ICCVW21 PP-NAS
Distilling Knowledge via Knowledge Review CVPR21 ReviewKD
Dynamic Region-Aware Convolution CVPR21 None
Encoder Fusion Network With Co-Attention Embedding for Referring Image Segmentation CVPR21 None
Introvert: Human Trajectory Prediction via Conditional 3D Attention CVPR21 None
SSAN: Separable Self-Attention Network for Video Representation Learning CVPR21 None
Delving Deep into Many-to-many Attention for Few-shot Video Object Segmentation CVPR21 DANet
A2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation CVPR21 None
Image Super-Resolution with Non-Local Sparse Attention CVPR21 None
Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection CVPR21 LaneATT
NAM: Normalization-based Attention Module CoRR21 NAM
NAS-SCAM: Neural Architecture Search-Based Spatial and Channel Joint Attention Module for Nuclei Semantic Segmentation and Classification MICCAI20 NAS-SCAM
NASABN: A Neural Architecture Search Framework for Attention-Based Networks IJCNN20 None
Att-DARTS: Differentiable Neural Architecture Search for Attention IJCNN20 Att-Darts
On the Integration of Self-Attention and Convolution CoRR21 ACMix
BoxeR: Box-Attention for 2D and 3D Transformers CoRR21 None
CoAtNet: Marrying Convolution and Attention for All Data Sizes NeurIPS21 coatnet
Pay Attention to MLPs NeurIPS21 gmlp
IC-Conv: Inception Convolution With Efficient Dilation Search CVPR21 Oral IC-Conv
SRM : A Style-based Recalibration Module for Convolutional Neural Networks ICCV19 SRM
Competitive Inner-Imaging Squeeze and Excitation for Residual Network CoRR18 Competitive-SENet
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks WACV20 ULSAM
Augmenting Convolutional networks with attention-based aggregation CoRR21 None
Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification AAAI21 CAP
Instance Enhancement Batch Normalization: An Adaptive Regulator of Batch Noise AAAI20 IEBN
ASR: Attention-alike Structural Re-parameterization CoRR23 None
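
Many of the channel-attention modules above follow the Squeeze-and-Excitation pattern: pool global context, pass it through a small bottleneck MLP, and use a sigmoid gate to reweight channels. Below is a minimal NumPy sketch of that idea; the weight names `w1`/`w2` and the reduction ratio `r` are illustrative, not taken from any particular implementation:

```python
import numpy as np

def se_block(x, w1, w2):
    """Minimal Squeeze-and-Excitation sketch (NCHW tensor).

    x:  (N, C, H, W) feature map
    w1: (C, C//r) squeeze weights, w2: (C//r, C) excite weights
    """
    z = x.mean(axis=(2, 3))                      # squeeze: global average pool -> (N, C)
    s = np.maximum(z @ w1, 0.0)                  # FC + ReLU -> (N, C//r)
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))          # FC + sigmoid -> (N, C) channel gates
    return x * s[:, :, None, None]               # recalibrate each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 4, 4))
r = 2
w1 = rng.standard_normal((8, 8 // r)) * 0.1
w2 = rng.standard_normal((8 // r, 8)) * 0.1
y = se_block(x, w1, w2)
assert y.shape == x.shape
```

Variants in the table mostly change where the context comes from (spatial, frequency, second-order statistics) and how the gate is computed, while keeping this gate-and-rescale skeleton.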

Dynamic Networks

Title Publish Github
Dynamic Neural Networks: A Survey CoRR21 None
CondConv: Conditionally Parameterized Convolutions for Efficient Inference NeurIPS19 CondConv
DyNet: Dynamic Convolution for Accelerating Convolutional Neural Networks CoRR20 None
Dynamic Convolution: Attention over Convolution Kernels CVPR20 Dynamic-convolution-Pytorch
WeightNet: Revisiting the Design Space of Weight Network ECCV20 weightNet
Dynamic Filter Networks NeurIPS16 None
Dynamic deep neural networks: Optimizing accuracy-efficiency trade-offs by selective execution AAAI17 None
SkipNet: Learning Dynamic Routing in Convolutional Networks ECCV18 SkipNet
Pay Less Attention with Lightweight and Dynamic Convolutions ICLR19 fairseq
Unified Dynamic Convolutional Network for Super-Resolution with Variational Degradations CVPR20 None
Dynamic Group Convolution for Accelerating Convolutional Neural Networks ECCV20 dgc
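
The common thread in these papers is conditioning the convolution kernel on the input. As a toy illustration of the CondConv/Dynamic Convolution idea, the sketch below mixes K candidate 1x1 kernels with input-dependent softmax weights; all shapes and names are invented for the example (real implementations use full k×k kernels and a learned routing layer):

```python
import numpy as np

def dynamic_conv1x1(x, kernels, router_w):
    """Toy dynamic 1x1 convolution: attention over K candidate kernels.

    x:        (N, Cin, H, W) input feature map
    kernels:  (K, Cout, Cin) candidate 1x1 conv kernels
    router_w: (Cin, K) weights of the routing function
    """
    ctx = x.mean(axis=(2, 3))                        # (N, Cin) global context
    logits = ctx @ router_w                          # (N, K) routing logits
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)                # softmax attention over kernels
    w_mix = np.einsum('nk,koc->noc', a, kernels)     # per-example mixed kernel
    return np.einsum('noc,nchw->nohw', w_mix, x)     # apply as a 1x1 conv

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 5, 5))
kernels = rng.standard_normal((3, 6, 4))             # K=3 kernels, Cin=4 -> Cout=6
router_w = rng.standard_normal((4, 3))
y = dynamic_conv1x1(x, kernels, router_w)
assert y.shape == (2, 6, 5, 5)
```

Because the K kernels are mixed before the convolution is applied, the per-example cost stays close to a single convolution, which is why this family is attractive for efficient inference.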

Plug and Play Module

Title Publish Github
ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks ICCV19 ACNet
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs TPAMI18 ASPP
MixConv: Mixed Depthwise Convolutional Kernels BMVC19 MixedConv
Pyramid Scene Parsing Network CVPR17 PSP
Receptive Field Block Net for Accurate and Fast Object Detection ECCV18 RFB
Strip Pooling: Rethinking Spatial Pooling for Scene Parsing CVPR20 SPNet
SSH: Single Stage Headless Face Detector ICCV17 SSH
GhostNet: More Features from Cheap Operations CVPR20 GhostNet
SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Weights Flipping TIP21 SlimConv
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks ICML19 EfficientNet
CondConv: Conditionally Parameterized Convolutions for Efficient Inference NeurIPS19 CondConv
PP-NAS: Searching for Plug-and-Play Blocks on Convolutional Neural Network ICCVW21 PPNAS
Dynamic Convolution: Attention over Convolution Kernels CVPR20 DynamicConv
PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer ECCV20 PSConv
DCANet: Dense Context-Aware Network for Semantic Segmentation ECCV20 DCANet
Enhancing feature fusion for human pose estimation MVA20 SEB
Object-Contextual Representations for Semantic Segmentation ECCV20 HRNet-OCR
DO-Conv: Depthwise Over-parameterized Convolutional Layer CoRR20 DO-Conv
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition CoRR20 PyConv
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks WACV20 ULSAM
Dynamic Group Convolution for Accelerating Convolutional Neural Networks ECCV20 DGC
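
Several of these plug-and-play blocks trade expensive convolutions for cheaper operations. As one example, the GhostNet idea is to compute only a few "intrinsic" feature maps with a real convolution and derive the rest with cheap per-channel linear operations. The sketch below is a toy NumPy version (a scalar per channel stands in for the cheap depthwise convolution used in the paper):

```python
import numpy as np

def ghost_module(x, w_primary, cheap_scale):
    """Toy GhostNet-style module.

    x:           (N, Cin, H, W) input feature map
    w_primary:   (Cmid, Cin) 1x1 conv producing the intrinsic features
    cheap_scale: (Cmid,) stand-in for the cheap depthwise op per channel
    """
    prim = np.einsum('oc,nchw->nohw', w_primary, x)       # primary 1x1 conv
    ghost = prim * cheap_scale[None, :, None, None]       # cheap "ghost" features
    return np.concatenate([prim, ghost], axis=1)          # (N, 2*Cmid, H, W)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8, 6, 6))
out = ghost_module(x, rng.standard_normal((4, 8)), rng.standard_normal(4))
assert out.shape == (1, 8, 6, 6)   # same channel count, roughly half the conv cost
```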

Vision Transformer

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR 2021, ViT

[paper] [Github]
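
ViT's core preprocessing step is to turn an image into a sequence of flattened patches (the "16x16 words") that a standard Transformer then attends over. A minimal sketch of that patch-embedding step, assuming HWC layout and a free patch size `p`:

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W, C) image into non-overlapping p x p patches and
    flatten each into a token, as in ViT's patch embedding (before the
    learned linear projection). H and W must be divisible by p."""
    h, w, c = img.shape
    x = img.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)               # group pixels by patch
    return x.reshape(-1, p * p * c)              # (num_patches, p*p*C) tokens

img = np.arange(16 * 16 * 3, dtype=float).reshape(16, 16, 3)
tokens = patchify(img, 8)                        # (16/8)^2 = 4 "words"
assert tokens.shape == (4, 8 * 8 * 3)
```

Most entries in the table below keep this tokenization and instead vary the positional encoding, the attention pattern, or how convolutional inductive biases are mixed back in.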

Title Publish Github Main Idea
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows ICCV21 SwinT
CPVT: Conditional Positional Encodings for Vision Transformer CoRR21 CPVT
GLiT: Neural Architecture Search for Global and Local Image Transformer CoRR21 GLiT NAS
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases CoRR21 ConViT GPSA
CeiT: Incorporating Convolution Designs into Visual Transformers CoRR21 CeiT LCA, LeFF
BoTNet: Bottleneck Transformers for Visual Recognition CVPR21 BoTNet NonBlock-like
CvT: Introducing Convolutions to Vision Transformers ICCV21 CvT projection
TransCNN: Transformer in Convolutional Neural Networks CoRR21 TransCNN
ResT: An Efficient Transformer for Visual Recognition CoRR21 ResT
CoaT: Co-Scale Conv-Attentional Image Transformers CoRR21 CoaT
ConTNet: Why not use convolution and transformer at the same time? CoRR21 ConTNet
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification NeurIPS21 DynamicViT
DVT: Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition NeurIPS21 DVT
CoAtNet: Marrying Convolution and Attention for All Data Sizes CoRR21 CoAtNet
Early Convolutions Help Transformers See Better CoRR21 None
Compact Transformers: Escaping the Big Data Paradigm with Compact Transformers CoRR21 CCT
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer CoRR21 MobileViT
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference CoRR21 LeViT
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer CoRR21 ShuffleTransformer
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias CoRR21 ViTAE
LocalViT: Bringing Locality to Vision Transformers CoRR21 LocalViT
DeiT: Training data-efficient image transformers & distillation through attention ICML21 DeiT
CaiT: Going deeper with Image Transformers ICCV21 CaiT
Efficient Training of Visual Transformers with Small-Size Datasets NeurIPS21 None
Vision Transformer with Deformable Attention CoRR22 DAT DeformConv+SA
MaxViT: Multi-Axis Vision Transformer CoRR22 None dilated attention
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition CoRR22 Conv2Former
Rethinking Mobile Block for Efficient Neural Models CoRR23 EMO
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning ECCV22 Wave-ViT
Dual Vision Transformer CoRR23 Dual-ViT
CoTNet: Contextual Transformer Networks for Visual Recognition TPAMI22 CoTNet
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders CoRR23 ConvNeXt-V2
A Close Look at Spatial Modeling: From Attention to Convolution CoRR22 FCViT
Scalable Diffusion Models with Transformers CVPR22 DiT
Dynamic Grained Encoder for Vision Transformers NeurIPS21 vtpack
Segment Anything CoRR23 SAM
Improved robustness of vision transformers via prelayernorm in patch embedding PR23 None
Demystify Transformers & Convolutions in Modern Image Deep Networks CoRR22 STM-Evaluation

Contributing

If you know of other awesome attention mechanisms or plug-and-play modules in computer vision, please add them via a pull request or an issue.

Links to additional papers and their corresponding code are also welcome in the issues.

Thanks to @dedekinds for pointing out the problem in the DIANet description.