IVIF_ZOO

Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption. A fire-new survey for infrared and visible image fusion.

RollingPlain

166

Python

Latest News 🔥🔥

[2024-12-12] Our survey paper [Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption.] has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence!
(Paper)(中文版)

IVIF Zoo

Welcome to IVIF Zoo, a comprehensive repository dedicated to Infrared and Visible Image Fusion (IVIF). Based on our survey paper [Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption. Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, Xin Fan, Risheng Liu*], this repository aims to serve as a central hub for researchers, engineers, and enthusiasts in the field of IVIF. Here, you’ll find a wide array of resources, tools, and datasets, curated to accelerate advancements and foster collaboration in infrared-visible image fusion technologies.

preview
_{A detailed spectrogram depicting almost all wavelength and frequency ranges, particularly expanding the range of the human visual system and annotating corresponding computer vision and image fusion datasets.}

preview
The diagram of infrared and visible image fusion for practical applications. Existing image fusion methods majorly focus on the design of architectures and training strategies for visual enhancement, few considering the adaptation for downstream visual perception tasks. Additionally, from the data compatibility perspective, pixel misalignment and adversarial attacks of image fusion are two major challenges. Additionally, integrating comprehensive semantic information for tasks like semantic segmentation, object detection, and salient object detection remains underexplored, posing a critical obstacle in image fusion.

preview
A classification sankey diagram containing typical fusion methods.

🔥🚀资源库 (Resource Library)

It covers all results of our survey paper, available for download from Baidu Cloud.

💥融合 (Fusion)
✂️分割 (Segmentation) Based on SegFormer
🔍检测 (Detection) Based on YOLO-v5
计算效率 (Computational Efficiency)

数据集(Datasets)

Dataset	Img pairs	Resolution	Color	Obj/Cats	Cha-Sc	Anno	DownLoad
TNO	261	768×576	❌	few	✔	❌	Link
RoadScene 🔥	221	Various	✔	medium	❌	❌	Link
VIFB	21	Various	Various	few	❌	❌	Link
MS	2999	768×576	✔	14146 / 6	❌	✔	Link
LLVIP	16836	1280×720	✔	pedestrian / 1	❌	✔	Link
M³FD 🔥	4200	1024×768	✔	33603 / 6	✔	✔	Link
MFNet	1569	640×480	✔	abundant / 8	❌	✔	Link
FMB 🔥	1500	800×600	✔	abundant / 14	❌	✔	Link

If the M³FD and FMB datasets are helpful to you, please cite the following paper:

@inproceedings{liu2022target,
  title={Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection},
  author={Liu, Jinyuan and Fan, Xin and Huang, Zhanbo and Wu, Guanyao and Liu, Risheng and Zhong, Wei and Luo, Zhongxuan},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  pages={5802--5811},
  year={2022}
}

@inproceedings{liu2023multi,
  title={Multi-interactive feature learning and a full-time multi-modality benchmark for image fusion and segmentation},
  author={Liu, Jinyuan and Liu, Zhu and Wu, Guanyao and Ma, Long and Liu, Risheng and Zhong, Wei and Luo, Zhongxuan and Fan, Xin},
  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
  pages={8115--8124},
  year={2023}
}

方法集(Method Set)

纯融合方法(Fusion for Visual Enhancement)

Aspects (分类)	Methods (方法)	Title (标题)	Venue (发表场所)	Source (资源)
Auto-Encoder	DenseFuse	Densefuse: A fusion approach to infrared and visible images	TIP '18	Paper/Code
Auto-Encoder	SEDRFuse	Sedrfuse: A symmetric encoder–decoder with residual block network for infrared and visible image fusion	TIM '20	Paper/Code
Auto-Encoder	DIDFuse	Didfuse: Deep image decomposition for infrared and visible image fusion	IJCAI '20	Paper/Code
Auto-Encoder	MFEIF	Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion	TCSVT '21	Paper/Code
Auto-Encoder	RFN-Nest	Rfn-nest: An end-to-end residual fusion network for infrared and visible images	TIM '21	Paper/Code
Auto-Encoder	SFAFuse	Self-supervised feature adaption for infrared and visible image fusion	InfFus '21	Paper/Code
Auto-Encoder	SMoA	Smoa: Searching a modality-oriented architecture for infrared and visible image fusion	SPL '21	Paper/Code
Auto-Encoder	Re2Fusion	Res2fusion: Infrared and visible image fusion based on dense res2net and double nonlocal attention models	TIM '22	Paper/Code
GAN	FusionGAN	Fusiongan: A generative adversarial network for infrared and visible image fusion	InfFus '19	Paper/Code
GAN	DDcGAN	Learning a generative model for fusing infrared and visible images via conditional generative adversarial network with dual discriminators	TIP '19	Paper/Code
GAN	AtFGAN	Attentionfgan: Infrared and visible image fusion using attention-based generative adversarial networks	TMM '20	Paper
GAN	DPAL	Infrared and visible image fusion via detail preserving adversarial learning	InfFus '20	Paper/Code
GAN	D2WGAN	Infrared and visible image fusion using dual discriminators generative adversarial networks with wasserstein distance	InfSci '20	Paper
GAN	GANMcC	Ganmcc: A generative adversarial network with multiclassification constraints for infrared and visible image fusion	TIM '20	Paper/Code
GAN	ICAFusion	Infrared and visible image fusion via interactive compensatory attention adversarial learning	TMM '22	Paper/Code
GAN	TCGAN	Transformer based conditional gan for multimodal image fusion	TMM '23	Paper/Code
GAN	DCFusion	DCFusion: A Dual-Frequency Cross-Enhanced Fusion Network for Infrared and Visible Image Fusion	TIM '23	Paper
GAN	FreqGAN	Freqgan: Infrared and visible image fusion via unified frequency adversarial learning	TCSVT '24	Paper/Code
CNN	BIMDL	A bilevel integrated model with data-driven layer ensemble for multi-modality image fusion	TIP '20	Paper
CNN	MgAN-Fuse	Multigrained attention network for infrared and visible image fusion	TIM '20	Paper
CNN	AUIF	Efficient and model-based infrared and visible image fusion via algorithm unrolling	TCSVT '21	Paper/Code
CNN	RXDNFuse	Rxdnfuse: A aggregated residual dense network for infrared and visible image fusion	InfFus '21	Paper
CNN	STDFusionNet	Stdfusionnet: An infrared and visible image fusion network based on salient target detection	TIM '21	Paper/Code
CNN	CUFD	Cufd: An encoder–decoder network for visible and infrared image fusion based on common and unique feature decomposition	CVIU '22	Paper/Code
CNN	Dif-Fusion	Dif-fusion: Towards high color fidelity in infrared and visible image fusion with diffusion models	TIP '23	Paper/Code
CNN	L2Net	L2Net: Infrared and Visible Image Fusion Using Lightweight Large Kernel Convolution Network	TIP '23	Paper/Code
CNN	IGNet	Learning a graph neural network with cross modality interaction for image fusion	ACMMM '23	Paper/Code
CNN	LRRNet	Lrrnet: A novel representation learning guided fusion network for infrared and visible images	TPAMI '23	Paper/Code
CNN	MetaFusion	Metafusion: Infrared and visible image fusion via meta-feature embedding from object detection	CVPR '23	Paper/Code
CNN	PSFusion	Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity	InfFus '23	Paper/Code
Transformer	SwinFusion	Swinfusion: Cross-domain long-range learning for general image fusion via swin transformer	JAS '22	Paper/Code
Transformer	YDTR	Ydtr: Infrared and visible image fusion via y-shape dynamic transformer	TMM '22	Paper/Code
Transformer	IFT	Image fusion transformer	ICIP '22	Paper/Code
Transformer	CDDFuse	Cddfuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion	CVPR '23	Paper/Code
Transformer	TGFuse	Tgfuse: An infrared and visible image fusion approach based on transformer and generative adversarial network	TIP '23	Paper/Code
Transformer	CMTFusion	Cross-modal transformers for infrared and visible image fusion	TCSVT '23	Paper/Code
Transformer	Text-IF	Text-if: Leveraging semantic text guidance for degradation-aware and interactive image fusion	CVPR '24	Paper/Code
Transformer	PromptF	Promptfusion: Harmonized semantic prompt learning for infrared and visible image fusion	JAS '24
Transformer	MaeFuse	MaeFuse: Transferring Omni Features With Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training	TIP '25	Paper/Code

数据兼容方法(Data Compatible)

Aspects (分类)	Methods (方法)	Title (标题)	Venue (发表场所)	Source (资源)
Registration	UMIR	Unsupervised multi-modal image registration via geometry preserving image-to-image translation	CVPR ‘20	Paper/Code
Registration	ReCoNet	Reconet: Recurrent correction network for fast and efficient multi-modality image fusion	ECCV ‘22	Paper/Code
Registration	SuperFusion	Superfusion: A versatile image registration and fusion network with semantic awareness	JAS ‘22	Paper/Code
Registration	UMFusion	Unsupervised misaligned infrared and visible image fusion via cross-modality image generation and registration	IJCAI ‘22	Paper/Code
Registration	GCRF	General cross-modality registration framework for visible and infrared UAV target image registration	SR ‘23	Paper
Registration	MURF	MURF: mutually reinforcing multi-modal image registration and fusion	TPAMI ‘23	Paper/Code
Registration	SemLA	Semantics lead all: Towards unified image registration and fusion from a semantic perspective	InfFus ‘23	Paper/Code
Registration	-	A Deep Learning Framework for Infrared and Visible Image Fusion Without Strict Registration	IJCV ‘23	Paper
Attack	PAIFusion	PAIF: Perception-aware infrared-visible image fusion for attack-tolerant semantic segmentation	ACMMM ‘23	Paper/Code
General	FusionDN	FusionDN: A unified densely connected network for image fusion	AAAI ‘20	Paper/Code
General	IFCNN	IFCNN: A general image fusion framework based on convolutional neural network	InfFus ‘20	Paper/Code
General	PMGI	Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity	AAAI ‘20	Paper/Code
General	U2Fusion	U2Fusion: A unified unsupervised image fusion network	TPAMI ‘20	Paper/Code
General	SDNet	SDNet: A versatile squeeze-and-decomposition network for real-time image fusion	IJCV ‘21	Paper/Code
General	CoCoNet	CoCoNet: Coupled contrastive learning network with multi-level feature ensemble for multi-modality image fusion	IJCV ‘23	Paper/Code
General	DDFM	DDFM: Denoising diffusion model for multi-modality image fusion	ICCV ‘23	Paper/Code
General	EMMA	Equivariant multi-modality image fusion	CVPR ‘24	Paper/Code
General	FILM	Image fusion via vision-language model	ICML ‘24	Paper/Code
General	VDMUFusion	VDMUFusion: A Versatile Diffusion Model-Based Unsupervised Framework for Image Fusion	TIP ‘24	Paper/Code

面向应用方法(Application-oriented)

Aspects (分类)	Methods (方法)	Title (标题)	Venue (发表场所)	Source (资源)
Perception	DetFusion	A detection-driven infrared and visible image fusion network	ACMMM ‘22	Paper/Code
Perception	SeAFusion	Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network	InfFus ‘22	Paper/Code
Perception	TarDAL	Target-aware dual adversarial learning and a multi-scenario multimodality benchmark to fuse infrared and visible for object detection	CVPR ‘22	Paper/Code
Perception	BDLFusion	Bi-level dynamic learning for jointly multi-modality image fusion and beyond	IJCAI ‘23	Paper/Code
Perception	IRFS	An interactively reinforced paradigm for joint infrared-visible image fusion and saliency object detection	InfFus ‘23	Paper/Code
Perception	MetaFusion	Metafusion: Infrared and visible image fusion via meta-feature embedding from object detection	CVPR ‘23	Paper/Code
Perception	MoE-Fusion	Multi-modal gated mixture of local-to-global experts for dynamic image fusion	ICCV ‘23	Paper/Code
Perception	SegMiF	Multi-interactive feature learning and a full-time multimodality benchmark for image fusion and segmentation	ICCV ‘23	Paper/Code
Perception	CAF	Where elegance meets precision: Towards a compact, automatic, and flexible framework for multi-modality image fusion and applications	IJCAI ‘24	Paper/Code
Perception	MRFS	Mrfs: Mutually reinforcing image fusion and segmentation	CVPR ‘24	Paper/Code
Perception	TIMFusion	A task-guided, implicitly searched and meta-initialized deep model for image fusion	TPAMI ‘24	Paper/Code
Perception	SAGE	Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond	CVPR ‘25	Paper/Code

评价指标(Evaluation Metric)

We integrated the code for calculating metrics and used GPU acceleration with PyTorch, significantly improving the speed of computing metrics across multiple methods and images.
You can find it at Metric

If you want to calculate metrics using our code, you can run:

# Please modify the data path in 'eval_torch.py'.
python eval_torch.py

资源库(Resource Library)

融合(Fusion)

Fusion images from multiple datasets in the IVIF domain are organized in the following form: each subfolder contains fusion images generated by different methods, facilitating research and comparison for users.

Fusion ROOT
├── IVIF
|   ├── FMB
|   |   ├── ... 
|   |   ├── CAF # All the file names are named after the methods
|   |   └── ...
|   ├── # The other files follow the same structure shown above.
|   ├── M3FD_300 # Mini version of M3FD dataset with 300 images
|   ├── RoadScene
|   ├── TNO
|   └── M3FD_4200.zip # Full version of the M3FD dataset with 4200 images

You can directly download from here.

Download：Baidu Yun

分割(Segmentation)

Segmentation data is organized in the following form: it contains multiple directories to facilitate the management of segmentation-related data and results.

Segmentation ROOT
├── Segformer
|   ├── datasets
|   |   ├── ... 
|   |   ├── CAF # All the file names are named after the methods
|   |   |    └──VOC2007
|   |   |         ├── JPEGImages # Fusion result images in JPG format
|   |   |         └── SegmentationClass # Ground truth for segmentation
|   |   └── ... # The other files follow the same structure shown above.
|   ├── model_data 
|   |   ├── backbone # Backbone used for segmentation
|   |   └── model # Saved model files
|   |        ├── ...
|   |        ├── CAF.pth # All the model names are named after the methods
|   |        └── ... 
|   ├── results # Saved model files and training results
|   |   ├── iou # IoU results for segmentation validation
|   |        ├── ...
|   |        ├── CAF.txt # All the file names are named after the methods
|   |        └── ... 
|   |   └── predict #Visualization of segmentation
|   |        ├── ...
|   |        ├── CAF # All the file names are named after the methods
|   |        └── ... 
|   └── hyperparameters.md # Hyperparameter settings

You can directly download from here.

Download：Baidu Yun

检测(Detection)

Detection data is organized in the following form:
it contains multiple directories to facilitate the management of detection-related data and results.

Detection ROOT
├── M3FD
|   ├── Fused Results
|   |   ├── ... 
|   |   ├── CAF # All the file names are named after the methods
|   |   |   ├── Images # Fusion result images in PNG format
|   |   |   └── Labels # Ground truth for detection
|   |   └── ... # The other files follow the same structure shown above.
|   ├── model_data 
|   |   └── model # Saved model files
|   |        ├── ...
|   |        ├── CAF.pth # All the model names are named after the methods
|   |        └── ... 
|   ├── results # Saved model files and training results
|   |   └── predict #Visualization of detection
|   |        ├── ...
|   |        ├── CAF # All the file names are named after the methods
|   |        └── ... 
|   └── hyperparameters.md # Hyperparameter settings

You can directly download from here.

Download：Baidu Yun

计算效率(Computational Efficiency)

FLOPS and Params:
- We utilize the profile function from the thop package to compute the FLOPs (G) and Params (M) counts of the model.

from thop import profile

# Create ir, vi input tensor
ir = torch.randn(1, 1, 1024, 768).to(device)
vi = torch.randn(1, 3, 1024, 768).to(device)
# Assume 'model' is your network model
flops, params = profile(model, inputs=(ir, vi))

Time:
- To measure the Time (ms) of the model, we exclude the initial image to compute the average while testing a random selection of 10 image sets from the M3FD dataset, each with a resolution of 1024×768, on the Nvidia GeForce 4090. To eliminate CPU influence, we employ CUDA official event functions to measure running time on the GPU.

import torch
  
# Create CUDA events
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
# Record the start time
start.record()
# Execute the model
# Assume 'model' is your network model
fus = model(ir, vi)   
# Record the end time
end.record()