Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption. A brand-new survey for infrared and visible image fusion.
[2024-12-12] Our survey paper "Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption" has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence!
(Paper) (Chinese Version)
Welcome to IVIF Zoo, a comprehensive repository dedicated to Infrared and Visible Image Fusion (IVIF). Built around our survey paper "Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption" (Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, Xin Fan, Risheng Liu*), this repository aims to serve as a central hub for researchers, engineers, and enthusiasts in the field of IVIF. Here you'll find a wide array of resources, tools, and datasets curated to accelerate advancements and foster collaboration in infrared-visible image fusion technologies.
A detailed electromagnetic-spectrum diagram covering almost all wavelength and frequency ranges, with the range of the human visual system expanded in particular and the corresponding computer vision and image fusion datasets annotated.
The diagram of infrared and visible image fusion for practical applications. Existing image fusion methods focus mainly on designing architectures and training strategies for visual enhancement, and few consider adaptation to downstream visual perception tasks. From the data-compatibility perspective, pixel misalignment and adversarial attacks are two major challenges for image fusion. Moreover, integrating comprehensive semantic information for tasks such as semantic segmentation, object detection, and salient object detection remains underexplored, posing a critical obstacle for image fusion.
A Sankey diagram classifying typical fusion methods.
It covers all results from our survey paper, which are available for download from Baidu Cloud.
Based on SegFormer
Based on YOLO-v5
Dataset | Image pairs | Resolution | Color | Objects / Categories | Challenging scenes | Annotations | Download |
---|---|---|---|---|---|---|---|
TNO | 261 | 768×576 | ✘ | few | ✘ | ✘ | Link |
RoadScene 🔥 | 221 | Various | ✔ | medium | ✘ | ✘ | Link |
VIFB | 21 | Various | Various | few | ✘ | ✘ | Link |
MS | 2999 | 768×576 | ✔ | 14146 / 6 | ✘ | ✔ | Link |
LLVIP | 16836 | 1280×720 | ✔ | pedestrian / 1 | ✘ | ✔ | Link |
M3FD 🔥 | 4200 | 1024×768 | ✔ | 33603 / 6 | ✔ | ✔ | Link |
MFNet | 1569 | 640×480 | ✔ | abundant / 8 | ✘ | ✔ | Link |
FMB 🔥 | 1500 | 800×600 | ✔ | abundant / 14 | ✔ | ✔ | Link |
If the M3FD and FMB datasets are helpful to you, please cite the following papers:
@inproceedings{liu2022target,
title={Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection},
author={Liu, Jinyuan and Fan, Xin and Huang, Zhanbo and Wu, Guanyao and Liu, Risheng and Zhong, Wei and Luo, Zhongxuan},
booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
pages={5802--5811},
year={2022}
}
@inproceedings{liu2023multi,
title={Multi-interactive feature learning and a full-time multi-modality benchmark for image fusion and segmentation},
author={Liu, Jinyuan and Liu, Zhu and Wu, Guanyao and Ma, Long and Liu, Risheng and Zhong, Wei and Luo, Zhongxuan and Fan, Xin},
booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
pages={8115--8124},
year={2023}
}
Aspects | Methods | Title | Venue | Source |
---|---|---|---|---|
Auto-Encoder | DenseFuse | Densefuse: A fusion approach to infrared and visible images | TIP '19 | Paper/Code |
Auto-Encoder | SEDRFuse | Sedrfuse: A symmetric encoder–decoder with residual block network for infrared and visible image fusion | TIM '20 | Paper/Code |
Auto-Encoder | DIDFuse | Didfuse: Deep image decomposition for infrared and visible image fusion | IJCAI '20 | Paper/Code |
Auto-Encoder | MFEIF | Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion | TCSVT '21 | Paper/Code |
Auto-Encoder | RFN-Nest | Rfn-nest: An end-to-end residual fusion network for infrared and visible images | TIM '21 | Paper/Code |
Auto-Encoder | SFAFuse | Self-supervised feature adaption for infrared and visible image fusion | InfFus '21 | Paper/Code |
Auto-Encoder | SMoA | Smoa: Searching a modality-oriented architecture for infrared and visible image fusion | SPL '21 | Paper/Code |
Auto-Encoder | Res2Fusion | Res2fusion: Infrared and visible image fusion based on dense res2net and double nonlocal attention models | TIM '22 | Paper/Code |
GAN | FusionGAN | Fusiongan: A generative adversarial network for infrared and visible image fusion | InfFus '19 | Paper/Code |
GAN | DDcGAN | Learning a generative model for fusing infrared and visible images via conditional generative adversarial network with dual discriminators | TIP '19 | Paper/Code |
GAN | AtFGAN | Attentionfgan: Infrared and visible image fusion using attention-based generative adversarial networks | TMM '20 | Paper |
GAN | DPAL | Infrared and visible image fusion via detail preserving adversarial learning | InfFus '20 | Paper/Code |
GAN | D2WGAN | Infrared and visible image fusion using dual discriminators generative adversarial networks with wasserstein distance | InfSci '20 | Paper |
GAN | GANMcC | Ganmcc: A generative adversarial network with multiclassification constraints for infrared and visible image fusion | TIM '20 | Paper/Code |
GAN | ICAFusion | Infrared and visible image fusion via interactive compensatory attention adversarial learning | TMM '22 | Paper/Code |
GAN | TCGAN | Transformer based conditional gan for multimodal image fusion | TMM '23 | Paper/Code |
GAN | DCFusion | DCFusion: A Dual-Frequency Cross-Enhanced Fusion Network for Infrared and Visible Image Fusion | TIM '23 | Paper |
GAN | FreqGAN | Freqgan: Infrared and visible image fusion via unified frequency adversarial learning | TCSVT '24 | Paper/Code |
CNN | BIMDL | A bilevel integrated model with data-driven layer ensemble for multi-modality image fusion | TIP '20 | Paper |
CNN | MgAN-Fuse | Multigrained attention network for infrared and visible image fusion | TIM '20 | Paper |
CNN | AUIF | Efficient and model-based infrared and visible image fusion via algorithm unrolling | TCSVT '21 | Paper/Code |
CNN | RXDNFuse | Rxdnfuse: A aggregated residual dense network for infrared and visible image fusion | InfFus '21 | Paper |
CNN | STDFusionNet | Stdfusionnet: An infrared and visible image fusion network based on salient target detection | TIM '21 | Paper/Code |
CNN | CUFD | Cufd: An encoder–decoder network for visible and infrared image fusion based on common and unique feature decomposition | CVIU '22 | Paper/Code |
CNN | Dif-Fusion | Dif-fusion: Towards high color fidelity in infrared and visible image fusion with diffusion models | TIP '23 | Paper/Code |
CNN | L2Net | L2Net: Infrared and Visible Image Fusion Using Lightweight Large Kernel Convolution Network | TIP '23 | Paper/Code |
CNN | IGNet | Learning a graph neural network with cross modality interaction for image fusion | ACMMM '23 | Paper/Code |
CNN | LRRNet | Lrrnet: A novel representation learning guided fusion network for infrared and visible images | TPAMI '23 | Paper/Code |
CNN | MetaFusion | Metafusion: Infrared and visible image fusion via meta-feature embedding from object detection | CVPR '23 | Paper/Code |
CNN | PSFusion | Rethinking the necessity of image fusion in high-level vision tasks: A practical infrared and visible image fusion network based on progressive semantic injection and scene fidelity | InfFus '23 | Paper/Code |
Transformer | SwinFusion | Swinfusion: Cross-domain long-range learning for general image fusion via swin transformer | JAS '22 | Paper/Code |
Transformer | YDTR | Ydtr: Infrared and visible image fusion via y-shape dynamic transformer | TMM '22 | Paper/Code |
Transformer | IFT | Image fusion transformer | ICIP '22 | Paper/Code |
Transformer | CDDFuse | Cddfuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion | CVPR '23 | Paper/Code |
Transformer | TGFuse | Tgfuse: An infrared and visible image fusion approach based on transformer and generative adversarial network | TIP '23 | Paper/Code |
Transformer | CMTFusion | Cross-modal transformers for infrared and visible image fusion | TCSVT '23 | Paper/Code |
Transformer | Text-IF | Text-if: Leveraging semantic text guidance for degradation-aware and interactive image fusion | CVPR '24 | Paper/Code |
Transformer | PromptF | Promptfusion: Harmonized semantic prompt learning for infrared and visible image fusion | JAS '24 | |
Transformer | MaeFuse | MaeFuse: Transferring Omni Features With Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training | TIP '25 | Paper/Code |
Aspects | Methods | Title | Venue | Source |
---|---|---|---|---|
Registration | UMIR | Unsupervised multi-modal image registration via geometry preserving image-to-image translation | CVPR '20 | Paper/Code |
Registration | ReCoNet | Reconet: Recurrent correction network for fast and efficient multi-modality image fusion | ECCV '22 | Paper/Code |
Registration | SuperFusion | Superfusion: A versatile image registration and fusion network with semantic awareness | JAS '22 | Paper/Code |
Registration | UMFusion | Unsupervised misaligned infrared and visible image fusion via cross-modality image generation and registration | IJCAI '22 | Paper/Code |
Registration | GCRF | General cross-modality registration framework for visible and infrared UAV target image registration | SR '23 | Paper |
Registration | MURF | MURF: mutually reinforcing multi-modal image registration and fusion | TPAMI '23 | Paper/Code |
Registration | SemLA | Semantics lead all: Towards unified image registration and fusion from a semantic perspective | InfFus '23 | Paper/Code |
Registration | - | A Deep Learning Framework for Infrared and Visible Image Fusion Without Strict Registration | IJCV '23 | Paper |
Attack | PAIFusion | PAIF: Perception-aware infrared-visible image fusion for attack-tolerant semantic segmentation | ACMMM '23 | Paper/Code |
General | FusionDN | FusionDN: A unified densely connected network for image fusion | AAAI '20 | Paper/Code |
General | IFCNN | IFCNN: A general image fusion framework based on convolutional neural network | InfFus '20 | Paper/Code |
General | PMGI | Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity | AAAI '20 | Paper/Code |
General | U2Fusion | U2Fusion: A unified unsupervised image fusion network | TPAMI '20 | Paper/Code |
General | SDNet | SDNet: A versatile squeeze-and-decomposition network for real-time image fusion | IJCV '21 | Paper/Code |
General | CoCoNet | CoCoNet: Coupled contrastive learning network with multi-level feature ensemble for multi-modality image fusion | IJCV '23 | Paper/Code |
General | DDFM | DDFM: Denoising diffusion model for multi-modality image fusion | ICCV '23 | Paper/Code |
General | EMMA | Equivariant multi-modality image fusion | CVPR '24 | Paper/Code |
General | FILM | Image fusion via vision-language model | ICML '24 | Paper/Code |
General | VDMUFusion | VDMUFusion: A Versatile Diffusion Model-Based Unsupervised Framework for Image Fusion | TIP '24 | Paper/Code |
Aspects | Methods | Title | Venue | Source |
---|---|---|---|---|
Perception | DetFusion | A detection-driven infrared and visible image fusion network | ACMMM '22 | Paper/Code |
Perception | SeAFusion | Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network | InfFus '22 | Paper/Code |
Perception | TarDAL | Target-aware dual adversarial learning and a multi-scenario multimodality benchmark to fuse infrared and visible for object detection | CVPR '22 | Paper/Code |
Perception | BDLFusion | Bi-level dynamic learning for jointly multi-modality image fusion and beyond | IJCAI '23 | Paper/Code |
Perception | IRFS | An interactively reinforced paradigm for joint infrared-visible image fusion and saliency object detection | InfFus '23 | Paper/Code |
Perception | MetaFusion | Metafusion: Infrared and visible image fusion via meta-feature embedding from object detection | CVPR '23 | Paper/Code |
Perception | MoE-Fusion | Multi-modal gated mixture of local-to-global experts for dynamic image fusion | ICCV '23 | Paper/Code |
Perception | SegMiF | Multi-interactive feature learning and a full-time multimodality benchmark for image fusion and segmentation | ICCV '23 | Paper/Code |
Perception | CAF | Where elegance meets precision: Towards a compact, automatic, and flexible framework for multi-modality image fusion and applications | IJCAI '24 | Paper/Code |
Perception | MRFS | Mrfs: Mutually reinforcing image fusion and segmentation | CVPR '24 | Paper/Code |
Perception | TIMFusion | A task-guided, implicitly searched and meta-initialized deep model for image fusion | TPAMI '24 | Paper/Code |
Perception | SAGE | Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond | CVPR '25 | Paper/Code |
We integrated the metric-calculation code and added GPU acceleration with PyTorch, which significantly speeds up metric computation across multiple methods and images.
You can find it at Metric
If you want to calculate metrics using our code, you can run:
# Please modify the data path in 'eval_torch.py'.
python eval_torch.py
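For reference, the snippet below is a minimal sketch of how a single fusion metric, entropy (EN), can be computed directly on GPU tensors with PyTorch. It is not the repository's eval_torch.py; the random tensor is a placeholder for a loaded fused image.
import torch

def entropy(img: torch.Tensor, bins: int = 256) -> torch.Tensor:
    # img: a single-channel image tensor with values in [0, 255]
    hist = torch.histc(img.float(), bins=bins, min=0, max=255)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins to avoid log(0)
    return -(p * torch.log2(p)).sum()  # Shannon entropy in bits

device = "cuda" if torch.cuda.is_available() else "cpu"
fused = torch.rand(768, 576, device=device) * 255  # placeholder for a fused image
print(f"EN: {entropy(fused).item():.4f}")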
Fused images from multiple datasets in the IVIF domain are organized in the following form: each subfolder contains the fusion images generated by a different method, facilitating research and comparison.
Fusion ROOT
├── IVIF
|   ├── FMB
|   |   ├── ...
|   |   ├── CAF              # All the subfolders are named after the methods
|   |   └── ...
|   ├── ...                  # The other datasets follow the same structure shown above
|   ├── M3FD_300             # Mini version of the M3FD dataset with 300 images
|   ├── RoadScene
|   ├── TNO
|   └── M3FD_4200.zip        # Full version of the M3FD dataset with 4200 images
You can directly download from here.
Download: Baidu Yun
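As a convenience, here is a small sketch for enumerating the per-method fusion results; it assumes the layout above has been extracted to Fusion/IVIF (the path is a placeholder, adjust it to your download location).
from pathlib import Path

root = Path("Fusion/IVIF")  # placeholder path; point this at the extracted archive

# Each dataset folder contains one subfolder of fused images per method.
for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    for method_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        images = sorted(method_dir.glob("*.png")) + sorted(method_dir.glob("*.jpg"))
        print(f"{dataset_dir.name}/{method_dir.name}: {len(images)} fused images")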
Segmentation data is organized in the following form: multiple directories are used to manage segmentation-related data and results.
Segmentation ROOT
├── Segformer
|   ├── datasets
|   |   ├── ...
|   |   ├── CAF                         # All the folders are named after the methods
|   |   |   └── VOC2007
|   |   |       ├── JPEGImages          # Fusion result images in JPG format
|   |   |       └── SegmentationClass   # Ground truth for segmentation
|   |   └── ...                         # The other folders follow the same structure shown above
|   ├── model_data
|   |   ├── backbone                    # Backbone used for segmentation
|   |   └── model                       # Saved model files
|   |       ├── ...
|   |       ├── CAF.pth                 # All the models are named after the methods
|   |       └── ...
|   ├── results                         # Saved training and evaluation results
|   |   ├── iou                         # IoU results for segmentation validation
|   |   |   ├── ...
|   |   |   ├── CAF.txt                 # All the files are named after the methods
|   |   |   └── ...
|   |   └── predict                     # Visualization of segmentation results
|   |       ├── ...
|   |       ├── CAF                     # All the folders are named after the methods
|   |       └── ...
|   └── hyperparameters.md              # Hyperparameter settings
You can directly download from here.
Download: Baidu Yun
Detection data is organized in the following form: multiple directories are used to manage detection-related data and results.
Detection ROOT
├── M3FD
|   ├── Fused Results
|   |   ├── ...
|   |   ├── CAF                 # All the folders are named after the methods
|   |   |   ├── Images          # Fusion result images in PNG format
|   |   |   └── Labels          # Ground truth for detection
|   |   └── ...                 # The other folders follow the same structure shown above
|   ├── model_data
|   |   └── model               # Saved model files
|   |       ├── ...
|   |       ├── CAF.pth         # All the models are named after the methods
|   |       └── ...
|   ├── results                 # Saved training and evaluation results
|   |   └── predict             # Visualization of detection results
|   |       ├── ...
|   |       ├── CAF             # All the folders are named after the methods
|   |       └── ...
|   └── hyperparameters.md      # Hyperparameter settings
You can directly download from here.
Download: Baidu Yun
We use the profile function from the thop package to compute the FLOPs (G) and Params (M) counts of the model.
from thop import profile
import torch

# Choose the device that the model and inputs run on
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create ir (infrared, 1-channel) and vi (visible, 3-channel) input tensors
ir = torch.randn(1, 1, 1024, 768).to(device)
vi = torch.randn(1, 3, 1024, 768).to(device)
# Assume 'model' is your network model
flops, params = profile(model, inputs=(ir, vi))
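The values returned by profile are raw counts; assuming the flops and params variables above, they can be reported in the G / M units used here as follows:
# Convert raw counts to GFLOPs and millions of parameters
print(f"FLOPs: {flops / 1e9:.2f} G, Params: {params / 1e6:.2f} M")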
import torch
# Create CUDA events
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
# Record the start time
start.record()
# Execute the model
# Assume 'model' is your network model
fus = model(ir, vi)
# Record the end time
end.record()
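To actually read out the runtime, the GPU must be synchronized after end.record(); the elapsed time between the two CUDA events is then available in milliseconds:
# Wait for all queued GPU work to finish before reading the timers
torch.cuda.synchronize()
runtime_ms = start.elapsed_time(end)
print(f"Runtime: {runtime_ms:.2f} ms")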