synthetic computer vision

A list of synthetic datasets and tools for computer vision

Synthetic for Computer Vision

This is a repo for tracking the progress of using synthetic images in computer vision research. If you find that any important work is missing or that information is out of date, please edit this file directly and open a pull request. Each publication is tagged with a keyword to make it easier to search.

If you find anything missing from this page, please edit this README.md file to add it. When adding a new item, simply follow the format of the existing items. The structure of this document is described in contribute.md.

How to use: Click a publication to jump to its paper title; detailed information such as the code and project page is provided together with the PDF file.

Synthetic image datasets

SunCG (Princeton)
Minos
House3d (Facebook)
Procedural Human Action Videos (PHAV)
SURREAL
Virtual KITTI
Synthia
Sintel, a synthetic dataset for optical flow
SceneFlow
4D Light Fields
ICL-NUIM dataset
Driving in the Matrix
Playing for Benchmarks

3D Model Repository

Realistic 3D models are critical for creating realistic and diverse virtual worlds. Here are research efforts for creating 3D model repositories.

ShapeNet
3dscan
seeing3Dchairs

Tools

AIPlayground: UE4-based data ablation tool, see project page
AirSim (Microsoft)
CARLA (Intel)
Unity ML agents
Render SMPL human bodies on Blender, see CVPR2017
Render for CNN, based on Blender, see ICCV2015
UETorch, based on UE4, see ICML2016
UnrealCV, based on UE4, see arXiv
VizDoom, based on Doom, see arXiv
OpenAI Universe, see project page
Blender addon for 4D light field rendering, see project page
Event-Camera Dataset and Simulator, see project page
NVIDIA Deep Learning Dataset Synthesizer (NDDS)

Resources

ECCV 2016 Workshop: Virtual/Augmented Reality for Visual Artificial Intelligence (VARVAI)

ICCV 2017 Workshop: Role of Simulation in Computer Vision

SIGGRAPH Asia 2016 Workshop: Virtual Reality Meets Physical Reality: Modelling and Simulating Virtual Humans and Environments

CVPR 2017 Workshop: THOR Challenge

Misc.

RealismCNN github
Abnormality Detection in Images (http://paul.rutgers.edu/~babaks/abnormality_detection.html)

Reference

2020

Mousavi, Mehdi and Khanal, Aashis and Estrada, Rolando. “AI Playground: Unreal Engine-based Data Ablation Tool for Deep Learning” International Symposium on Visual Computing (ISVC), 2020.
(pdf)
(project)

2017

(Total=12)

Adversarially Tuned Scene Generation
(pdf)
UE4Sim: A Photo-Realistic Simulator for Computer Vision Applications
(pdf)
(project)

Playing for Benchmarks
(pdf)

A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation

(:octocat:code)
(pdf)
(project)

Procedural Generation of Videos to Train Deep Action Recognition Networks
(pdf)
(project)
(citation:8)

Learning from Synthetic Humans

(:octocat:code)
(pdf)
(project)
tag: synthetic human
NVIDIA Isaac
Configurable, Photorealistic Image Rendering and Ground Truth Synthesis by Sampling Stochastic Grammars Representing Indoor Scenes

Aerial Informatics and Robotics Platform

(:octocat:code)
(pdf)
(project)
tag: tool

Tobin, Josh, et al. “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World.” arXiv preprint arXiv:1703.06907 (2017). tag: domain
(pdf)

M. Johnson-Roberson, C. Barto, R. Mehta, S. N. Sridhar, Karl Rosaen, and R. Vasudevan, “Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks?,” in IEEE International Conference on Robotics and Automation, pp. 1–8, 2017.

(:octocat:code)
(pdf)
(project)
(citation:3)

Zheng Z, Zheng L, Yang Y. “Unlabeled samples generated by GAN improve the person re-identification baseline in vitro” in Proceedings of the IEEE International Conference on Computer Vision, 2017.

(:octocat:code)
(pdf)
(citation:48)
tag: generated images by GAN

2016

(Total=17)

Sadeghi, Fereshteh, and Sergey Levine. “CAD2RL: Real Single-Image Flight without a Single Real Image.” arXiv preprint arXiv:1611.04201 (2016). tag: rl
Johnson, Justin, et al. “CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning.” arXiv preprint arXiv:1612.06890 (2016).
(pdf)
McCormac, John, et al. “SceneNet RGB-D: 5M Photorealistic Images of Synthetic Indoor Trajectories with Ground Truth.” arXiv preprint arXiv:1612.05079 (2016).
de Souza, César Roberto, et al. “Procedural Generation of Videos to Train Deep Action Recognition Networks.” arXiv preprint arXiv:1612.00881 (2016).
(pdf)
(project)
tag: synthetic human
Synnaeve, Gabriel, et al. “TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games.” arXiv preprint arXiv:1611.00625 (2016).
(pdf)
(code)
Lin, Jenny, et al. “A virtual reality platform for dynamic human-scene interaction.” SIGGRAPH ASIA 2016 Virtual Reality meets Physical Reality: Modelling and Simulating Virtual Humans and Environments. ACM, 2016.
(pdf)
(project)
Mahendran, A., et al. “ResearchDoom and CocoDoom: Learning Computer Vision with Games.” arXiv preprint arXiv:1610.02431 (2016).
(pdf)
(project)

The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. 2016
(pdf)
(project)
(citation:4)

Virtual Worlds as Proxy for Multi-Object Tracking Analysis. 2016
(pdf)
(project)
(citation:5)
Playing for data: Ground truth from computer games. 2016
(pdf)
(citation:1)
Play and Learn: Using Video Games to Train Computer Vision Models. 2016
(pdf)
(citation:1)
ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning. 2016
(:octocat:code)
(pdf)
(project)
(citation:4)

A large dataset of object scans. 2016
(pdf)
(project)
(citation:6)

UnrealCV: Connecting Computer Vision to Unreal Engine. 2016

(:octocat:code)
(project)
(pdf)

Learning Physical Intuition of Block Towers by Example. 2016
(:octocat:code)
(pdf)
(citation:12)
Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning. 2016
(pdf)

A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields. ACCV 2016
(:octocat:code)
(pdf)
(project)
(citation)

2015

(Total=3)

A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. 2015
(pdf)
(citation:9)

Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views. 2015
(:octocat:code)
(pdf)
(citation:33)

ShapeNet: An information-rich 3D model repository. 2015
(pdf)
(project)
(citation:27)

2014

(Total=3)

Virtual and real world adaptation for pedestrian detection. 2014
(pdf)
(citation:46)

Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of CAD models. 2014
(:octocat:code)
(pdf)
(project)
(citation:110)

Handa, Ankur, Thomas Whelan, John McDonald, and Andrew J. Davison. “A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM.” In Robotics and automation (ICRA), 2014 IEEE international conference on, pp. 1524-1531. IEEE, 2014.
(project)

2013

(Total=1)

Detailed 3D representations for object recognition and modeling. 2013
(pdf)
(citation:67)

2012

(Total=1)

A naturalistic open source movie for optical flow evaluation. 2012
(pdf)
(project)
(citation:227)

2010

(Total=1)

Learning appearance in virtual scenarios for pedestrian detection. 2010
(pdf)
(citation:79)

2007

(Total=1)

OVVV: Using virtual worlds to design and evaluate surveillance systems. 2007
(pdf)
(citation:58)