Scenic: A JAX Library for Computer Vision Research and Beyond
Scenic is a codebase with a focus on research around attention-based models
for computer vision. Scenic has been successfully used to develop
classification, segmentation, and detection models for multiple modalities
including images, video, audio, and multimodal combinations of them.
More precisely, Scenic is (i) a set of shared lightweight libraries solving tasks commonly encountered when training large-scale (i.e. multi-device,
multi-host) vision models; and (ii) several projects containing fully
fleshed-out, problem-specific training and evaluation loops built using these
libraries.
Scenic is developed in JAX and uses
Flax.
Among other things, Scenic provides a number of SOTA models and baselines that
were either developed using Scenic or reimplemented in it. These include
projects that were developed in Scenic or used it for their experiments (more
information can be found in projects), as well as baselines that were
reproduced in Scenic (more information can be found in baseline models).
Scenic aims to facilitate rapid prototyping of large-scale vision models. To
keep the code simple to understand and extend, we prefer forking and
copy-pasting over adding complexity or increasing abstraction. Only when
functionality proves to be widely useful across many models and tasks may it be
upstreamed to Scenic’s shared libraries.
See projects/baselines/README.md for a walk-through of the baseline models.

You will need Python 3.9 or later. Download the code from GitHub
$ git clone https://github.com/google-research/scenic.git
$ cd scenic
$ pip install .
and run training for ViT on ImageNet:
$ python scenic/main.py -- \
--config=scenic/projects/baselines/configs/imagenet/imagenet_vit_config.py \
--workdir=./
Note that for specific projects and baselines, you might need to install extra
packages that are mentioned in their README.md
or requirements.txt
files.
Here is also a minimal Colab notebook that trains a simple feed-forward model using Scenic.
Scenic is designed to offer different levels of abstraction: it supports
projects that only require changing hyper-parameters by defining config
files, as well as projects that need customization of the input pipeline, the
model architecture, losses and metrics, and the training loop. To make this
possible, the code in Scenic is organized as either project-level code,
which refers to customized code for specific projects or baselines, or
library-level code, which refers to common functionality and general
patterns that are adopted by the majority of projects. The project-level
code lives in the projects directory.
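For projects that only change hyper-parameters, a config is just a Python file exposing a get_config() function that returns an ml_collections.ConfigDict, which is passed to the trainer via the --config flag as in the ViT command above. The sketch below is a minimal, hypothetical example; the field names are illustrative assumptions rather than the exact schema of any shipped baseline (see the configs under projects/baselines for real ones).

```python
# A hedged sketch of a Scenic-style config file. The field names below are
# illustrative assumptions, not the exact schema of any shipped baseline;
# see scenic/projects/baselines/configs for real config files.
import ml_collections


def get_config():
  """Returns a config for a small, hypothetical image-classification run."""
  config = ml_collections.ConfigDict()

  # Experiment and data settings (names assumed for illustration).
  config.experiment_name = 'my_small_experiment'
  config.dataset_name = 'imagenet'
  config.batch_size = 128
  config.rng_seed = 0

  # Model and optimization settings.
  config.model_name = 'fully_connected_classification'  # Hypothetical registry name.
  config.num_training_epochs = 90
  config.lr_configs = ml_collections.ConfigDict()
  config.lr_configs.base_learning_rate = 1e-3

  return config
```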
The goal is to keep the library-level code minimal and well-tested and to avoid
introducing extra abstractions to support minor use-cases. Shared libraries
provided by Scenic are split into:
dataset_lib: Implements IO pipelines for loading and pre-processing data.
model_lib: Provides abstract model interfaces (e.g. ClassificationModel or SegmentationModel in model_lib.base_models) with task-specific losses and metrics; neural network layers in model_lib.layers, focusing on efficient implementations of attention and transformer layers; and bipartite matchers in model_lib.matchers.
train_lib: Provides tools for constructing training loops and implements several example trainers.
common_lib: General utilities, such as logging and debugging modules.

Scenic supports the development of customized solutions for custom tasks and
data via the concept of a “project”. There is no one-size-fits-all recipe for how
much code a project should re-use. Projects can consist of only configs and
use the common models, trainers, and task/data pipelines that live in
library-level code, or they can fork any of these components and redefine
layers, losses, metrics, logging methods, tasks, architectures, as well as
training and evaluation loops. The modularity of the library-level code lets
projects fall anywhere on the “run-as-is” to “fully-customized” spectrum.
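As a concrete illustration of the model_lib interface described above, a project that wants a custom architecture but standard classification losses and metrics could subclass ClassificationModel. The sketch below is minimal and hedged: the method names (build_flax_model, default_flax_model_config) and the dataset_meta_data field reflect my reading of model_lib.base_models and should be checked against the actual code.

```python
# A minimal sketch, assuming the ClassificationModel interface in
# scenic.model_lib.base_models (build_flax_model / default_flax_model_config
# and the dataset_meta_data dict); check base_models for the exact contract.
import flax.linen as nn
import jax.numpy as jnp
import ml_collections

from scenic.model_lib.base_models.classification_model import ClassificationModel


class SimpleMLP(nn.Module):
  """A tiny feed-forward classifier, used purely for illustration."""
  num_classes: int
  hidden_dim: int = 256

  @nn.compact
  def __call__(self, x, *, train=False, debug=False):
    x = jnp.reshape(x, (x.shape[0], -1))          # Flatten the inputs.
    x = nn.relu(nn.Dense(self.hidden_dim)(x))
    return nn.Dense(self.num_classes)(x)          # Unnormalized logits.


class SimpleMLPClassificationModel(ClassificationModel):
  """Plugs SimpleMLP into Scenic's shared classification losses and metrics."""

  def build_flax_model(self) -> nn.Module:
    return SimpleMLP(
        num_classes=self.dataset_meta_data['num_classes'],
        hidden_dim=self.config.get('hidden_dim', 256))

  def default_flax_model_config(self) -> ml_collections.ConfigDict:
    return ml_collections.ConfigDict({'hidden_dim': 256})
```

The appeal of this split is that the project only supplies the Flax module, while the trainer and the shared loss and metric definitions can be reused unchanged.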
Common baselines such as ResNet and the Vision Transformer (ViT) are implemented
in the projects/baselines
project. Forking models in this directory is a good starting point for new
projects.
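For example, rather than re-implementing ViT, a new project could import the baseline module directly. This is only a rough sketch: the constructor attributes used below (num_classes, mlp_dim, num_layers, num_heads, hidden_size, patches, classifier) are assumptions about the module's interface, so consult scenic/projects/baselines/vit.py and its configs for the authoritative signature.

```python
# A hedged sketch of reusing the ViT baseline from a new project. The
# constructor attributes below are assumed, not guaranteed; check
# scenic/projects/baselines/vit.py for the exact interface.
import jax
import jax.numpy as jnp
import ml_collections

from scenic.projects.baselines import vit

model = vit.ViT(
    num_classes=1000,
    mlp_dim=768,
    num_layers=6,
    num_heads=4,
    hidden_size=192,
    patches=ml_collections.ConfigDict({'size': (16, 16)}),
    classifier='token',
)

dummy_images = jnp.zeros((2, 224, 224, 3))  # A batch of two NHWC images.
params = model.init(jax.random.PRNGKey(0), dummy_images, train=False)
logits = model.apply(params, dummy_images, train=False)  # Shape: (2, 1000).
```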
If you use Scenic, you can cite our white paper.
Here is an example BibTeX entry:
@InProceedings{dehghani2021scenic,
author = {Dehghani, Mostafa and Gritsenko, Alexey and Arnab, Anurag and Minderer, Matthias and Tay, Yi},
title = {Scenic: A JAX Library for Computer Vision Research and Beyond},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2022},
pages = {21393-21398}
}
Disclaimer: This is not an official Google product.