Cosmos is a world model development platform that consists of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at robotics and autonomous vehicle (AV) labs. Cosmos is purpose-built for Physical AI. The Cosmos repository enables end users to run the Cosmos models, run inference scripts, and generate videos.
NVIDIA Cosmos is a developer-first world foundation model platform designed to help Physical AI developers build their Physical AI systems better and faster.
Details of the platform are described in the Cosmos paper. Preview access is available at build.nvidia.com.
Model name | Description | Try it out |
---|---|---|
Cosmos-1.0-Diffusion-7B-Text2World | Text to visual world generation | Inference |
Cosmos-1.0-Diffusion-14B-Text2World | Text to visual world generation | Inference |
Cosmos-1.0-Diffusion-7B-Video2World | Video + Text based future visual world generation | Inference |
Cosmos-1.0-Diffusion-14B-Video2World | Video + Text based future visual world generation | Inference |
Cosmos-1.0-Autoregressive-4B | Future visual world generation | Inference |
Cosmos-1.0-Autoregressive-12B | Future visual world generation | Inference |
Cosmos-1.0-Autoregressive-5B-Video2World | Video + Text based future visual world generation | Inference |
Cosmos-1.0-Autoregressive-13B-Video2World | Video + Text based future visual world generation | Inference |
Cosmos-1.0-Guardrail | Guardrail contains pre-Guard and post-Guard for safe use | Embedded in model inference scripts |
Follow the Cosmos Installation Guide to set up the Docker environment. For inference with the pretrained models, refer to Cosmos Diffusion Inference and Cosmos Autoregressive Inference.
The code snippet below provides a gist of the inference usage.
PROMPT="A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. \
The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. \
A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, \
suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. \
The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of \
field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."
# Example using 7B model
PYTHONPATH=$(pwd) python cosmos1/models/diffusion/inference/text2world.py \
--checkpoint_dir checkpoints \
--diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
--prompt "$PROMPT" \
--offload_prompt_upsampler \
--video_save_name Cosmos-1.0-Diffusion-7B-Text2World
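When the same invocation needs to be scripted, for example to loop over several prompts, it can help to build the argument list in Python instead of relying on shell line continuations. The sketch below only assembles the command shown above; it does not run the model. The helper names `build_text2world_command` and `build_env` are hypothetical and not part of the Cosmos repository, and reusing the same flags for the 14B model is an assumption, since only the 7B invocation is shown here.

```python
import os
import shlex
import sys

# Directory names copied from the model table (Text2World diffusion models only).
TEXT2WORLD_MODELS = (
    "Cosmos-1.0-Diffusion-7B-Text2World",
    "Cosmos-1.0-Diffusion-14B-Text2World",  # assumption: accepts the same flags as 7B
)

# Script path taken from the shell example above, relative to the repository root.
SCRIPT = "cosmos1/models/diffusion/inference/text2world.py"


def build_text2world_command(model_name, prompt, checkpoint_dir="checkpoints",
                             offload_prompt_upsampler=True):
    """Return the argv list for the text2world.py invocation (hypothetical helper)."""
    if model_name not in TEXT2WORLD_MODELS:
        raise ValueError(f"unknown Text2World model: {model_name!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    argv = [
        sys.executable, SCRIPT,
        "--checkpoint_dir", checkpoint_dir,
        "--diffusion_transformer_dir", model_name,
        "--prompt", prompt,
    ]
    if offload_prompt_upsampler:
        argv.append("--offload_prompt_upsampler")
    # Mirror the shell example, which names the output video after the model.
    argv += ["--video_save_name", model_name]
    return argv


def build_env(repo_root):
    """Copy the current environment and point PYTHONPATH at the repository root."""
    env = dict(os.environ)
    env["PYTHONPATH"] = repo_root
    return env


if __name__ == "__main__":
    cmd = build_text2world_command(
        "Cosmos-1.0-Diffusion-7B-Text2World",
        "A sleek, humanoid robot stands in a vast warehouse.",
    )
    # Print a shell-quoted form for inspection; nothing is executed here.
    print(shlex.join(cmd))
```

To actually launch inference, the resulting list could be passed to `subprocess.run(cmd, env=build_env(os.getcwd()), check=True)` from the repository root. Passing the prompt as a single list element avoids the shell quoting concerns that come with long multi-line prompts.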
We also offer multi-GPU inference support for the Diffusion Text2World world foundation models (WFMs) through NeMo Framework.
NeMo Framework provides GPU-accelerated post-training. General post-training is currently supported for both diffusion and autoregressive models, with other types of post-training coming soon.
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
NVIDIA Cosmos source code is released under the Apache 2 License.
NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact [email protected].