Unofficial Implementation of VistaDream: Sampling multiview consistent images for single-view scene reconstruction
VistaDream is a novel framework for reconstructing 3D scenes from single-view images using Flux-based diffusion models. This implementation combines image outpainting, depth estimation, and 3D Gaussian splatting for high-quality 3D scene generation, with integrated visualization using Rerun.
Uses Rerun for 3D visualization, Gradio for interactive UI, Flux for diffusion-based outpainting, and Pixi for easy installation.
VistaDream addresses the challenge of 3D scene reconstruction from a single image through a two-stage pipeline:
- Coarse 3D Scaffold Construction: Creates a global scene structure by outpainting image boundaries and estimating depth maps
- Multi-view Consistency Sampling (MCS): Uses iterative diffusion-based RGB-D inpainting with multi-view consistency constraints to generate high-quality novel views
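To give intuition for the consistency constraint in MCS, here is a minimal toy sketch (not the actual implementation — the real method interleaves this kind of constraint with diffusion denoising of RGB-D views; all numbers and names below are illustrative):

```python
import numpy as np

# Toy sketch: each "view" is a noisy 1-D signal standing in for a rendered
# novel view. Every iteration blends each view partway toward the cross-view
# consensus, illustrating the consistency constraint that MCS applies
# between diffusion denoising steps. Weights and sizes are illustrative.
rng = np.random.default_rng(0)
truth = np.linspace(0.0, 1.0, 32)                    # shared scene content
views = truth + 0.3 * rng.standard_normal((4, 32))   # noisy per-view samples

for step in range(50):
    consensus = views.mean(axis=0)        # fused estimate across all views
    views = 0.9 * views + 0.1 * consensus # pull views toward the consensus

spread = views.std(axis=0).max()          # views should now agree closely
print(f"max cross-view std after sampling: {spread:.4f}")
```

After enough iterations the per-view deviations shrink geometrically, so the views converge to a mutually consistent estimate — the property the full pipeline needs before fusing views into one 3D scene.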
The framework integrates multiple state-of-the-art models:
- Flux diffusion models for high-quality image outpainting and inpainting
- 3D Gaussian Splatting for efficient 3D scene representation
- Rerun for real-time 3D visualization and debugging
- Linux only with NVIDIA GPU (CUDA 12.8)
- Pixi package manager
```bash
git clone https://github.com/rerun-io/vistadream.git
cd vistadream
pixi run example
```
This will automatically download the required models and run the example with the included office image.
Generate a complete 3D scene from a single image with outpainting, depth estimation, and Gaussian splatting:
```bash
pixi run python tools/run_vistadream.py --image-path data/office/IMG_4029.jpg --expansion-percent 0.2 --n-frames 10
```
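As a rough illustration of what `--n-frames` controls, the pipeline needs a camera trajectory of novel viewpoints around the input view. A hedged sketch of one simple choice (a small circular orbit; the actual trajectories are generated in `src/vistadream/ops/trajs/` and may differ):

```python
import numpy as np

# Hypothetical sketch: n camera positions on a small circle around the
# input viewpoint, returned as (n, 3) xyz offsets. Radius and shape are
# illustrative stand-ins, not VistaDream's actual trajectory parameters.
def orbit_positions(n_frames: int, radius: float = 0.1) -> np.ndarray:
    angles = np.linspace(0.0, 2 * np.pi, n_frames, endpoint=False)
    return np.stack(
        [radius * np.cos(angles),        # x offset
         radius * np.sin(angles),        # y offset
         np.zeros(n_frames)],            # keep depth fixed in this toy
        axis=1,
    )

poses = orbit_positions(10)              # matches --n-frames 10
print(poses.shape)
```

Each position would then be paired with an orientation looking back at the scene, and the inpainting model fills whatever those viewpoints reveal.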
Note: The full 3D reconstruction pipeline is currently under active development. Some features may be experimental or incomplete.
Process a single image with depth estimation and basic 3D reconstruction:
```bash
pixi run python tools/run_single_img.py --image-path data/office/IMG_4029.jpg
```
Run just the outpainting component with Rerun visualization:
```bash
pixi run python tools/run_flux_outpainting.py --image-path data/office/IMG_4029.jpg --expansion-percent 0.2
```
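A hedged sketch of what an `--expansion-percent` of 0.2 plausibly means geometrically: pad the canvas by 20% of each dimension on every side, and build a mask marking the new border region the diffusion model must fill. The exact flag semantics live in `tools/run_flux_outpainting.py`; this toy only illustrates the padding arithmetic:

```python
import numpy as np

# Illustrative outpainting geometry: grow the canvas on all four sides and
# mark the new border as the region to synthesize. Names are hypothetical.
def outpaint_canvas(h: int, w: int, expansion: float = 0.2):
    pad_h, pad_w = int(h * expansion), int(w * expansion)
    H, W = h + 2 * pad_h, w + 2 * pad_w
    mask = np.ones((H, W), dtype=bool)               # True = outpaint here
    mask[pad_h:pad_h + h, pad_w:pad_w + w] = False   # original image kept
    return (H, W), mask

size, mask = outpaint_canvas(480, 640, 0.2)
print(size, mask.sum())   # new canvas size, number of pixels to outpaint
```

For a 480×640 input this yields a 672×896 canvas, so roughly half the output pixels are synthesized by the outpainting model.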
Launch an interactive web interface for experimenting with the models:
```bash
pixi run python tools/gradio_app.py
```
- Single Image to 3D: Complete pipeline from single image to navigable 3D scene
- Memory Efficient: Model offloading support for GPU memory management
- Real-time Visualization: Integrated Rerun viewer for 3D scene inspection
- Training-free: No fine-tuning required for existing diffusion models
- High Quality: Multi-view consistency sampling ensures coherent 3D reconstruction
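The model-offloading feature follows the common sequential-offload pattern for large diffusion pipelines: keep only the active model on the GPU and park the rest on the CPU. A minimal stand-in sketch of that pattern (the `Stage` class and device strings are illustrative; the project's actual offloading lives in its Flux wrappers):

```python
# Illustrative sequential offload: each stage is moved to the GPU just
# before it runs and back to the CPU right after, so peak GPU memory is
# bounded by the largest single stage rather than the whole pipeline.
class Stage:
    def __init__(self, name: str):
        self.name, self.device = name, "cpu"   # all models start on CPU
    def to(self, device: str) -> "Stage":
        self.device = device                   # would move weights in torch
        return self
    def run(self, x: str) -> str:
        assert self.device == "cuda", "stage must be on GPU to run"
        return f"{x}->{self.name}"

def run_pipeline(stages, x):
    for stage in stages:
        stage.to("cuda")                       # load just-in-time
        x = stage.run(x)
        stage.to("cpu")                        # free GPU memory immediately
    return x

stages = [Stage("text_encoder"), Stage("flux_fill"), Stage("vae_decode")]
print(run_pipeline(stages, "img"))  # img->text_encoder->flux_fill->vae_decode
```

The trade-off is extra host-device transfer time per stage in exchange for fitting the pipeline on a single consumer GPU.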
```
├── src/vistadream/
│   ├── api/                        # High-level pipeline APIs
│   │   ├── flux_outpainting.py     # Outpainting-only pipeline
│   │   └── vistadream_pipeline.py  # Full 3D reconstruction pipeline
│   ├── flux/                       # Flux diffusion model integration
│   │   ├── cli_*.py                # Command-line interfaces
│   │   ├── model.py                # Flux transformer architecture
│   │   ├── sampling.py             # Diffusion sampling logic
│   │   └── util.py                 # Model loading and configuration
│   └── ops/                        # Core operations
│       ├── flux.py                 # Flux model wrappers
│       ├── gs/                     # Gaussian splatting implementation
│       ├── trajs/                  # Camera trajectory generation
│       └── visual_check.py         # 3D scene validation tools
└── tools/                          # Standalone applications
    ├── gradio_app.py               # Web interface
    ├── run_flux_outpainting.py
    ├── run_vistadream.py           # Main 3D pipeline
    └── run_single_img.py           # Single image processing
```
Models are automatically downloaded from Hugging Face on first run. Manual download:
```bash
pixi run huggingface-cli download pablovela5620/vistadream --local-dir ckpt/
```
Expected structure:
```
ckpt/
├── flux_fill/
│   ├── flux1-fill-dev.safetensors
│   └── ae.safetensors
├── vec.pt
├── txt.pt
└── txt_256.pt
```
Thanks to the original authors! If you use VistaDream in your research, please cite:
```bibtex
@inproceedings{wang2025vistadream,
  title={VistaDream: Sampling multiview consistent images for single-view scene reconstruction},
  author={Wang, Haiping and Liu, Yuan and Liu, Ziwei and Wang, Wenping and Dong, Zhen and Yang, Bisheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
```
This project builds upon several outstanding works:
- Flux - Black Forest Labs for the diffusion model foundation
- 3D Gaussian Splatting - Inria for efficient 3D representation
- Rerun - Rerun.io for 3D visualization framework
- GSplat - Nerfstudio for Gaussian splatting implementation
- MoGe - Microsoft Research for monocular geometry estimation