Unofficial Implementation of VistaDream: Sampling multiview consistent images for single-view scene reconstruction
VistaDream is a novel framework for reconstructing 3D scenes from single-view images using Flux-based diffusion models. This implementation combines image outpainting, depth estimation, and 3D Gaussian splatting for high-quality 3D scene generation, with integrated visualization using Rerun.
Uses Rerun for 3D visualization, Gradio for interactive UI, Flux for diffusion-based outpainting, and Pixi for easy installation.
VistaDream addresses the challenge of 3D scene reconstruction from a single image through a two-stage pipeline:
- Coarse 3D Scaffold Construction: Creates a global scene structure by outpainting image boundaries and estimating depth maps
- Multi-view Consistency Sampling (MCS): Uses iterative diffusion-based RGB-D inpainting with multi-view consistency constraints to generate high-quality novel views
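To give intuition for the consistency constraint in MCS, here is a minimal toy sketch (not the actual implementation — the real method interleaves this kind of constraint with diffusion denoising of RGB-D views; all numbers and names below are illustrative):

```python
import numpy as np

# Toy sketch: each "view" is a noisy 1-D signal standing in for a rendered
# novel view. Every iteration blends each view partway toward the cross-view
# consensus, illustrating the consistency constraint that MCS applies
# between diffusion denoising steps. Weights and sizes are illustrative.
rng = np.random.default_rng(0)
truth = np.linspace(0.0, 1.0, 32)                    # shared scene content
views = truth + 0.3 * rng.standard_normal((4, 32))   # noisy per-view samples

for step in range(50):
    consensus = views.mean(axis=0)        # fused estimate across all views
    views = 0.9 * views + 0.1 * consensus # pull views toward the consensus

spread = views.std(axis=0).max()          # views should now agree closely
print(f"max cross-view std after sampling: {spread:.4f}")
```

After enough iterations the per-view deviations shrink geometrically, so the views converge to a mutually consistent estimate — the property the full pipeline needs before fusing views into one 3D scene.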
The framework integrates multiple state-of-the-art models:
- Flux diffusion models for high-quality image outpainting and inpainting
- 3D Gaussian Splatting for efficient 3D scene representation
- Rerun for real-time 3D visualization and debugging
- Linux only with NVIDIA GPU (CUDA 12.8)
- Pixi package manager
```bash
git clone https://github.com/rerun-io/vistadream.git
cd vistadream
pixi run example
```
This will automatically download the required models and run the example with the included office image.
Generate a complete 3D scene from a single image with outpainting, depth estimation, and Gaussian splatting:
```bash
pixi run python tools/run_vistadream.py --image-path data/office/IMG_4029.jpg --expansion-percent 0.2 --n-frames 10
```
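As a rough illustration of what `--n-frames` controls, the pipeline needs a camera trajectory of novel viewpoints around the input view. A hedged sketch of one simple choice (a small circular orbit; the actual trajectories are generated in `src/vistadream/ops/trajs/` and may differ):

```python
import numpy as np

# Hypothetical sketch: n camera positions on a small circle around the
# input viewpoint, returned as (n, 3) xyz offsets. Radius and shape are
# illustrative stand-ins, not VistaDream's actual trajectory parameters.
def orbit_positions(n_frames: int, radius: float = 0.1) -> np.ndarray:
    angles = np.linspace(0.0, 2 * np.pi, n_frames, endpoint=False)
    return np.stack(
        [radius * np.cos(angles),        # x offset
         radius * np.sin(angles),        # y offset
         np.zeros(n_frames)],            # keep depth fixed in this toy
        axis=1,
    )

poses = orbit_positions(10)              # matches --n-frames 10
print(poses.shape)
```

Each position would then be paired with an orientation looking back at the scene, and the inpainting model fills whatever those viewpoints reveal.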
Note: The full 3D reconstruction pipeline is currently under active development. Some features may be experimental or incomplete.
Process a single image with depth estimation and basic 3D reconstruction:
```bash
pixi run python tools/run_single_img.py --image-path data/office/IMG_4029.jpg
```
Run just the outpainting component with Rerun visualization:
```bash
pixi run python tools/run_flux_outpainting.py --image-path data/office/IMG_4029.jpg --expansion-percent 0.2
```
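A hedged sketch of what an `--expansion-percent` of 0.2 plausibly means geometrically: pad the canvas by 20% of each dimension on every side, and build a mask marking the new border region the diffusion model must fill. The exact flag semantics live in `tools/run_flux_outpainting.py`; this toy only illustrates the padding arithmetic:

```python
import numpy as np

# Illustrative outpainting geometry: grow the canvas on all four sides and
# mark the new border as the region to synthesize. Names are hypothetical.
def outpaint_canvas(h: int, w: int, expansion: float = 0.2):
    pad_h, pad_w = int(h * expansion), int(w * expansion)
    H, W = h + 2 * pad_h, w + 2 * pad_w
    mask = np.ones((H, W), dtype=bool)               # True = outpaint here
    mask[pad_h:pad_h + h, pad_w:pad_w + w] = False   # original image kept
    return (H, W), mask

size, mask = outpaint_canvas(480, 640, 0.2)
print(size, mask.sum())   # new canvas size, number of pixels to outpaint
```

For a 480×640 input this yields a 672×896 canvas, so roughly half the output pixels are synthesized by the outpainting model.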
Launch an interactive web interface for experimenting with the models:
```bash
pixi run python tools/gradio_app.py
```
- Single Image to 3D: Complete pipeline from single image to navigable 3D scene
- Memory Efficient: Model offloading support for GPU memory management
- Real-time Visualization: Integrated Rerun viewer for 3D scene inspection
- Training-free: No fine-tuning required for existing diffusion models
- High Quality: Multi-view consistency sampling ensures coherent 3D reconstruction
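The model-offloading feature follows the common sequential-offload pattern for large diffusion pipelines: keep only the active model on the GPU and park the rest on the CPU. A minimal stand-in sketch of that pattern (the `Stage` class and device strings are illustrative; the project's actual offloading lives in its Flux wrappers):

```python
# Illustrative sequential offload: each stage is moved to the GPU just
# before it runs and back to the CPU right after, so peak GPU memory is
# bounded by the largest single stage rather than the whole pipeline.
class Stage:
    def __init__(self, name: str):
        self.name, self.device = name, "cpu"   # all models start on CPU
    def to(self, device: str) -> "Stage":
        self.device = device                   # would move weights in torch
        return self
    def run(self, x: str) -> str:
        assert self.device == "cuda", "stage must be on GPU to run"
        return f"{x}->{self.name}"

def run_pipeline(stages, x):
    for stage in stages:
        stage.to("cuda")                       # load just-in-time
        x = stage.run(x)
        stage.to("cpu")                        # free GPU memory immediately
    return x

stages = [Stage("text_encoder"), Stage("flux_fill"), Stage("vae_decode")]
print(run_pipeline(stages, "img"))  # img->text_encoder->flux_fill->vae_decode
```

The trade-off is extra host-device transfer time per stage in exchange for fitting the pipeline on a single consumer GPU.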
```
├── src/vistadream/
│   ├── api/                        # High-level pipeline APIs
│   │   ├── flux_outpainting.py     # Outpainting-only pipeline
│   │   └── vistadream_pipeline.py  # Full 3D reconstruction pipeline
│   ├── flux/                       # Flux diffusion model integration
│   │   ├── cli_*.py                # Command-line interfaces
│   │   ├── model.py                # Flux transformer architecture
│   │   ├── sampling.py             # Diffusion sampling logic
│   │   └── util.py                 # Model loading and configuration
│   └── ops/                        # Core operations
│       ├── flux.py                 # Flux model wrappers
│       ├── gs/                     # Gaussian splatting implementation
│       ├── trajs/                  # Camera trajectory generation
│       └── visual_check.py         # 3D scene validation tools
└── tools/                          # Standalone applications
    ├── gradio_app.py               # Web interface
    ├── run_flux_outpainting.py
    ├── run_vistadream.py           # Main 3D pipeline
    └── run_single_img.py           # Single image processing
```
Models are automatically downloaded from Hugging Face on first run. Manual download:
```bash
pixi run huggingface-cli download pablovela5620/vistadream --local-dir ckpt/
```
Expected structure:
```
ckpt/
├── flux_fill/
│   ├── flux1-fill-dev.safetensors
│   └── ae.safetensors
├── vec.pt
├── txt.pt
└── txt_256.pt
```
Thanks to the original authors! If you use VistaDream in your research, please cite:
```bibtex
@inproceedings{wang2025vistadream,
  title={VistaDream: Sampling multiview consistent images for single-view scene reconstruction},
  author={Wang, Haiping and Liu, Yuan and Liu, Ziwei and Wang, Wenping and Dong, Zhen and Yang, Bisheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
```
This project builds upon several outstanding works:
- Flux - Black Forest Labs for the diffusion model foundation
- 3D Gaussian Splatting - Inria for efficient 3D representation
- Rerun - Rerun.io for 3D visualization framework
- GSplat - Nerfstudio for Gaussian splatting implementation
- MoGe - Microsoft Research for monocular geometry estimation