😎 Awesome 3D and 4D World Models

This survey reviews 3D and 4D world models: models that learn, predict, and simulate the geometry and dynamics of real environments from multi-modal signals. We unify terminology, scope, and evaluations, and organize the space into three complementary paradigms by representation: VideoGen (image/video-centric), OccGen (occupancy-centric), and LiDARGen (point-cloud-centric).

For more details, kindly refer to our paper and project page. 🚀

📚 Citation

If you find this work helpful for your research, please consider citing our paper:

@article{survey_3d_4d_world_model,
    title   = {3D and 4D World Modeling: A Survey},
    author  = {Lingdong Kong and Wesley Yang and Jianbiao Mei and Youquan Liu and Ao Liang and Dekai Zhu and Dongyue Lu and Wei Yin and Xiaotao Hu and Mingkai Jia and Junyuan Deng and Kaiwen Zhang and Yang Wu and Tianyi Yan and Shenyuan Gao and Song Wang and Linfeng Li and Liang Pan and Yong Liu and Jianke Zhu and Wei Tsang Ooi and Steven C.H. Hoi and Ziwei Liu},
    journal = {arXiv preprint arXiv:2509.xxxxx},
    year    = {2025},
}

Table of Contents

1. World Modeling from Video Generation

1️⃣ Data Engines

⏲️ In chronological order, from the earliest to the latest.

Model Paper Venue Website GitHub
BEVControl arXiv
BEVControl: Accurately Controlling Street-View Elements with Multi-Perspective Consistency via BEV Sketch Layout
arXiv 2023 - -
BEVGen arXiv
Street-View Image Generation from a Bird's-Eye View Layout
RA-L 2024 Website GitHub
MagicDrive arXiv
MagicDrive: Street View Generation with Diverse 3D Geometry Control
ICLR 2024 Website GitHub
Panacea arXiv
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
CVPR 2024 Website GitHub
DrivingDiffusion arXiv
DrivingDiffusion: Layout-Guided Multi-View Driving Scene Video Generation with Latent Diffusion Model
ECCV 2024 Website GitHub
WoVoGen arXiv
WoVoGen: World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation
ECCV 2024 - GitHub
Delphi arXiv
Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
arXiv 2024 Website GitHub
SimGen arXiv
SimGen: Simulator-conditioned Driving Scene Generation
NeurIPS 2024 Website GitHub
BEVWorld arXiv
BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents
arXiv 2024 - -
Panacea+ arXiv
Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving
arXiv 2024 Website -
DiVE arXiv
DiVE: DiT-Based Video Generation with Enhanced Control
arXiv 2024 Website GitHub
SyntheOcc arXiv
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs
arXiv 2024 Website GitHub
HoloDrive arXiv
HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving
arXiv 2024 - -
CogDriving arXiv
Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention
arXiv 2024 Website -
UniMLVG arXiv
UniMLVG: Unified Framework for Multi-View Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving
arXiv 2024 - GitHub
DrivePhysica arXiv
Physical Informed Driving World Model
arXiv 2024 Website -
DriveDreamer-2 arXiv
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
AAAI 2025 Website GitHub
SubjectDrive arXiv
SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control
AAAI 2025 Website -
Glad arXiv
Glad: A Streaming Scene Generator for Autonomous Driving
ICLR 2025 - GitHub
DualDiff arXiv
DualDiff: Dual-Branch Diffusion Model for Autonomous Driving with Semantic Fusion
ICRA 2025 - GitHub
UniScene arXiv
UniScene: Unified Occupancy-Centric Driving Scene Generation
CVPR 2025 Website GitHub
DriveScape arXiv
DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation
CVPR 2025 Website -
PerLDiff arXiv
PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models
ICCV 2025 Website GitHub
MagicDrive-V2 arXiv
MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
ICCV 2025 Website -
Cosmos-Transfer1 arXiv
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
arXiv 2025 Website GitHub
DualDiff+ arXiv
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance
arXiv 2025 - GitHub
CoGen arXiv
CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving
arXiv 2025 Website -
NoiseController arXiv
NoiseController: Towards Consistent Multi-View Video Generation via Noise Decomposition and Collaboration
arXiv 2025 - -
STAGE arXiv
STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation
arXiv 2025 - -

2️⃣ Action Interpreters

⏲️ In chronological order, from the earliest to the latest.

Model Paper Venue Website GitHub
GAIA-1 arXiv
GAIA-1: A Generative World Model for Autonomous Driving
arXiv 2023 Website -
ADriver-I arXiv
ADriver-I: A General World Model for Autonomous Driving
arXiv 2023 - -
Drive-WM arXiv
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
CVPR 2024 Website GitHub
DriveDreamer arXiv
DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving
ECCV 2024 Website GitHub
GenAD arXiv
GenAD: Generalized Predictive Model for Autonomous Driving
CVPR 2024 - GitHub
Vista arXiv
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
NeurIPS 2024 Website GitHub
InfinityDrive arXiv
InfinityDrive: Breaking Time Limits in Driving World Models
arXiv 2024 Website -
DrivingGPT arXiv
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-Modal Autoregressive Transformers
arXiv 2024 Website -
DrivingWorld arXiv
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT
arXiv 2024 Website GitHub
GEM arXiv
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
CVPR 2025 Website GitHub
MaskGWM arXiv
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
CVPR 2025 - GitHub
Epona arXiv
Epona: Autoregressive Diffusion World Model for Autonomous Driving
ICCV 2025 Website GitHub
VaViM & VaVAM arXiv
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
arXiv 2025 Website GitHub
MiLA arXiv
MiLA: Multi-View Intensive-Fidelity Long-Term Video Generation World Model for Autonomous Driving
arXiv 2025 - GitHub
GAIA-2 arXiv
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving
arXiv 2025 Website -
DriVerse arXiv
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment
arXiv 2025 - -
PosePilot arXiv
PosePilot: Steering Camera Pose for Generative World Models with Self-Supervised Depth
arXiv 2025 - -
ProphetDWM arXiv
ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos
arXiv 2025 - -
LongDWM arXiv
LongDWM: Cross-Granularity Distillation for Building A Long-Term Driving World Model
arXiv 2025 Website GitHub

3️⃣ Neural Simulators

⏲️ In chronological order, from the earliest to the latest.

Model Paper Venue Website GitHub
MagicDrive3D arXiv
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes
arXiv 2024 Website GitHub
DreamForge arXiv
DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
arXiv 2024 Website GitHub
Doe-1 arXiv
Doe-1: Closed-Loop Autonomous Driving with Large World Model
arXiv 2024 Website GitHub
DrivingSphere arXiv
DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation
CVPR 2025 Website GitHub
UMGen arXiv
Generating Multimodal Driving Scenes via Next-Scene Prediction
CVPR 2025 Website GitHub
DriveArena arXiv
DriveArena: A Closed-Loop Generative Simulation Platform for Autonomous Driving
ICCV 2025 Website GitHub
InfiniCube arXiv
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
ICCV 2025 Website GitHub
DiST-4D arXiv
DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation
ICCV 2025 Website GitHub
UniFuture arXiv
Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception
arXiv 2025 Website GitHub
Nexus arXiv
Decoupled Diffusion Sparks Adaptive Scene Generation
arXiv 2025 Website GitHub
Challenger arXiv
Challenger: Affordable Adversarial Driving Video Generation
arXiv 2025 Website GitHub
Cosmos-Drive arXiv
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models
arXiv 2025 Website GitHub

4️⃣ Scene Reconstructors

⏲️ In chronological order, from the earliest to the latest.

Model Paper Venue Website GitHub
3DGS arXiv
3D Gaussian Splatting for Real-Time Radiance Field Rendering
TOG 2023 Website GitHub
StreetGaussian arXiv
Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
ECCV 2024 Website GitHub
4DGF arXiv
Dynamic 3D Gaussian Fields for Urban Areas
NeurIPS 2024 Website GitHub
SCube arXiv
SCube: Instant Large-Scale Scene Reconstruction using VoxSplats
NeurIPS 2024 Website GitHub
HUGS arXiv
HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
CVPR 2024 Website GitHub
MagicDrive3D arXiv
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes
arXiv 2024 Website GitHub
S3Gaussian arXiv
S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving
arXiv 2024 Website GitHub
VDG arXiv
VDG: Vision-Only Dynamic Gaussian for Driving Simulation
arXiv 2024 Website GitHub
UniGaussian arXiv
UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations
arXiv 2024 - -
Stag-1 arXiv
Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model
arXiv 2024 Website GitHub
DrivingRecon arXiv
DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving
arXiv 2024 - GitHub
OccScene arXiv
OccScene: Semantic Occupancy-Based Cross-Task Mutual Learning for 3D Scene Generation
arXiv 2024 - -
SGD arXiv
SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior
WACV 2025 - -
OmniRe arXiv
OmniRe: Omni Urban Scene Reconstruction
ICLR 2025 Website GitHub
DriveDreamer4D arXiv
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
CVPR 2025 Website GitHub
DeSiRe-GS arXiv
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes
CVPR 2025 - GitHub
SplatAD arXiv
SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving
CVPR 2025 Website GitHub
ReconDreamer arXiv
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
CVPR 2025 Website GitHub
FreeSim arXiv
FreeSim: Toward Free-Viewpoint Camera Simulation in Driving Scenes
CVPR 2025 Website -
StreetCrafter arXiv
StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
CVPR 2025 Website GitHub
FlexDrive arXiv
FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering
CVPR 2025 - -
S-NeRF++ arXiv
S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation
TPAMI 2025 - -
InfiniCube arXiv
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
ICCV 2025 Website GitHub
DiST-4D arXiv
DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation
ICCV 2025 Website GitHub
DreamDrive arXiv
DreamDrive: Generative 4D Scene Modeling from Street View Images
arXiv 2025 Website -
Uni-Gaussians arXiv
Uni-Gaussians: Unifying Camera and Lidar Simulation with Gaussians for Dynamic Driving Scenarios
arXiv 2025 Website -
MuDG arXiv
MuDG: Taming Multi-Modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction
arXiv 2025 Website GitHub
UniFuture arXiv
Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception
arXiv 2025 Website GitHub
SceneCrafter arXiv
Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving
arXiv 2025 - GitHub
ReconDreamer++ arXiv
ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation
arXiv 2025 Website GitHub
RealEngine arXiv
RealEngine: Simulating Autonomous Driving in Realistic Context
arXiv 2025 - GitHub
GeoDrive arXiv
GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control
arXiv 2025 - GitHub
PseudoSimulation arXiv
Pseudo-Simulation for Autonomous Driving
arXiv 2025 - GitHub
Dreamland arXiv
Dreamland: Controllable World Creation with Simulator and Generative Models
arXiv 2025 Website -

2. World Modeling from Occupancy Generation

1️⃣ Scene Representors

⏲️ In chronological order, from the earliest to the latest.

Model Paper Venue Website GitHub
SSD arXiv
Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data
arXiv 2023 - GitHub
SemCity arXiv
SemCity: Semantic Scene Generation with Triplane Diffusion
CVPR 2024 Website GitHub
WoVoGen arXiv
WoVoGen: World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation
ECCV 2024 - GitHub
UrbanDiff arXiv
Urban Scene Diffusion through Semantic Occupancy Map
arXiv 2024 Website -
DrivingSphere arXiv
DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation
CVPR 2025 Website GitHub
UniScene arXiv
UniScene: Unified Occupancy-Centric Driving Scene Generation
CVPR 2025 Website GitHub
OccScene arXiv
OccScene: Semantic Occupancy-Based Cross-Task Mutual Learning for 3D Scene Generation
arXiv 2024 - -
InfiniCube arXiv
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
ICCV 2025 Website GitHub
Control-3D-Scene arXiv
Controllable 3D Outdoor Scene Generation via Scene Graphs
ICCV 2025 Website GitHub
X-Scene arXiv
X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability
arXiv 2025 Website GitHub

2️⃣ Occupancy Forecasters

⏲️ In chronological order, from the earliest to the latest.

Model Paper Venue Website GitHub
Emergent-Occ arXiv
Differentiable Raycasting for Self-supervised Occupancy Forecasting
ECCV 2022 - GitHub
FF4D arXiv
Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting
CVPR 2023 Website GitHub
UniWorld arXiv
UniWorld: Autonomous Driving Pre-Training via World Models
arXiv 2023 - -
UniScene arXiv
UniScene: Multi-Camera Unified Pre-Training via 3D Scene Reconstruction for Autonomous Driving
arXiv 2023 - GitHub
OccWorld arXiv
OccWorld: Learning A 3D Occupancy World Model for Autonomous Driving
ECCV 2024 Website GitHub
Cam4DOcc arXiv
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
CVPR 2024 - GitHub
DriveWorld arXiv
DriveWorld: 4D Pre-Trained Scene Understanding via World Models for Autonomous Driving
CVPR 2024 - -
OccSora arXiv
OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
arXiv 2024 Website GitHub
UnO arXiv
UnO: Unsupervised Occupancy Fields for Perception and Forecasting
CVPR 2024 Website -
LOPR arXiv
Self-Supervised Multi-Future Occupancy Forecasting for Autonomous Driving
arXiv 2024 - -
FSF-Net arXiv
FSF-Net: Enhance 4D Occupancy Forecasting with Coarse BEV Scene Flow for Autonomous Driving
arXiv 2024 - -
OccLLaMA arXiv
OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving
arXiv 2024 - -
DOME arXiv
DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model
arXiv 2024 Website GitHub
GaussianAD arXiv
GaussianAD: Gaussian-Centric End-to-End Autonomous Driving
arXiv 2024 Website GitHub
DFIT-OccWorld arXiv
An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-Assisted Training
arXiv 2024 - -
Drive-OccWorld arXiv
Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving
AAAI 2025 Website GitHub
PreWorld arXiv
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
ICLR 2025 - GitHub
OccProphet arXiv
OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework
ICLR 2025 - GitHub
RenderWorld arXiv
RenderWorld: World Model with Self-Supervised 3D Label
ICRA 2025 - -
Occ-LLM arXiv
Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models
ICRA 2025 - -
EfficientOCF arXiv
Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting
CVPR 2025 - -
DIO arXiv
DIO: Decomposable Implicit 4D Occupancy-Flow World Model
CVPR 2025 - -
T³Former arXiv
Temporal Triplane Transformers as Occupancy World Models
arXiv 2025 - -
UniOcc arXiv
UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
ICCV 2025 Website GitHub
I²World arXiv
I²-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting
ICCV 2025 - GitHub
COME arXiv
COME: Adding Scene-Centric Forecasting Control to Occupancy World Model
arXiv 2025 - GitHub

3️⃣ Autoregressive Simulators

⏲️ In chronological order, from the earliest to the latest.

Model Paper Venue Website GitHub
SemCity arXiv
SemCity: Semantic Scene Generation with Triplane Diffusion
CVPR 2024 Website GitHub
XCube arXiv
XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies
CVPR 2024 Website GitHub
PDD arXiv
Pyramid Diffusion for Fine 3D Large Scene Generation
ECCV 2024 Website GitHub
OccSora arXiv
OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
arXiv 2024 Website GitHub
DynamicCity arXiv
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
ICLR 2025 Website GitHub
DrivingSphere arXiv
DrivingSphere: Building A High-Fidelity 4D World for Closed-Loop Simulation
CVPR 2025 Website GitHub
InfiniCube arXiv
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
ICCV 2025 Website GitHub
X-Scene arXiv
X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability
arXiv 2025 Website GitHub
PrITTI arXiv
PrITTI: Primitive-Based Generation of Controllable and Editable 3D Semantic Scenes
arXiv 2025 Website GitHub

3. World Modeling from LiDAR Generation

1️⃣ Data Engines

⏲️ In chronological order, from the earliest to the latest.

Model Paper Venue Website GitHub
DUSty arXiv
Learning to Drop Points for LiDAR Scan Synthesis
IROS 2021 Website GitHub
LiDARGen arXiv
Learning to Generate Realistic LiDAR Point Clouds
ECCV 2022 - GitHub
DUSty v2 arXiv
Generative Range Imaging for Learning Scene Priors of 3D LiDAR Data
WACV 2023 Website GitHub
UltraLiDAR arXiv
UltraLiDAR: Learning Compact Representations for LiDAR Completion and Generation
CVPR 2023 Website -
Copilot4D arXiv
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
ICLR 2024 Website -
R2DM arXiv
LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models
ICRA 2024 Website GitHub
ViDAR arXiv
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
CVPR 2024 - GitHub
LiDiff arXiv
Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion
CVPR 2024 - GitHub
LiDM arXiv
Towards Realistic Scene Generation with LiDAR Diffusion Models
CVPR 2024 - GitHub
RangeLDM arXiv
RangeLDM: Fast Realistic LiDAR Point Cloud Generation
ECCV 2024 - GitHub
Text2LiDAR arXiv
Text2LiDAR: Text-Guided LiDAR Point Cloud Generation via Equirectangular Transformer
ECCV 2024 - GitHub
LiDARGRIT arXiv
Taming Transformers for Realistic Lidar Point Cloud Generation
arXiv 2024 - GitHub
BEVWorld arXiv
BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents
arXiv 2024 - GitHub
SDS arXiv
Simultaneous Diffusion Sampling for Conditional LiDAR Generation
arXiv 2024 - -
DiffSSC arXiv
DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models
IROS 2025 - -
HoloDrive arXiv
HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving
arXiv 2024 - -
LOGen arXiv
LOGen: Toward Lidar Object Generation by Point Diffusion
arXiv 2024 Website GitHub
OLiDM arXiv
OLiDM: Object-Aware LiDAR Diffusion Models for Autonomous Driving
AAAI 2025 Website GitHub
X-Drive arXiv
X-Drive: Cross-Modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios
ICLR 2025 - GitHub
LidarDM arXiv
LidarDM: Generative LiDAR Simulation in a Generated World
ICRA 2025 Website GitHub
LiDAR-EDIT arXiv
LiDAR-EDIT: LiDAR Data Generation by Editing the Object Layouts in Real-World Scenes
ICRA 2025 Website GitHub
R2Flow arXiv
Fast LiDAR Data Generation with Rectified Flows
ICRA 2025 Website GitHub
WeatherGen arXiv
WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion
CVPR 2025 - GitHub
LiDPM arXiv
LiDPM: Rethinking Point Diffusion for Lidar Scene Completion
IV 2025 Website GitHub
HERMES arXiv
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
ICCV 2025 Website GitHub
SuperPC arXiv
SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization
CVPR 2025 Website -
3DiSS arXiv
Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving
arXiv 2025 - GitHub
Distill-DPO arXiv
Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion
arXiv 2025 - GitHub
DriveX arXiv
DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
arXiv 2025 - -
OpenDWM arXiv
OpenDWM: Open Driving World Models
arXiv 2025 - GitHub
SPIRAL arXiv
SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation
arXiv 2025 - -
La La LiDAR arXiv
La La LiDAR: Large-Scale Layout Generation from LiDAR Data
arXiv 2025 - -
Veila arXiv
Veila: Panoramic LiDAR Generation from a Monocular RGB Image
arXiv 2025 - -
LiDARCrafter arXiv
LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
arXiv 2025 Website GitHub

2️⃣ Action Forecasters

⏲️ In chronological order, from the earliest to the latest.

Model Paper Venue Website GitHub
Copilot4D arXiv
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
ICLR 2024 Website -
ViDAR arXiv
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
CVPR 2024 - GitHub
BEVWorld arXiv
BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents
arXiv 2024 - GitHub
HERMES arXiv
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
ICCV 2025 Website GitHub
DriveX arXiv
DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
arXiv 2025 - -

3️⃣ Autoregressive Simulators

⏲️ In chronological order, from the earliest to the latest.

Model Paper Venue Website GitHub
HoloDrive arXiv
HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving
arXiv 2024 - -
LidarDM arXiv
LidarDM: Generative LiDAR Simulation in a Generated World
ICRA 2025 Website GitHub
OpenDWM arXiv
OpenDWM: Open Driving World Models
arXiv 2025 - GitHub
LiDARCrafter arXiv
LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
arXiv 2025 Website GitHub

4. Datasets & Benchmarks

Datasets

⏲️ In chronological order, from the earliest to the latest.

Model Paper Venue Website
KITTI -
Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite
CVPR 2012 Website
NYUv2 -
Indoor Segmentation and Support Inference from RGBD Images
ECCV 2012 Website
CARLA arXiv
CARLA: An Open Urban Driving Simulator
CoRL 2017 Website
SemanticKITTI arXiv
SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences
ICCV 2019 Website
nuScenes arXiv
nuScenes: A multimodal dataset for autonomous driving
CVPR 2020 Website
Waymo Open arXiv
Scalability in Perception for Autonomous Driving: Waymo Open Dataset
CVPR 2020 Website
Seeing Through Fog arXiv
Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather
CVPR 2020 Website
Virtual KITTI 2 arXiv
Virtual KITTI 2
arXiv 2020 Website
Argoverse 2 arXiv
Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting
NeurIPS 2021 Website
Lyft-Level5 arXiv
One Thousand and One Hours: Self-driving Motion Prediction Dataset
CoRL 2021 Website
nuPlan arXiv
NuPlan: A closed-loop ML-based planning benchmark for autonomous vehicles
CVPRW 2021 Website
PandaSet arXiv
PandaSet: Advanced Sensor Suite Dataset for Autonomous Driving
ITSC 2022 Website
OpenCOOD arXiv
OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication
ICRA 2022 Website
KITTI-360 arXiv
KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D
TPAMI 2022 Website
CarlaSC arXiv
MotionSC: Data Set and Network for Real-Time Semantic Mapping in Dynamic Environments
RA-L 2022 Website
Robo3D arXiv
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
ICCV 2023 Website
OpenOccupancy arXiv
OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
ICCV 2023 Website
Occ3D-nuScenes arXiv
Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving
NeurIPS 2023 Website
OpenDV-YouTube arXiv
GenAD: Generalized Predictive Model for Autonomous Driving
CVPR 2024 Website
SSCBench arXiv
SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving
IROS 2024 Website
NAVSIM arXiv
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
NeurIPS 2024 Website
DrivingDojo arXiv
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model
NeurIPS 2024 Website
EUVS arXiv
Extrapolated Urban View Synthesis Benchmark
ICCV 2025 Website
Pi3DET arXiv
Perspective-Invariant 3D Object Detection
ICCV 2025 Website

Benchmarks

5. Applications

1️⃣ Autonomous Driving

Model Paper Venue Website GitHub
OccSora arXiv
OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving
arXiv 2024 - GitHub
DFIT-OccWorld arXiv
An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-Assisted Training
arXiv 2024 - -
LiDARCrafter arXiv
LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences
arXiv 2025 Website GitHub
UniSim arXiv
UniSim: A Neural Closed-Loop Sensor Simulator
CVPR 2023 Website -
Panacea arXiv
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
CVPR 2024 Website GitHub
Delphi arXiv
Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
arXiv 2024 Website GitHub
DriveDreamer-2 arXiv
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
AAAI 2025 Website GitHub
Panacea+ arXiv
Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving
arXiv 2024 Website -
MiLA arXiv
MiLA: Multi-View Intensive-Fidelity Long-Term Video Generation World Model for Autonomous Driving
arXiv 2025 - GitHub
GAIA-2 arXiv
GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving
arXiv 2025 Website -

2️⃣ Robotics

Model Paper Venue Website GitHub
RoboDreamer arXiv
RoboDreamer: Learning Compositional World Models for Robot Imagination
arXiv 2024 Website GitHub
BEHAVIOR arXiv
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities
CoRL 2025 Website GitHub
Habitat 2.0 arXiv
Habitat 2.0: Training Home Assistants to Rearrange their Habitat
arXiv 2021 - -
FMR arXiv
Foundation Models in Robotics: Applications, Challenges, and the Future
IJRR 2024 - GitHub
VLMaps arXiv
Visual Language Maps for Robot Navigation
ICRA 2023 Website GitHub

3️⃣ Video Games & XR

Model Paper Venue Website GitHub
ILVE arXiv
Interactive Latent Variable Evolution for the Generation of Minecraft Structures
ICFDG 2021 - -
ProcTHOR arXiv
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
NeurIPS 2022 Website GitHub
MGVQ arXiv
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization
arXiv 2025 Website GitHub
Text2World arXiv
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
arXiv 2025 Website GitHub
WorldGPT arXiv
WorldGPT: Empowering LLM as Multimodal World Model
ACM MM 2024 - GitHub

4️⃣ Digital Twins

Model Paper Venue Website GitHub
DynamicCity arXiv
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
ICLR 2025 Website GitHub
UrbanScene3D arXiv
Capturing, Reconstructing, and Simulating: the UrbanScene3D Dataset
ECCV 2022 Website GitHub
GaussianCity arXiv
GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation
CVPR 2025 Website GitHub
UrbanWorld arXiv
UrbanWorld: An Urban World Model for 3D City Generation
arXiv 2024 Website GitHub
SceneDiffuser++ arXiv
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model
CVPR 2025 - -

6. Other Resources

Workshops

Tutorials

Talks & Seminars

7. Acknowledgements