EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence

๐ŸŒ Project Page ๐Ÿ“„ arXiv ๐ŸŽฅ Video ๐Ÿค— Hugging Face ๐Ÿค— Hugging Face ๐Ÿค— Hugging Face ไธญๆ–‡ไป‹็ป

EmbodiedGen is a generative engine that creates diverse, interactive 3D worlds composed of high-quality 3D assets (mesh & 3DGS) with plausible physics, leveraging generative AI to address the generalization challenges of embodied-intelligence research. It is composed of six key modules: Image-to-3D, Text-to-3D, Texture Generation, Articulated Object Generation, Scene Generation, and Layout Generation.

Overall Framework


✨ Table of Contents of EmbodiedGen

  • 🚀 Quick Start
  • 🖼️ Image-to-3D
  • 📝 Text-to-3D
  • 🎨 Texture Generation
  • 🌍 3D Scene Generation
  • ⚙️ Articulated Object Generation
  • 🏞️ Layout (Interactive 3D Worlds) Generation
  • 🖼️ Real-to-Sim Digital Twin
  • 📚 Citation
  • ⚖️ License

🚀 Quick Start

✅ Setup Environment

git clone https://github.com/HorizonRobotics/EmbodiedGen.git
cd EmbodiedGen
git checkout v0.1.3
git submodule update --init --recursive --progress
conda create -n embodiedgen python=3.10.13 -y # recommended to use a new env.
conda activate embodiedgen
bash install.sh basic

✅ Starting from Docker

We provide a pre-built Docker image on Docker Hub with a fully configured environment for your convenience. For more details, please refer to the Docker documentation.

Note: Model checkpoints are not included in the image; they are downloaded automatically on first run. You still need to set up the GPT Agent manually.

IMAGE=wangxinjie/embodiedgen:env_v0.1.x
CONTAINER=EmbodiedGen-docker-${USER}
docker pull ${IMAGE}
docker run -itd --shm-size="64g" --gpus all --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --privileged --net=host --name ${CONTAINER} ${IMAGE}
docker exec -it ${CONTAINER} bash

✅ Setup GPT Agent

Update the API key in the file embodied_gen/utils/gpt_config.yaml.

You can choose between two backends for the GPT agent (a quick sanity check is sketched below the list):

  • gpt-4o (Recommended) – use this if you have access to Azure OpenAI.
  • qwen2.5-vl – an alternative with free usage via OpenRouter: apply for a free key here and update api_key in embodied_gen/utils/gpt_config.yaml (50 free requests per day).
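As a quick sanity check before launching any service, the snippet below confirms the key is set. This is a minimal sketch: it assumes only the api_key field named above and a flat YAML layout, which may not match the actual schema.

import yaml  # pip install pyyaml

# Load the agent config; "api_key" is the field named above. If the actual
# schema nests keys under a backend section, adjust the lookup accordingly.
with open("embodied_gen/utils/gpt_config.yaml") as f:
    cfg = yaml.safe_load(f)

assert cfg.get("api_key"), "Set api_key in embodied_gen/utils/gpt_config.yaml"
print("GPT agent API key is configured.")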

🖼️ Image-to-3D

🤗 Hugging Face Generate a physically plausible 3D asset with URDF from a single input image, offering high-quality support for digital twin systems. (The HF space is a simplified demonstration; for the full functionality, please refer to img3d-cli.)

Image to 3D

☁️ Service

Run the image-to-3D generation service locally. Models are downloaded automatically on first run; please be patient.

# Run in foreground
python apps/image_to_3d.py
# Or run in the background
CUDA_VISIBLE_DEVICES=0 nohup python apps/image_to_3d.py > /dev/null 2>&1 &
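If the local service exposes a Gradio app (as the Hugging Face demos suggest), you can discover its callable endpoints from Python. This is a minimal sketch, not part of EmbodiedGen; port 7860 is the Gradio default and is an assumption, so match the URL to what the launch output prints.

from gradio_client import Client  # pip install gradio_client

# Connect to the locally running app; adjust host/port to the launch output.
client = Client("http://127.0.0.1:7860")
client.view_api()  # prints available endpoints and their parameters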

⚡ API

Generate physically plausible 3D assets from input images via the command-line API.

img3d-cli --image_path apps/assets/example_image/sample_04.jpg apps/assets/example_image/sample_19.jpg \
    --n_retry 2 --output_root outputs/imageto3d

# See results (.urdf / mesh.obj / mesh.glb / gs.ply) in ${output_root}/sample_xx/result
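The exported meshes can be inspected from Python with a third-party loader such as trimesh. A minimal sketch, with the sample_04 directory name following the ${output_root}/sample_xx pattern noted above:

import trimesh  # pip install trimesh

# Load one exported mesh; .glb files typically load as a trimesh.Scene.
asset = trimesh.load("outputs/imageto3d/sample_04/result/mesh.glb")
print(asset)  # summary of the loaded scene/mesh
# Scenes keep geometry in a dict; single meshes expose it directly.
for name, geom in getattr(asset, "geometry", {"mesh": asset}).items():
    print(name, len(geom.vertices), "vertices", len(geom.faces), "faces")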

📝 Text-to-3D

🤗 Hugging Face Create 3D assets from text descriptions, covering a wide range of geometries and styles. (The HF space is a simplified demonstration; for the full functionality, please refer to text3d-cli.)

Text to 3D

☁️ Service

Deploy the text-to-3D generation service locally.

The text-to-image stage is based on the Kolors model and supports Chinese and English prompts. Models are downloaded automatically on first run; please be patient.

python apps/text_to_3d.py

⚡ API

The text-to-image stage is based on SD3.5 Medium and supports English prompts only. Usage requires agreeing to the model license (click accept); models are downloaded automatically.

For large-scale 3D asset generation, set --n_pipe_retry=2 to ensure high end-to-end asset usability through automatic quality checks and retries. For more diverse results, do not set --seed_img.

text3d-cli --prompts "small bronze figurine of a lion" "A globe with wooden base" "wooden table with embroidery" \
    --n_image_retry 2 --n_asset_retry 2 --n_pipe_retry 1 --seed_img 0 \
    --output_root outputs/textto3d
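For prompt lists too long to pass inline, one option is to drive text3d-cli from Python. A minimal sketch: prompts.txt (one prompt per line) is a hypothetical file, and the flags are exactly those documented above.

import subprocess

# Read one prompt per line; blank lines are skipped.
with open("prompts.txt") as f:
    prompts = [line.strip() for line in f if line.strip()]

# --n_pipe_retry=2 follows the large-scale recommendation above; no
# --seed_img is passed, keeping results diverse.
subprocess.run(
    ["text3d-cli", "--prompts", *prompts,
     "--n_image_retry", "2", "--n_asset_retry", "2", "--n_pipe_retry", "2",
     "--output_root", "outputs/textto3d"],
    check=True,
)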

Text-to-image generation based on the Kolors model:

bash embodied_gen/scripts/textto3d.sh \
    --prompts "small bronze figurine of a lion" "A globe with wooden base and latitude and longitude lines" "橙色电动手钻，有磨损细节" \
    --output_root outputs/textto3d_k

(The Chinese prompt reads "orange electric hand drill, with worn details.")

P.S. Models with more permissive licenses can be found in embodied_gen/models/image_comm_model.py.


🎨 Texture Generation

🤗 Hugging Face Generate visually rich textures for 3D meshes.

Texture Gen

☁️ Service

Run the texture generation service locally. Models are downloaded automatically on first run; see download_kolors_weights, geo_cond_mv.

python apps/texture_edit.py

⚡ API

Supports Chinese and English prompts.

bash embodied_gen/scripts/texture_gen.sh \
    --mesh_path "apps/assets/example_texture/meshes/robot_text.obj" \
    --prompt "举着牌子的写实风格机器人，大眼睛，牌子上写着“Hello”的文字" \
    --output_root "outputs/texture_gen/robot_text"

(The Chinese prompt reads "a realistic-style robot holding a sign that says “Hello”, with big eyes.")

bash embodied_gen/scripts/texture_gen.sh \
    --mesh_path "apps/assets/example_texture/meshes/horse.obj" \
    --prompt "A gray horse head with flying mane and brown eyes" \
    --output_root "outputs/texture_gen/gray_horse"

๐ŸŒ 3D Scene Generation

scene3d

⚡ API

Run bash install.sh extra to install the additional requirements if you need scene3d-cli.

It takes ~30 minutes to generate a color mesh and 3DGS per scene.

CUDA_VISIBLE_DEVICES=0 scene3d-cli \
    --prompts "Art studio with easel and canvas" \
    --output_dir outputs/bg_scenes/ \
    --seed 0 \
    --gs3d.max_steps 4000 \
    --disable_pano_check
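The 3DGS output is a standard .ply whose vertices carry the Gaussian attributes, so it can be sanity-checked with the third-party plyfile package. A minimal sketch; the exact filename under --output_dir is an assumption, so adjust the path to whatever the run actually produces.

from plyfile import PlyData  # pip install plyfile

# Hypothetical path; check the scene folder created under outputs/bg_scenes/.
ply = PlyData.read("outputs/bg_scenes/scene_0000/gs.ply")
vertex = ply["vertex"]
print(len(vertex), "Gaussians")
print([p.name for p in vertex.properties][:10])  # first few attribute names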

⚙️ Articulated Object Generation

🚧 Coming Soon

articulate


๐Ÿž๏ธ Layout(Interactive 3D Worlds) Generation

💬 Generate Layout from Task Description

layout1 layout2
layout3 layout4

Text-to-image generation is based on SD3.5 Medium; usage requires agreeing to the model license. All models are downloaded automatically on first run.

You can generate any desired room as a background using scene3d-cli. As each scene takes approximately 30 minutes to generate, we recommend pre-generating scenes for efficiency and adding them to outputs/bg_scenes/scene_list.txt.

We provide some sample background assets created with scene3d-cli. Download them (~4 GB) using hf download xinjjj/scene3d-bg --repo-type dataset --local-dir outputs.

Generating one interactive 3D scene from a task description with layout-cli takes approximately 30 minutes.

layout-cli --task_descs "Place the pen in the mug on the desk" "Put the fruit on the table on the plate" \
    --bg_list "outputs/bg_scenes/scene_list.txt" --output_root "outputs/layouts_gen" --insert_robot

Iscene_demo1 Iscene_demo2
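The generated layout.json can be inspected directly from Python. A minimal sketch that makes no assumption about the (undocumented) schema and simply pretty-prints the top of the file; the task_0000 folder name mirrors the later examples.

import json

with open("outputs/layouts_gen/task_0000/layout.json") as f:
    layout = json.load(f)

# Peek at the structure without assuming any particular keys.
print(type(layout).__name__)
print(json.dumps(layout, indent=2)[:800])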

Run multiple tasks defined in task_list.txt in the background. Remove --insert_robot if you do not want the robot pose considered in layout generation.

CUDA_VISIBLE_DEVICES=0 nohup layout-cli \
    --task_descs "apps/assets/example_layout/task_list.txt" \
    --bg_list "outputs/bg_scenes/scene_list.txt" \
    --output_root "outputs/layouts_gens" --insert_robot > layouts_gens.log &

Using compose_layout.py, you can recompose the layout of the generated interactive 3D scenes. (Support for texture editing and augmentation will be added later.)

python embodied_gen/scripts/compose_layout.py \
    --layout_path "outputs/layouts_gens/task_0000/layout.json" \
    --output_dir "outputs/layouts_gens/task_0000/recompose" --insert_robot

We provide sim-cli, which allows users to easily load generated layouts into an interactive 3D simulation using the SAPIEN engine (support for more simulators will be added in future updates).

sim-cli --layout_path "outputs/layouts_gens/task_0000/recompose/layout.json" \
    --output_dir "outputs/layouts_gens/task_0000/recompose/sapien_render" --robot_name "franka"
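If you want to bypass sim-cli, a generated URDF can also be loaded into SAPIEN directly. A minimal sketch against the SAPIEN 2.x Python API; the asset path is hypothetical, so point it at any .urdf produced by img3d-cli above.

import sapien.core as sapien  # pip install sapien

engine = sapien.Engine()
scene = engine.create_scene()
scene.set_timestep(1 / 240)
scene.add_ground(altitude=0.0)

loader = scene.create_urdf_loader()
loader.fix_root_link = False  # let free-standing objects settle under gravity
asset = loader.load("outputs/imageto3d/sample_04/result/sample_04.urdf")  # hypothetical path

for _ in range(240):  # simulate one second
    scene.step()
print("pose after settling:", asset.get_pose())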

🖼️ Real-to-Sim Digital Twin

real2sim_mujoco


For Developers

pip install -e .[dev] && pre-commit install
python -m pytest # All unit tests must pass.

📚 Citation

If you use EmbodiedGen in your research or projects, please cite:

@misc{wang2025embodiedgengenerative3dworld,
      title={EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence},
      author={Xinjie Wang and Liu Liu and Yu Cao and Ruiqi Wu and Wenkang Qin and Dehui Wang and Wei Sui and Zhizhong Su},
      year={2025},
      eprint={2506.10600},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2506.10600},
}

🙌 Acknowledgement

EmbodiedGen builds upon the following amazing projects and models: 🌟 Trellis | 🌟 Hunyuan-Delight | 🌟 Segment Anything | 🌟 Rembg | 🌟 RMBG-1.4 | 🌟 Stable Diffusion x4 | 🌟 Real-ESRGAN | 🌟 Kolors | 🌟 ChatGLM3 | 🌟 Aesthetic Score | 🌟 Pano2Room | 🌟 Diffusion360 | 🌟 Kaolin | 🌟 diffusers | 🌟 gsplat | 🌟 QWEN-2.5VL | 🌟 GPT4o | 🌟 SD3.5


⚖️ License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
