RoboML is an aggregator package for quickly deploying open-source ML models for robots. It supports three main use cases:
- Rapid deployment of general-purpose models: Wraps around popular ML libraries like 🤗 Transformers, allowing fast deployment of models through scalable server endpoints.
- Deploy detection models with tracking: Supports deployment of all detection models in MMDetection with optional tracking integration.
- Aggregate robot-specific models from the robotics community: Intended as a platform for community-contributed multimodal models, usable in planning and control, especially with ROS components. See EmbodiedAgents.
| Model Class | Description | Default Checkpoint / Resource | Key Init Parameters |
|---|---|---|---|
| TransformersLLM | General-purpose large language model (LLM) from 🤗 Transformers | microsoft/Phi-3-mini-4k-instruct | name, checkpoint, quantization, init_timeout |
| TransformersMLLM | Multimodal vision-language model (MLLM) from 🤗 Transformers | HuggingFaceM4/idefics2-8b | name, checkpoint, quantization, init_timeout |
| RoboBrain2 | Embodied planning + multimodal reasoning via RoboBrain 2.0 | BAAI/RoboBrain2.0-7B | name, checkpoint, init_timeout |
| Whisper | Multilingual speech-to-text (ASR) from OpenAI Whisper | small.en (checkpoint list) | name, checkpoint, compute_type, init_timeout |
| SpeechT5 | Text-to-speech model from Microsoft SpeechT5 | microsoft/speecht5_tts | name, checkpoint, voice, init_timeout |
| Bark | Text-to-speech model from SunoAI Bark | suno/bark-small (voice options) | name, checkpoint, voice, attn_implementation, init_timeout |
| MeloTTS | Multilingual text-to-speech via MeloTTS | EN, EN-US | name, language, speaker_id, init_timeout |
| VisionModel | Detection + tracking via MMDetection | dino-4scale_r50_8xb2-12e_coco | name, checkpoint, setup_trackers, cache_dir, tracking_distance_function, tracking_distance_threshold, deploy_tensorrt, _num_trackers, init_timeout |
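As an illustrative sketch, the init parameters in the table can be viewed as a plain configuration mapping. Only the parameter names and the default checkpoint below come from the table; the dict structure and the other values are assumptions, not RoboML's actual constructor signature.

```python
# Illustrative only: a plain-dict view of VisionModel's init parameters
# from the table above. RoboML's real API may differ.
vision_model_config = {
    "name": "object_detector",
    "checkpoint": "dino-4scale_r50_8xb2-12e_coco",  # default from the table
    "setup_trackers": True,                # enable optional tracking
    "cache_dir": "~/.cache",
    "tracking_distance_function": "iou",   # assumed option value
    "tracking_distance_threshold": 0.5,    # assumed value
    "deploy_tensorrt": False,
    "init_timeout": 600,                   # seconds (assumption)
}
```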
RoboML has been tested on Ubuntu 20.04 and later. A GPU with CUDA 12.1+ is recommended. If you encounter issues, please open an issue.
Install with pip:

```shell
pip install roboml
```

Or install from source:

```shell
git clone https://github.com/automatika-robotics/roboml.git && cd roboml
virtualenv venv && source venv/bin/activate
pip install pip-tools
pip install .
```

To use detection and tracking features via MMDetection:
- Install RoboML with the vision extras:

  ```shell
  pip install roboml[vision]
  ```

- Install `mmcv` using the appropriate CUDA and PyTorch versions as described in their docs. Example for PyTorch 2.1 with CUDA 12.1:

  ```shell
  pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1/index.html
  ```

- Install `mmdetection`:

  ```shell
  git clone https://github.com/open-mmlab/mmdetection.git
  cd mmdetection
  pip install -v -e .
  ```

- If `ffmpeg` or `libGL` is missing:

  ```shell
  sudo apt-get update && sudo apt-get install ffmpeg libsm6 libxext6
  ```
RoboML vision models can optionally be accelerated with NVIDIA TensorRT on Linux x86_64 systems. For setup, follow the TensorRT installation guide.
Jetson users are especially encouraged to use Docker.
- Install Docker Desktop
- Install the NVIDIA Container Toolkit
git clone https://github.com/automatika-robotics/roboml.git && cd roboml
# Build container image
docker build --tag=automatika:roboml .
# For Jetson boards:
docker build --tag=automatika:roboml -f Dockerfile.Jetson .
# Run HTTP server
docker run --runtime=nvidia --gpus all --rm -p 8000:8000 automatika:roboml roboml
# Or run RESP server
docker run --runtime=nvidia --gpus all --rm -p 6379:6379 automatika:roboml roboml-resp-
(Optional) Mount your cache dir to persist downloaded models:
-v ~/.cache:/root/.cache
RoboML uses Ray Serve to host models as scalable apps across various environments.
WebSocket endpoints are exposed for streaming use cases (e.g., STT/TTS).
For ultra-low latency in robotics, RoboML also includes a RESP-based server compatible with any Redis client.
RESP (see spec) is a lightweight, binary-safe protocol. Combined with msgpack instead of JSON, it enables very fast I/O, ideal for binary data like images, audio, or video.
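To make the wire format concrete, here is a minimal, stdlib-only sketch of how a command is framed in RESP: an array of bulk strings, each length-prefixed, so arbitrary binary payloads (e.g. msgpack-packed audio) pass through untouched. The command name below is hypothetical; this illustrates the protocol framing, not RoboML's actual command set.

```python
def resp_encode(*parts: bytes) -> bytes:
    """Frame a command as a RESP array of bulk strings.

    Each element is length-prefixed, so the payload is binary-safe:
    no escaping or base64 is needed for raw image/audio bytes.
    """
    out = [b"*%d\r\n" % len(parts)]
    for p in parts:
        out.append(b"$%d\r\n%s\r\n" % (len(p), p))
    return b"".join(out)

# Hypothetical command carrying raw binary (e.g. a msgpack-packed payload):
frame = resp_encode(b"transcribe", b"\x00\x01\xffraw-audio-bytes")
# frame starts with b"*2\r\n" (a 2-element array), then the length-prefixed parts
```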
This work is inspired by @hansonkd’s Tino project.
Run the HTTP server:

```shell
roboml
```

Run the RESP server:

```shell
roboml-resp
```

Example usage in ROS clients is documented in ROS Agents.
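As a sketch of calling the HTTP server from Python once it is running on port 8000: the route and the `{"query": ...}` payload schema below are illustrative assumptions (Ray Serve typically exposes one app per deployed model), not the documented API.

```python
import json
import urllib.request

def build_query(host: str, model_name: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for a deployed model endpoint.

    The route and payload schema are assumptions for illustration;
    consult the running server for the real interface.
    """
    payload = json.dumps({"query": prompt}).encode()
    return urllib.request.Request(
        f"http://{host}:8000/{model_name}",  # assumed route: one app per model
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query("localhost", "llm", "Describe the scene.")
# urllib.request.urlopen(req) would send it once `roboml` is running
```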
Install dev dependencies:

```shell
pip install ".[dev]"
```

Run tests from the project root:

```shell
python -m pytest
```

Unless otherwise specified, all code is © 2024 Automatika Robotics. RoboML is released under the MIT License. See LICENSE for details.
ROS Agents is developed in collaboration between Automatika Robotics and Inria. Community contributions are welcome!