Commit c1bbf45

Update docs
1 parent 4e7ac15 commit c1bbf45

5 files changed

+33
-7
lines changed

README.md

Lines changed: 2 additions & 2 deletions
@@ -46,7 +46,7 @@ RubyLLM fixes all that. One beautiful API for everything. One consistent format.
 chat = RubyLLM.chat
 chat.ask "What's the best way to learn Ruby?"
 
-# Analyze images, audio, documents, and text files
+# Analyze images, videos, audio, documents, and text files
 chat.ask "What's in this image?", with: "ruby_conf.jpg"
 chat.ask "Describe this meeting", with: "meeting.wav"
 chat.ask "Summarize this document", with: "contract.pdf"
@@ -88,7 +88,7 @@ chat.with_tool(Weather).ask "What's the weather in Berlin? (52.5200, 13.4050)"
 ## Core Capabilities
 
 * 💬 **Unified Chat:** Converse with models from OpenAI, Anthropic, Gemini, Bedrock, OpenRouter, DeepSeek, Ollama, or any OpenAI-compatible API using `RubyLLM.chat`.
-* 👁️ **Vision:** Analyze images within chats.
+* 👁️ **Vision:** Analyze images and videos within chats.
 * 🔊 **Audio:** Transcribe and understand audio content.
 * 📄 **Document Analysis:** Extract information from PDFs, text files, and other documents.
 * 🖼️ **Image Generation:** Create images with `RubyLLM.paint`.

docs/guides/chat.md

Lines changed: 26 additions & 1 deletion
@@ -119,7 +119,7 @@ RubyLLM manages a registry of known models and their capabilities. For detailed
 
 ## Multi-modal Conversations
 
-Modern AI models can often process more than just text. RubyLLM provides a unified way to include images, audio, text files, and PDFs in your chat messages using the `with:` option in the `ask` method.
+Modern AI models can often process more than just text. RubyLLM provides a unified way to include images, videos, audio, text files, and PDFs in your chat messages using the `with:` option in the `ask` method.
 
 ### Working with Images
 
@@ -144,6 +144,30 @@ puts response.content
 
 RubyLLM handles converting the image source into the format required by the specific provider API.
 
+### Working with Videos
+
+You can also analyze video files or URLs with vision-capable models. RubyLLM will automatically detect video files and handle them appropriately.
+
+```ruby
+# Ask about a local video file
+chat = RubyLLM.chat(model: 'gpt-4o')
+response = chat.ask "What happens in this video?", with: "path/to/demo.mp4"
+puts response.content
+
+# Ask about a video from a URL
+response = chat.ask "Summarize the main events in this video.", with: "https://example.com/demo_video.mp4"
+puts response.content
+
+# Combine videos with other file types
+response = chat.ask "Analyze these files for visual content.", with: ["diagram.png", "demo.mp4", "notes.txt"]
+puts response.content
+```
+
+**Notes:**
+- Supported video formats include .mp4, .mov, .avi, .webm, and others (provider-dependent).
+- Not all models or providers support video input; check the [Available Models Guide]({% link guides/available-models.md %}) for details.
+- Large video files may be subject to size or duration limits imposed by the provider.
+
 ### Working with Audio
 
 Provide audio file paths to audio-capable models (like `gpt-4o-audio-preview`).
@@ -224,6 +248,7 @@ response = chat.ask "What's in this image?", with: { image: "photo.jpg" }
 
 **Supported file types:**
 - **Images:** .jpg, .jpeg, .png, .gif, .webp, .bmp
+- **Videos:** .mp4, .mov, .avi, .webm
 - **Audio:** .mp3, .wav, .m4a, .ogg, .flac
 - **Documents:** .pdf, .txt, .md, .csv, .json, .xml
 - **Code:** .rb, .py, .js, .html, .css (and many others)
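
Reviewer note on the new video notes in docs/guides/chat.md: since format support and size limits are provider-dependent, callers may want a local pre-flight check before passing a file to `with:`. The following is a minimal sketch in plain Ruby. `VIDEO_EXTENSIONS`, `MAX_VIDEO_BYTES`, `video_source?`, and `video_problems` are hypothetical names, not part of RubyLLM, and the 20 MB limit is an assumed placeholder rather than any provider's documented value.

```ruby
# Hypothetical helpers for illustration; none of these names come from RubyLLM.

# Extensions listed under "Videos" in the documentation change above.
VIDEO_EXTENSIONS = %w[.mp4 .mov .avi .webm].freeze

# Assumed placeholder limit; real limits vary by provider and model.
MAX_VIDEO_BYTES = 20 * 1024 * 1024

# True when a path or URL ends in one of the listed video extensions.
# Any query string is dropped before the extension is read.
def video_source?(source)
  path = source.to_s.split('?', 2).first.to_s
  VIDEO_EXTENSIONS.include?(File.extname(path).downcase)
end

# Returns a list of human-readable problems for a local file path.
# An empty list means the file passed these local checks only.
def video_problems(path, max_bytes: MAX_VIDEO_BYTES)
  problems = []
  problems << "unsupported extension: #{File.extname(path)}" unless video_source?(path)

  if File.file?(path)
    size = File.size(path)
    problems << "file is #{size} bytes, over the #{max_bytes}-byte limit" if size > max_bytes
  else
    problems << "file not found: #{path}"
  end

  problems
end
```

A caller could run `video_problems("path/to/demo.mp4")` and only call `chat.ask` when the result is empty. Passing this check does not guarantee the provider will accept the file.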

docs/guides/models.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ The registry stores crucial information about each model, including:
 * **`name`**: A human-friendly name.
 * **`context_window`**: Max input tokens (e.g., `128_000`).
 * **`max_tokens`**: Max output tokens (e.g., `16_384`).
-* **`supports_vision`**: If it can process images.
+* **`supports_vision`**: If it can process images and videos.
 * **`supports_functions`**: If it can use [Tools]({% link guides/tools.md %}).
 * **`input_price_per_million`**: Cost in USD per 1 million input tokens.
 * **`output_price_per_million`**: Cost in USD per 1 million output tokens.

docs/guides/rails.md

Lines changed: 2 additions & 2 deletions
@@ -117,7 +117,7 @@ Run the migrations: `rails db:migrate`
 
 ### ActiveStorage Setup for Attachments (Optional)
 
-If you want to use attachments (images, audio, PDFs) with your AI chats, you need to set up ActiveStorage:
+If you want to use attachments (images, videos, audio, PDFs) with your AI chats, you need to set up ActiveStorage:
 
 ```bash
 # Only needed if you plan to use attachments
@@ -291,7 +291,7 @@ chat_record.ask("Analyze this file", with: params[:uploaded_file])
 chat_record.ask("What's in this document?", with: user.profile_document)
 ```
 
-The attachment API automatically detects file types based on file extension or content type, so you don't need to specify whether something is an image, audio file, PDF, or text document - RubyLLM figures it out for you!
+The attachment API automatically detects file types based on file extension or content type, so you don't need to specify whether something is an image, video, audio file, PDF, or text document - RubyLLM figures it out for you!
 
 ## Handling Persistence Edge Cases
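
Reviewer note on the detection sentence changed in docs/guides/rails.md: the commit says the type is inferred from the file extension or the content type. The sketch below shows one way such a lookup could work now that video is in the mix. It is not RubyLLM's internal implementation. `ATTACHMENT_TYPES` and `attachment_type` are hypothetical names, the extension lists are copied from the "Supported file types" list in docs/guides/chat.md, and the fallback to `:text` is an assumption.

```ruby
# Hypothetical sketch; this is not how RubyLLM is known to implement detection.

# Extension lists taken from the "Supported file types" list in the chat guide.
ATTACHMENT_TYPES = {
  image: %w[.jpg .jpeg .png .gif .webp .bmp],
  video: %w[.mp4 .mov .avi .webm],
  audio: %w[.mp3 .wav .m4a .ogg .flac],
  pdf:   %w[.pdf]
}.freeze

# Classifies an attachment by extension first, then by MIME content type.
# Anything unrecognized is treated as text (an assumption for this sketch).
def attachment_type(filename, content_type: nil)
  ext = File.extname(filename.to_s).downcase
  by_extension = ATTACHMENT_TYPES.find { |_type, exts| exts.include?(ext) }&.first
  return by_extension if by_extension

  case content_type.to_s
  when %r{\Aimage/} then :image
  when %r{\Avideo/} then :video
  when %r{\Aaudio/} then :audio
  when 'application/pdf' then :pdf
  else :text
  end
end
```

For example, `attachment_type("presentation.mp4")` returns `:video`, and an upload without an extension can still be classified through `content_type: "video/webm"`.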

docs/index.md

Lines changed: 2 additions & 1 deletion
@@ -72,8 +72,9 @@ RubyLLM fixes all that. One beautiful API for everything. One consistent format.
 chat = RubyLLM.chat
 chat.ask "What's the best way to learn Ruby?"
 
-# Analyze images, audio, documents, and text files
+# Analyze images, videos, audio, documents, and text files
 chat.ask "What's in this image?", with: "ruby_conf.jpg"
+chat.ask "What's happening in this video?", with: "presentation.mp4"
 chat.ask "Describe this meeting", with: "meeting.wav"
 chat.ask "Summarize this document", with: "contract.pdf"
 chat.ask "Explain this code", with: "app.rb"