Update docs

arnodirlam · arnodirlam · commit c1bbf4571861 · 2025-06-25T17:19:49.000+02:00
diff --git a/README.md b/README.md
@@ -46,7 +46,7 @@ RubyLLM fixes all that. One beautiful API for everything. One consistent format.
 chat = RubyLLM.chat
 chat.ask "What's the best way to learn Ruby?"
 
-# Analyze images, audio, documents, and text files
+# Analyze images, videos, audio, documents, and text files
 chat.ask "What's in this image?", with: "ruby_conf.jpg"
 chat.ask "Describe this meeting", with: "meeting.wav"
 chat.ask "Summarize this document", with: "contract.pdf"
@@ -88,7 +88,7 @@ chat.with_tool(Weather).ask "What's the weather in Berlin? (52.5200, 13.4050)"
 ## Core Capabilities
 
 *   💬 **Unified Chat:** Converse with models from OpenAI, Anthropic, Gemini, Bedrock, OpenRouter, DeepSeek, Ollama, or any OpenAI-compatible API using `RubyLLM.chat`.
-*   👁️ **Vision:** Analyze images within chats.
+*   👁️ **Vision:** Analyze images and videos within chats.
 *   🔊 **Audio:** Transcribe and understand audio content.
 *   📄 **Document Analysis:** Extract information from PDFs, text files, and other documents.
 *   🖼️ **Image Generation:** Create images with `RubyLLM.paint`.
diff --git a/docs/guides/chat.md b/docs/guides/chat.md
@@ -119,7 +119,7 @@ RubyLLM manages a registry of known models and their capabilities. For detailed
 
 ## Multi-modal Conversations
 
-Modern AI models can often process more than just text. RubyLLM provides a unified way to include images, audio, text files, and PDFs in your chat messages using the `with:` option in the `ask` method.
+Modern AI models can often process more than just text. RubyLLM provides a unified way to include images, videos, audio, text files, and PDFs in your chat messages using the `with:` option in the `ask` method.
 
 ### Working with Images
 
@@ -144,6 +144,30 @@ puts response.content
 
 RubyLLM handles converting the image source into the format required by the specific provider API.
 
+### Working with Videos
+
+You can also analyze video files or video URLs with models that accept video input. RubyLLM detects video attachments automatically from the file extension or content type, so you pass them with `with:` just like images.
+
+```ruby
+# Ask about a local video file
+chat = RubyLLM.chat(model: 'gemini-2.5-flash') # use a model that accepts video input
+response = chat.ask "What happens in this video?", with: "path/to/demo.mp4"
+puts response.content
+
+# Ask about a video from a URL
+response = chat.ask "Summarize the main events in this video.", with: "https://example.com/demo_video.mp4"
+puts response.content
+
+# Combine videos with other file types
+response = chat.ask "Analyze these files for visual content.", with: ["diagram.png", "demo.mp4", "notes.txt"]
+puts response.content
+```
+
+**Notes:**
+- Supported video formats include .mp4, .mov, .avi, .webm, and others (provider-dependent).
+- Not all models or providers support video input; check the [Available Models Guide]({% link guides/available-models.md %}) for details.
+- Large video files may be subject to size or duration limits imposed by the provider.
+
 ### Working with Audio
 
 Provide audio file paths to audio-capable models (like `gpt-4o-audio-preview`).
@@ -224,6 +248,7 @@ response = chat.ask "What's in this image?", with: { image: "photo.jpg" }
 
 **Supported file types:**
 - **Images:** .jpg, .jpeg, .png, .gif, .webp, .bmp
+- **Videos:** .mp4, .mov, .avi, .webm
 - **Audio:** .mp3, .wav, .m4a, .ogg, .flac
 - **Documents:** .pdf, .txt, .md, .csv, .json, .xml
 - **Code:** .rb, .py, .js, .html, .css (and many others)
diff --git a/docs/guides/models.md b/docs/guides/models.md
@@ -41,7 +41,7 @@ The registry stores crucial information about each model, including:
 *   **`name`**: A human-friendly name.
 *   **`context_window`**: Max input tokens (e.g., `128_000`).
 *   **`max_tokens`**: Max output tokens (e.g., `16_384`).
-*   **`supports_vision`**: If it can process images.
+*   **`supports_vision`**: If it can process images and videos.
 *   **`supports_functions`**: If it can use [Tools]({% link guides/tools.md %}).
 *   **`input_price_per_million`**: Cost in USD per 1 million input tokens.
 *   **`output_price_per_million`**: Cost in USD per 1 million output tokens.
diff --git a/docs/guides/rails.md b/docs/guides/rails.md
@@ -117,7 +117,7 @@ Run the migrations: `rails db:migrate`
 
 ### ActiveStorage Setup for Attachments (Optional)
 
-If you want to use attachments (images, audio, PDFs) with your AI chats, you need to set up ActiveStorage:
+If you want to use attachments (images, videos, audio, PDFs) with your AI chats, you need to set up ActiveStorage:
 
 ```bash
 # Only needed if you plan to use attachments
@@ -291,7 +291,7 @@ chat_record.ask("Analyze this file", with: params[:uploaded_file])
 chat_record.ask("What's in this document?", with: user.profile_document)
 ```
 
-The attachment API automatically detects file types based on file extension or content type, so you don't need to specify whether something is an image, audio file, PDF, or text document - RubyLLM figures it out for you!
+The attachment API automatically detects file types based on file extension or content type, so you don't need to specify whether something is an image, video, audio file, PDF, or text document. RubyLLM figures it out for you!
 
 ## Handling Persistence Edge Cases
 
diff --git a/docs/index.md b/docs/index.md
@@ -72,8 +72,9 @@ RubyLLM fixes all that. One beautiful API for everything. One consistent format.
 chat = RubyLLM.chat
 chat.ask "What's the best way to learn Ruby?"
 
-# Analyze images, audio, documents, and text files
+# Analyze images, videos, audio, documents, and text files
 chat.ask "What's in this image?", with: "ruby_conf.jpg"
+chat.ask "What's happening in this video?", with: "presentation.mp4"
 chat.ask "Describe this meeting", with: "meeting.wav"
 chat.ask "Summarize this document", with: "contract.pdf"
 chat.ask "Explain this code", with: "app.rb"
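
Reviewer note: the guides touched by this commit say attachments are classified by file extension or content type. As a rough illustration of the extension-based half of that idea, here is a minimal Ruby sketch. It is not RubyLLM's implementation: `SUPPORTED_EXTENSIONS` and `attachment_category` are hypothetical names, and the extension lists are copied from the "Supported file types" list in docs/guides/chat.md as changed above.

```ruby
# Hypothetical lookup table (not part of RubyLLM), built from the
# "Supported file types" list in docs/guides/chat.md.
SUPPORTED_EXTENSIONS = {
  image:    %w[.jpg .jpeg .png .gif .webp .bmp],
  video:    %w[.mp4 .mov .avi .webm],
  audio:    %w[.mp3 .wav .m4a .ogg .flac],
  document: %w[.pdf .txt .md .csv .json .xml],
  code:     %w[.rb .py .js .html .css]
}.freeze

# Hypothetical helper: returns the category symbol for a path or URL,
# or :unknown when the extension is not in the table.
def attachment_category(path)
  # Drop any URL query string or fragment, then take the lowercased extension.
  ext = File.extname(path.to_s.sub(/[?#].*\z/, "")).downcase
  category, = SUPPORTED_EXTENSIONS.find { |_, exts| exts.include?(ext) }
  category || :unknown
end

# Classify the example files used in the README snippet.
%w[ruby_conf.jpg presentation.mp4 meeting.wav contract.pdf app.rb].each do |file|
  puts "#{file}: #{attachment_category(file)}"
end
```

A real implementation would also need the content-type fallback the guide mentions, for uploads and URLs that carry no useful extension. This sketch covers only the extension lookup.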