Add video input file support #260

Open
arnodirlam wants to merge 5 commits into main from feature/video-file-support

Conversation

@arnodirlam commented Jun 25, 2025

What this does

Enables RubyLLM to recognize and handle video files as attachments, in addition to existing support for images, audio, PDFs, and text.
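
As a rough illustration of what recognizing a video attachment could involve, here is a minimal sketch that classifies a file by its extension. The module name `AttachmentKindSketch`, the extension lists, and the `classify` method are hypothetical and are not taken from RubyLLM's code; the actual change may detect types differently (for example via MIME types).

```ruby
# Hypothetical sketch, not RubyLLM's implementation:
# map a file path to an attachment kind based on its extension.
module AttachmentKindSketch
  # Illustrative, incomplete extension lists (assumptions).
  EXTENSIONS = {
    image: %w[.jpg .jpeg .png .gif .webp],
    audio: %w[.mp3 .wav .m4a .ogg],
    video: %w[.mp4 .mov .webm .avi],
    pdf:   %w[.pdf]
  }.freeze

  # Returns :image, :audio, :video, or :pdf, falling back to :text
  # when the extension is not in any list.
  def self.classify(path)
    ext = File.extname(path.to_s).downcase
    kind, = EXTENSIONS.find { |_, exts| exts.include?(ext) }
    kind || :text
  end
end

p AttachmentKindSketch.classify("clip.MP4")   # matching is case-insensitive
p AttachmentKindSketch.classify("notes.md")   # unknown extensions fall back to :text
```

Keeping the mapping in one frozen hash means that supporting another kind of attachment is a one-line change in this sketch.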

Why?

  • Modern LLMs and AI applications increasingly support video input, making it important for RubyLLM to handle video files natively.
  • This enhancement improves the flexibility and completeness of RubyLLM for users who need to process or analyze video content.

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Performance improvement

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • Breaking change
  • New public methods/classes
  • Changed method signatures
  • No API changes

Related issues

Fixes #259

@arnodirlam (Author)

I've only tested this with Gemini 1.5 Flash and it works.

After updating the models, running the tests produces a lot of failures.

What's the best way to proceed from here?

@crmne (Owner) left a comment

Hey @arnodirlam, I'd love to add this to RubyLLM. Do you have the means to test things out with API keys? There's also one small thing to change.

@crmne (Owner)

Should rather be in chat_content_spec.rb.

@arnodirlam (Author)

Done in 228ab17.

@crmne (Owner) commented Jul 16, 2025

Oh, and check off the items you completed in the quality check section of the PR. All are required.

@crmne marked this pull request as ready for review on July 16, 2025 14:20
@crmne added the enhancement label on Jul 16, 2025
@arnodirlam force-pushed the feature/video-file-support branch from c1bbf45 to 9604320 on July 16, 2025 16:38
@arnodirlam force-pushed the feature/video-file-support branch from dcd8183 to f8c4655 on July 18, 2025 14:14
@arnodirlam (Author) commented Jul 18, 2025

I've addressed the remaining points now.

One thing left: vision is ambiguous now, because it sometimes implies image and video, but in reality only implies image. In fact, only Google Gemini models have video support.

Should I add methods supports_image? and supports_video? and soft-deprecate supports_vision??

EDIT: Just realized vision implies image and pdf in all cases, so vision is totally appropriate.

@arnodirlam (Author) commented Jul 20, 2025

Added a supports_video? helper and clarified some comments and docs. This should be good to merge now 👍
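
For readers unfamiliar with capability helpers, a predicate of this kind could look roughly like the sketch below. Only the method name `supports_video?` comes from the comment above; `ModelSketch`, its `input_modalities` field, and the model ids are hypothetical stand-ins and do not reflect RubyLLM's actual model class or data.

```ruby
# Hypothetical sketch, not RubyLLM's implementation:
# a model record that lists the input modalities it accepts.
ModelSketch = Struct.new(:id, :input_modalities) do
  # True when the model lists "video" among its input modalities.
  def supports_video?
    input_modalities.include?("video")
  end
end

# Made-up model ids, used only for illustration.
with_video    = ModelSketch.new("example-video-model", %w[text image video])
without_video = ModelSketch.new("example-image-model", %w[text image])

p with_video.supports_video?
p without_video.supports_video?
```

A caller could check such a predicate before attaching a video, instead of relying on the broader vision capability.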

@arnodirlam changed the title from "Add video file support to attachments" to "Add video input file support" on Jul 23, 2025
@arnodirlam requested a review from crmne on July 23, 2025 09:49
Labels: enhancement
Linked issues (may be closed when this pull request is merged): [FEATURE] video file support
2 participants