Skip to content

Conversation

@joadoumie
Copy link
Contributor

@joadoumie joadoumie commented Jul 16, 2025

This PR explores automatically adding the file context to the Jupyternaut Persona (for v3) when a file is attached to a message in a chat.

Here is a gif of the functionality (notice how Jupyternaut can read the content from the file).

dac22f4c-f839-4ba8-94ae-50c5c5ac892d

Related to #1299

This is my first PR to the project, so please let me know how I can improve anything to make the process smoother!

😊

- Add _process_attachments() method to read and include file content in chat context
- Add _resolve_attachment_to_path() to resolve attachment IDs to file paths
- Add _get_attachment_from_ychat() to retrieve attachment data from YDoc

This enables the Jupyternaut AI assistant to process file attachments in chat
messages and include their content as context for better responses.
@joadoumie joadoumie marked this pull request as ready for review July 16, 2025 12:50
@srdas srdas added the enhancement New feature or request label Jul 17, 2025
Copy link
Collaborator

@srdas srdas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR, the functionality is very useful. Good to see that it works for multiple files in the same context query:
image

Also, nice error handling for non-supported files:
image

Comments:

  1. Suggest moving the new methods added by this PR to the base class in base_persona.py.
  2. Since it is working well, remove all non-essential DEBUG statements, and change DEBUG to FILE or some other tag?
    These changes should be straightforward.

What the PR does not do is take in a folder and nested subfolders and process all acceptable files within it as context and this would be a larger improvement and handled in a separate PR. I tried it as shown:
image

Appreciate the neat work on this PR! TY.

@joadoumie
Copy link
Contributor Author

joadoumie commented Jul 18, 2025

Thank you for the quick feedback! I've gone ahead and included the changes you suggested. I also ran mypy locally to make sure that the code changes pass the Typing test.

I didn't think to include folders (and nested subfolders), which is an interesting idea. This is probably not too difficult to implement, but I think that would balloon the context passed to the LLM pretty quickly. Maybe I will explore this further in another PR :)

@joadoumie joadoumie requested a review from srdas July 18, 2025 00:28
Copy link
Member

@dlqqq dlqqq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joadoumie Thank you for this contribution! 🤗 This PR looks great, just left a couple of suggestions below. Sanjiv had called out an issue with supporting directories, but that can be addressed later.

BTW, most likely, by v3.0, Jupyternaut will be an agent that is capable of reading files at a given path. This means that we will likely pass the path of the file & some metadata (e.g. file size) into the context instead of the entire file by then. Letting the agent decide how to read a file helps prevent token limit issues. 👍

@dlqqq
Copy link
Member

dlqqq commented Jul 18, 2025

I finished my review right before these changes were committed. My feedback should still be applicable, but let me know if I can help clarify anything. 😅

@joadoumie
Copy link
Contributor Author

Perfect! I'll take a look :) Thanks for reviewing!!!

@dlqqq
Copy link
Member

dlqqq commented Jul 18, 2025

I didn't think to include folders (and nested subfolders), which is an interesting idea. This is probably not too difficult to implement, but I think that would balloon the context passed to the LLM pretty quickly. Maybe I will explore this further in another PR :)

This is a great point, and this is exactly why we are seeking to provide Jupyternaut tools. Instead of "shoving all the context in", the agent should know how to retrieve a reasonable amount of context. Based on my observations, this is what Claude Code does. Small files get read directly, large files get pattern-searched via grep, and directories get listed before files are opened. A deeper observation is that this is also how humans sift through data on filesystems. 👀

However, this strategy is perfectly fine for a reasonable number of reasonably-sized files, so we can merge this once the feedback is addressed. 👍

If you're interested in contributing more, let me know so I can share some ideas on things to work on. The directory attachments feature will probably have to wait until tools are available to Jupyternaut.

@joadoumie
Copy link
Contributor Author

I've incorporated the feedback you gave to this PR. I'd love to keep contributing to the project, so please do let me know what ideas you have for things that I might be able to tackle as I familiarize myself with the codebase 😊

@joadoumie joadoumie requested a review from dlqqq July 18, 2025 22:22
Copy link
Collaborator

@srdas srdas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased and retested this PR, all looks good. Nice work @joadoumie !!
@dlqqq I'll approve and you can merge after taking a look.

@dlqqq
Copy link
Member

dlqqq commented Jul 21, 2025

Will merge this PR once CI passes. I just pushed a commit that should hopefully fix the CI; Optional[str] must be used instead of str | None in Python 3.9 and below.

We should drop Python 3.9 support soon, but it should be fairly easy to fix it here.

@dlqqq
Copy link
Member

dlqqq commented Jul 21, 2025

Sweet! CI is passing (except for the pre-commit.ci job which is temporarily disabled). Merging. 👍

@dlqqq dlqqq merged commit 10b83cb into jupyterlab:main Jul 21, 2025
8 checks passed
@joadoumie
Copy link
Contributor Author

Thanks for the review and help on this one! Please let me know where else I may be able to contribute 😀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants