-
Notifications
You must be signed in to change notification settings - Fork 58
feat: improve actor tool output #260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
expectToolNamesToContain(names, DEFAULT_TOOL_NAMES); | ||
expectToolNamesToContain(names, DEFAULT_ACTOR_NAMES); | ||
expect(names).toContain('get-actor-output'); | ||
await client.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all await client.close()
are not needed, because of afterEach
on line 90.
AfterEach is safe, it's called if any test failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just put it there, just in case :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not removed 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not remove it 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep that, it should not cause any issues, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manually closing is required only when more clients will be in a single test.
Every client will be closed by
afterEach(async () => {
await client?.close();
client = undefined;
});
Even if the test failed, afterEach is always run, and the client will be closed and disconnected. Without this afterEach, each test should use try..finally to close client after.
It was a bug before. When one test failed, the client was not disconnected, and afterAll failed on timeout because the server was waiting for the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works great — I played with it, and so far it’s flawless 🙂
My biggest concern is that get-actor-output shows up without users explicitly knowing it.
For example, when you configure tools, you end up with one extra tool as a kind of “bonus.” I think get-actor-output should go into the actors category. It makes sense there, typically when you call an Actor, you also need to retrieve its output.
What do you think?
getDataset, | ||
getDatasetItems, | ||
getDatasetSchema, | ||
getActorOutput, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that logically, getActorOutput should be placed here BUT it might be really misleading for user. You will enable tools=actors and you will get an extra bonus of getActorOutput without any obvious reason. What about to add it into actors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! This should not be here, I would personally not add this tool to any category and kept it as "magical" tool that appears when you include any tool that can interact with Actors in any way (see README change that explains this). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the fewer surprises, the better. When a user specifies something, we shouldn’t magically include something else. That feels like a big surprise and just asks for trouble — what if the user doesn’t want it at all? And how would they switch it off? There’s no way…
Instead, I’d add it to the actors category (alongside call-actor), and maybe also to other categories if needed.
But, t should also be included whenever an Actor is added 🤔, ahh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would personally keep this as a "magical" tool since user may not know when to add this tool and then might be frustrated when he only gets partial or no Actor results (we encountered user issues and reports like this already). I understand that now it seems more "magical" and is kind of a surprise but it makes the whole Actor calling and output retrieval work and this is crucial for the workflows. So I would not put this tool into any category and keep it hidden and auto-enabled if needed (based on the tool loader logic). I think there is no reason to Actually switch off this tool if you are working with Actors. Maybe there is, when you want to only call some Actor like send email and you do not care about the results, but that is miniory of the workflows and I would not complicate tool configuration for majorty only to satisfy this one use case where user might want to turn this tool off.
commit 20e6753 Author: Apify Release Bot <[email protected]> Date: Tue Sep 9 11:41:31 2025 +0000 chore(release): Update changelog, package.json and manifest.json versions [skip ci] commit 7ef726d Author: Jakub Kopecký <[email protected]> Date: Tue Sep 9 13:40:04 2025 +0200 feat: improve actor tool output (#260) * feat: improve actor tool output * update readme * fix output tool, write test for that * add test based on Zuzka suggestion * lint * fix output response order so LLM does not lose the instructions * refactor: unify string list parsing logic * fix the tests - order of the Actor run response messages * Update src/utils/schema-generation.ts Co-authored-by: Michal Kalita <[email protected]> * address review comments * add get-actor-output tools note about when its loaded --------- Co-authored-by: Michal Kalita <[email protected]> commit 279293f Author: Michal Kalita <[email protected]> Date: Mon Sep 8 12:05:25 2025 +0200 fix: error when content type is json (#265) * fix: error when content type is json * fix: do not make json schema formatted for human readable commit 4659e03 Author: Apify Release Bot <[email protected]> Date: Thu Sep 4 12:30:50 2025 +0000 chore(release): Update changelog, package.json and manifest.json versions [skip ci]
commit 20e6753 Author: Apify Release Bot <[email protected]> Date: Tue Sep 9 11:41:31 2025 +0000 chore(release): Update changelog, package.json and manifest.json versions [skip ci] commit 7ef726d Author: Jakub Kopecký <[email protected]> Date: Tue Sep 9 13:40:04 2025 +0200 feat: improve actor tool output (#260) * feat: improve actor tool output * update readme * fix output tool, write test for that * add test based on Zuzka suggestion * lint * fix output response order so LLM does not lose the instructions * refactor: unify string list parsing logic * fix the tests - order of the Actor run response messages * Update src/utils/schema-generation.ts Co-authored-by: Michal Kalita <[email protected]> * address review comments * add get-actor-output tools note about when its loaded --------- Co-authored-by: Michal Kalita <[email protected]> commit 279293f Author: Michal Kalita <[email protected]> Date: Mon Sep 8 12:05:25 2025 +0200 fix: error when content type is json (#265) * fix: error when content type is json * fix: do not make json schema formatted for human readable commit 4659e03 Author: Apify Release Bot <[email protected]> Date: Thu Sep 4 12:30:50 2025 +0000 chore(release): Update changelog, package.json and manifest.json versions [skip ci]
* feat: improve actor tool output * update readme * fix output tool, write test for that * add test based on Zuzka suggestion * lint * fix output response order so LLM does not lose the instructions * refactor: unify string list parsing logic * fix the tests - order of the Actor run response messages * Update src/utils/schema-generation.ts Co-authored-by: Michal Kalita <[email protected]> * address review comments * feat: agentci payments v2 * add skyfire usage resource, fix skyfire pay id handling and passing to Actor * add skyfire instructions also to the call-actor info step result content * Squashed commit of the following: commit 20e6753 Author: Apify Release Bot <[email protected]> Date: Tue Sep 9 11:41:31 2025 +0000 chore(release): Update changelog, package.json and manifest.json versions [skip ci] commit 7ef726d Author: Jakub Kopecký <[email protected]> Date: Tue Sep 9 13:40:04 2025 +0200 feat: improve actor tool output (#260) * feat: improve actor tool output * update readme * fix output tool, write test for that * add test based on Zuzka suggestion * lint * fix output response order so LLM does not lose the instructions * refactor: unify string list parsing logic * fix the tests - order of the Actor run response messages * Update src/utils/schema-generation.ts Co-authored-by: Michal Kalita <[email protected]> * address review comments * add get-actor-output tools note about when its loaded --------- Co-authored-by: Michal Kalita <[email protected]> commit 279293f Author: Michal Kalita <[email protected]> Date: Mon Sep 8 12:05:25 2025 +0200 fix: error when content type is json (#265) * fix: error when content type is json * fix: do not make json schema formatted for human readable commit 4659e03 Author: Apify Release Bot <[email protected]> Date: Thu Sep 4 12:30:50 2025 +0000 chore(release): Update changelog, package.json and manifest.json versions [skip ci] * fix the port already in use issue with tests * lint * remove the try catch that was rethrowing generic error in callActorGetDataset * add skyfire seller id and pay id to the get actor output * lint * rename env var * include seller id conditionally in the instructions * handle skyfire pay id in get actor output * change usd amount --------- Co-authored-by: Michal Kalita <[email protected]>
closes https://github.com/apify/apify-mcp-server-internal/issues/147
Backstory
This PR rework how the MCP server handles the Actor tool output. Instead of dumping the whole dataset with filtered only the important fields which sometimes may cause LLM context or tool call limit overflow here we try to return feasible preview of the Actor output items and then provide a
get-actor-output
tool for output dataset items retrieval.Details
The dataset items preview first tries to return the whole dataset with all items, if that does not fit into the 20k tokens limit (set to 20k to keep it conservative as some client put hard limit of 25k so there is some padding) then we include only important fields (fields, that are defined in some dataset view) and if that does not fit either then we try to truncate the items with important fields only one by one until we find number of important fields items that fit (in extreme case it can also retur no preview if the Actor has really bad and nested output format).
The
get-actor-output
tool is loaded whenever any tool that even indirectly interacts with Actors is present (call-actor
,add-actor
, Actor tool directly likeapify-slash-rag-web-browser
).Changes
runId
,datasetId
, total items count), then the output items schema, if possible some data preview and then instructions how to retrieve additional data or specific fields.get-actor-output
tools as I found theget-dataset-items
complicated and wanted to keep this as original dataset centric advanced tooladditionalProperties
input from theget-dataset-schema
as it seemed useless @jirispilka correct me if I am wrongremove-tool
andhelper-tool
code