Skip to content

Conversation

MQ37
Copy link
Contributor

@MQ37 MQ37 commented Sep 3, 2025

closes https://github.com/apify/apify-mcp-server-internal/issues/147

Backstory

This PR rework how the MCP server handles the Actor tool output. Instead of dumping the whole dataset with filtered only the important fields which sometimes may cause LLM context or tool call limit overflow here we try to return feasible preview of the Actor output items and then provide a get-actor-output tool for output dataset items retrieval.

Details

The dataset items preview first tries to return the whole dataset with all items, if that does not fit into the 20k tokens limit (set to 20k to keep it conservative as some client put hard limit of 25k so there is some padding) then we include only important fields (fields, that are defined in some dataset view) and if that does not fit either then we try to truncate the items with important fields only one by one until we find number of important fields items that fit (in extreme case it can also retur no preview if the Actor has really bad and nested output format).

The get-actor-output tool is loaded whenever any tool that even indirectly interacts with Actors is present (call-actor, add-actor, Actor tool directly like apify-slash-rag-web-browser).

Changes

  • Reworked and centralized the Actor tool output format. Now it returns metadata (runId, datasetId, total items count), then the output items schema, if possible some data preview and then instructions how to retrieve additional data or specific fields.
  • Introduced new simple get-actor-output tools as I found the get-dataset-items complicated and wanted to keep this as original dataset centric advanced tool
  • Centralized dataset items schema generation logic and removed the additionalProperties input from the get-dataset-schema as it seemed useless @jirispilka correct me if I am wrong
  • Deleted the legacy remove-tool and helper-tool code
  • Refactored all tool to use the central Ajv instance
  • Centralized string comma-separated list parsing logic

@github-actions github-actions bot added t-ai Issues owned by the AI team. tested Temporary label used only programatically for some analytics. labels Sep 3, 2025
@MQ37 MQ37 requested a review from MichalKalita September 4, 2025 09:08
expectToolNamesToContain(names, DEFAULT_TOOL_NAMES);
expectToolNamesToContain(names, DEFAULT_ACTOR_NAMES);
expect(names).toContain('get-actor-output');
await client.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all await client.close() are not needed, because of afterEach on line 90.
AfterEach is safe, it's called if any test failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just put it there, just in case :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not removed 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not remove it 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep that, it should not cause any issues, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manually closing is required only when more clients will be in a single test.

Every client will be closed by

afterEach(async () => {
            await client?.close();
            client = undefined;
        });

Even if the test failed, afterEach is always run, and the client will be closed and disconnected. Without this afterEach, each test should use try..finally to close client after.

It was a bug before. When one test failed, the client was not disconnected, and afterAll failed on timeout because the server was waiting for the client.

@MQ37 MQ37 marked this pull request as ready for review September 5, 2025 08:32
@MQ37 MQ37 requested a review from jirispilka September 5, 2025 08:32
Copy link
Collaborator

@jirispilka jirispilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works great — I played with it, and so far it’s flawless 🙂

My biggest concern is that get-actor-output shows up without users explicitly knowing it.

For example, when you configure tools, you end up with one extra tool as a kind of “bonus.” I think get-actor-output should go into the actors category. It makes sense there, typically when you call an Actor, you also need to retrieve its output.

What do you think?

getDataset,
getDatasetItems,
getDatasetSchema,
getActorOutput,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that logically, getActorOutput should be placed here BUT it might be really misleading for user. You will enable tools=actors and you will get an extra bonus of getActorOutput without any obvious reason. What about to add it into actors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This should not be here, I would personally not add this tool to any category and kept it as "magical" tool that appears when you include any tool that can interact with Actors in any way (see README change that explains this). What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fewer surprises, the better. When a user specifies something, we shouldn’t magically include something else. That feels like a big surprise and just asks for trouble — what if the user doesn’t want it at all? And how would they switch it off? There’s no way…

Instead, I’d add it to the actors category (alongside call-actor), and maybe also to other categories if needed.

But, t should also be included whenever an Actor is added 🤔, ahh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally keep this as a "magical" tool since user may not know when to add this tool and then might be frustrated when he only gets partial or no Actor results (we encountered user issues and reports like this already). I understand that now it seems more "magical" and is kind of a surprise but it makes the whole Actor calling and output retrieval work and this is crucial for the workflows. So I would not put this tool into any category and keep it hidden and auto-enabled if needed (based on the tool loader logic). I think there is no reason to Actually switch off this tool if you are working with Actors. Maybe there is, when you want to only call some Actor like send email and you do not care about the results, but that is miniory of the workflows and I would not complicate tool configuration for majorty only to satisfy this one use case where user might want to turn this tool off.

@MQ37 MQ37 requested a review from jirispilka September 8, 2025 11:04
@MQ37 MQ37 merged commit 7ef726d into master Sep 9, 2025
4 checks passed
@MQ37 MQ37 deleted the feat/improve-actor-tool-output branch September 9, 2025 11:40
MQ37 added a commit that referenced this pull request Sep 9, 2025
commit 20e6753
Author: Apify Release Bot <[email protected]>
Date:   Tue Sep 9 11:41:31 2025 +0000

    chore(release): Update changelog, package.json and manifest.json versions [skip ci]

commit 7ef726d
Author: Jakub Kopecký <[email protected]>
Date:   Tue Sep 9 13:40:04 2025 +0200

    feat: improve actor tool output (#260)

    * feat: improve actor tool output

    * update readme

    * fix output tool, write test for that

    * add test based on Zuzka suggestion

    * lint

    * fix output response order so LLM does not lose the instructions

    * refactor: unify string list parsing logic

    * fix the tests - order of the Actor run response messages

    * Update src/utils/schema-generation.ts

    Co-authored-by: Michal Kalita <[email protected]>

    * address review comments

    * add get-actor-output tools note about when its loaded

    ---------

    Co-authored-by: Michal Kalita <[email protected]>

commit 279293f
Author: Michal Kalita <[email protected]>
Date:   Mon Sep 8 12:05:25 2025 +0200

    fix: error when content type is json (#265)

    * fix: error when content type is json

    * fix: do not make json schema formatted for human readable

commit 4659e03
Author: Apify Release Bot <[email protected]>
Date:   Thu Sep 4 12:30:50 2025 +0000

    chore(release): Update changelog, package.json and manifest.json versions [skip ci]
MQ37 added a commit that referenced this pull request Sep 9, 2025
commit 20e6753
Author: Apify Release Bot <[email protected]>
Date:   Tue Sep 9 11:41:31 2025 +0000

    chore(release): Update changelog, package.json and manifest.json versions [skip ci]

commit 7ef726d
Author: Jakub Kopecký <[email protected]>
Date:   Tue Sep 9 13:40:04 2025 +0200

    feat: improve actor tool output (#260)

    * feat: improve actor tool output

    * update readme

    * fix output tool, write test for that

    * add test based on Zuzka suggestion

    * lint

    * fix output response order so LLM does not lose the instructions

    * refactor: unify string list parsing logic

    * fix the tests - order of the Actor run response messages

    * Update src/utils/schema-generation.ts

    Co-authored-by: Michal Kalita <[email protected]>

    * address review comments

    * add get-actor-output tools note about when its loaded

    ---------

    Co-authored-by: Michal Kalita <[email protected]>

commit 279293f
Author: Michal Kalita <[email protected]>
Date:   Mon Sep 8 12:05:25 2025 +0200

    fix: error when content type is json (#265)

    * fix: error when content type is json

    * fix: do not make json schema formatted for human readable

commit 4659e03
Author: Apify Release Bot <[email protected]>
Date:   Thu Sep 4 12:30:50 2025 +0000

    chore(release): Update changelog, package.json and manifest.json versions [skip ci]
MQ37 added a commit that referenced this pull request Sep 12, 2025
* feat: improve actor tool output

* update readme

* fix output tool, write test for that

* add test based on Zuzka suggestion

* lint

* fix output response order so LLM does not lose the instructions

* refactor: unify string list parsing logic

* fix the tests - order of the Actor run response messages

* Update src/utils/schema-generation.ts

Co-authored-by: Michal Kalita <[email protected]>

* address review comments

* feat: agentci payments v2

* add skyfire usage resource, fix skyfire pay id handling and passing to Actor

* add skyfire instructions also to the call-actor info step result content

* Squashed commit of the following:

commit 20e6753
Author: Apify Release Bot <[email protected]>
Date:   Tue Sep 9 11:41:31 2025 +0000

    chore(release): Update changelog, package.json and manifest.json versions [skip ci]

commit 7ef726d
Author: Jakub Kopecký <[email protected]>
Date:   Tue Sep 9 13:40:04 2025 +0200

    feat: improve actor tool output (#260)

    * feat: improve actor tool output

    * update readme

    * fix output tool, write test for that

    * add test based on Zuzka suggestion

    * lint

    * fix output response order so LLM does not lose the instructions

    * refactor: unify string list parsing logic

    * fix the tests - order of the Actor run response messages

    * Update src/utils/schema-generation.ts

    Co-authored-by: Michal Kalita <[email protected]>

    * address review comments

    * add get-actor-output tools note about when its loaded

    ---------

    Co-authored-by: Michal Kalita <[email protected]>

commit 279293f
Author: Michal Kalita <[email protected]>
Date:   Mon Sep 8 12:05:25 2025 +0200

    fix: error when content type is json (#265)

    * fix: error when content type is json

    * fix: do not make json schema formatted for human readable

commit 4659e03
Author: Apify Release Bot <[email protected]>
Date:   Thu Sep 4 12:30:50 2025 +0000

    chore(release): Update changelog, package.json and manifest.json versions [skip ci]

* fix the port already in use issue with tests

* lint

* remove the try catch that was rethrowing generic error in callActorGetDataset

* add skyfire seller id  and pay id to the get actor output

* lint

* rename env var

* include seller id conditionally in the instructions

* handle skyfire pay id in get actor output

* change usd amount

---------

Co-authored-by: Michal Kalita <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

t-ai Issues owned by the AI team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants