
Pre-request check for request_tokens_limit exceeded #1794

@tekumara

Description


The request_tokens_limit check happens after the model returns a response, by which time the tokens have already been spent and a response has already been produced, yet the UsageLimitExceeded exception prevents that response from being returned.

Ideally this check would happen before the request is made, which would require estimating tokens on the client side, e.g. using tiktoken for OpenAI models or similar.
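A pre-request guard along these lines might look like the sketch below. The helper names are hypothetical, and the characters/4 heuristic is only a crude stand-in for a real model-specific tokenizer such as tiktoken:

```python
# Hypothetical pre-request guard: estimate prompt tokens before sending.
# A real implementation would use a model-specific tokenizer (e.g. tiktoken
# for OpenAI models); the chars/4 heuristic here is a rough stand-in.

class UsageLimitExceeded(Exception):
    """Mirrors pydantic_ai.exceptions.UsageLimitExceeded for this sketch."""

def estimate_tokens(text: str) -> int:
    # Very rough: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def check_request_tokens(messages: list[str], limit: int) -> int:
    """Raise *before* the request is made if the estimate exceeds the limit."""
    estimated = sum(estimate_tokens(m) for m in messages)
    if estimated > limit:
        raise UsageLimitExceeded(
            f'Estimated request_tokens={estimated} would exceed the limit of {limit}'
        )
    return estimated
```

Because token estimates are approximate, a guard like this can only reject requests that are clearly over budget; the post-response check would still be needed as a backstop.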

If this pre-request check is deemed infeasible, perhaps the response could be returned nonetheless, since it has already been produced, with only subsequent requests raising an exception.
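That fallback behaviour (return the already-produced response, fail only on the next request) can be approximated today with a caller-side budget tracker. Everything below is a hypothetical sketch, not pydantic-ai API:

```python
# Hypothetical wrapper approximating "raise only on subsequent requests":
# the response that crosses the limit is still returned to the caller; the
# *next* call fails before any tokens are spent.

class UsageLimitExceeded(Exception):
    """Mirrors pydantic_ai.exceptions.UsageLimitExceeded for this sketch."""

class RequestTokenBudget:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def check_before_request(self) -> None:
        # Raise before issuing a new request once the budget is spent.
        if self.used >= self.limit:
            raise UsageLimitExceeded(
                f'Exceeded the request_tokens_limit of {self.limit} '
                f'(request_tokens={self.used})'
            )

    def record(self, request_tokens: int) -> None:
        # Record usage *after* the response has been delivered to the caller.
        self.used += request_tokens
```

Calls would then be bracketed as `budget.check_before_request()`, `agent.run(...)` without `usage_limits`, then `budget.record(...)` with the request tokens reported by the run's usage, so the response that crosses the limit is still delivered.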

E.g.:

Traceback (most recent call last):
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/opentelemetry/trace/__init__.py", line 587, in use_span
    yield span
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_graph/graph.py", line 261, in iter
    yield GraphRun[StateT, DepsT, RunEndT](
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/agent.py", line 683, in iter
    yield agent_run
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/agent.py", line 451, in run
    async for _ in agent_run:
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/agent.py", line 1798, in __anext__
    next_node = await self._graph_run.__anext__()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_graph/graph.py", line 810, in __anext__
    return await self.next(self._next_node)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_graph/graph.py", line 783, in next
    self._next_node = await node.run(ctx)
                      ^^^^^^^^^^^^^^^^^^^
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/_agent_graph.py", line 270, in run
    return await self._make_request(ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/_agent_graph.py", line 329, in _make_request
    return self._finish_handling(ctx, model_response, request_usage)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/_agent_graph.py", line 356, in _finish_handling
    ctx.deps.usage_limits.check_tokens(ctx.state.usage)
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/usage.py", line 112, in check_tokens
    raise UsageLimitExceeded(
pydantic_ai.exceptions.UsageLimitExceeded: Exceeded the request_tokens_limit of 5000 (request_tokens=11725)

Example Code

agent_response = await self.agent.run(
    user_prompt=last_user_message,
    message_history=history,
    usage_limits=UsageLimits(request_tokens_limit=5000),
)

Python, Pydantic AI & LLM client version

0.2.6
