
Pre-request check for request_tokens_limit exceeded #1794

@tekumara

Description


The request_tokens_limit check happens after the model returns a response, by which time the tokens have already been spent and a response has already been produced, yet the UsageLimitExceeded exception prevents that response from being returned.

Ideally this check would happen before the request is made, which would require estimating tokens on the client side, e.g. using tiktoken for OpenAI models or similar.
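A pre-request guard along these lines might look like the sketch below. The helper names are hypothetical, and the characters/4 heuristic is only a crude stand-in for a real model-specific tokenizer such as tiktoken:

```python
# Hypothetical pre-request guard: estimate prompt tokens before sending.
# A real implementation would use a model-specific tokenizer (e.g. tiktoken
# for OpenAI models); the chars/4 heuristic here is a rough stand-in.

class UsageLimitExceeded(Exception):
    """Mirrors pydantic_ai.exceptions.UsageLimitExceeded for this sketch."""

def estimate_tokens(text: str) -> int:
    # Very rough: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def check_request_tokens(messages: list[str], limit: int) -> int:
    """Raise *before* the request is made if the estimate exceeds the limit."""
    estimated = sum(estimate_tokens(m) for m in messages)
    if estimated > limit:
        raise UsageLimitExceeded(
            f'Estimated request_tokens={estimated} would exceed the limit of {limit}'
        )
    return estimated
```

Because token estimates are approximate, a guard like this can only reject requests that are clearly over budget; the post-response check would still be needed as a backstop.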

If this pre-request check is deemed infeasible, perhaps the response could be returned nonetheless, since it has already been produced, with only subsequent requests raising an exception.
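That fallback behaviour (return the already-produced response, fail only on the next request) can be approximated today with a caller-side budget tracker. Everything below is a hypothetical sketch, not pydantic-ai API:

```python
# Hypothetical wrapper approximating "raise only on subsequent requests":
# the response that crosses the limit is still returned to the caller; the
# *next* call fails before any tokens are spent.

class UsageLimitExceeded(Exception):
    """Mirrors pydantic_ai.exceptions.UsageLimitExceeded for this sketch."""

class RequestTokenBudget:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def check_before_request(self) -> None:
        # Raise before issuing a new request once the budget is spent.
        if self.used >= self.limit:
            raise UsageLimitExceeded(
                f'Exceeded the request_tokens_limit of {self.limit} '
                f'(request_tokens={self.used})'
            )

    def record(self, request_tokens: int) -> None:
        # Record usage *after* the response has been delivered to the caller.
        self.used += request_tokens
```

Calls would then be bracketed as `budget.check_before_request()`, `agent.run(...)` without `usage_limits`, then `budget.record(...)` with the request tokens reported by the run's usage, so the response that crosses the limit is still delivered.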

E.g.:

Traceback (most recent call last):
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/opentelemetry/trace/__init__.py", line 587, in use_span
    yield span
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_graph/graph.py", line 261, in iter
    yield GraphRun[StateT, DepsT, RunEndT](
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/agent.py", line 683, in iter
    yield agent_run
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/agent.py", line 451, in run
    async for _ in agent_run:
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/agent.py", line 1798, in __anext__
    next_node = await self._graph_run.__anext__()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_graph/graph.py", line 810, in __anext__
    return await self.next(self._next_node)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_graph/graph.py", line 783, in next
    self._next_node = await node.run(ctx)
                      ^^^^^^^^^^^^^^^^^^^
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/_agent_graph.py", line 270, in run
    return await self._make_request(ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/_agent_graph.py", line 329, in _make_request
    return self._finish_handling(ctx, model_response, request_usage)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/_agent_graph.py", line 356, in _finish_handling
    ctx.deps.usage_limits.check_tokens(ctx.state.usage)
  File "/Users/mcbob/.venv/lib/python3.11/site-packages/pydantic_ai/usage.py", line 112, in check_tokens
    raise UsageLimitExceeded(
pydantic_ai.exceptions.UsageLimitExceeded: Exceeded the request_tokens_limit of 5000 (request_tokens=11725)

Example Code

agent_response = await self.agent.run(
    user_prompt=last_user_message,
    message_history=history,
    usage_limits=UsageLimits(request_tokens_limit=5000),
)

Python, Pydantic AI & LLM client version

0.2.6
