[BUG] on_new_message callback fires after first chunk received instead of before API request #367

@andreaslillebo

Description

Basic checks

  • I searched existing issues - this hasn't been reported
  • I can reproduce this consistently
  • This is a RubyLLM bug, not my application code

What's broken?

The documentation states that the `on_new_message` callback is called "just before the API request". In practice this does not appear to be the case, though it may go unnoticed unless you are using a reasoning model like GPT-5.

As a result, during the thinking/reasoning phase there is no empty assistant message we can use to show the user a thinking state. Creating one in the controller is not a good workaround either: it would be included in the messages sent to the LLM when `chat.complete` runs in the background job, and `chat.complete` would then create another empty message when it receives the first chunk of data.
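To make the workaround's failure mode concrete, here is an illustration-only sketch of the double-message problem described above. `messages` stands in for `chat.messages`; none of the names below are RubyLLM API.

```ruby
# Illustration only: why pre-creating an empty assistant message in the
# controller (to show a "thinking" state) backfires, per the report.
messages = [{ role: "user", content: "Hello" }]

# Workaround: controller creates an empty assistant placeholder...
messages << { role: "assistant", content: "" }

# ...but the background job's chat.complete would send the current
# message list, placeholder included, to the LLM:
payload = messages.dup

# ...and then create a second empty assistant message itself when the
# first chunk arrives:
messages << { role: "assistant", content: "" }

empty_assistant_count = messages.count { |m| m[:role] == "assistant" && m[:content].empty? }
# empty_assistant_count => 2 (one stale placeholder, one created by complete)
```

So the placeholder both pollutes the prompt and ends up duplicated, which is why the fix needs to live in `chat.complete` itself.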

How to reproduce

Note: This appears to only really be an issue for reasoning models like GPT-5, since they may think for a while before emitting the first chunk.

```ruby
# In controller:

@chat = Chat.find(params[:chat_id])

# Create and persist the user message immediately
@chat.create_user_message(params[:content])

# Process AI response in background
ChatStreamJob.perform_later(@chat.id)
```

```ruby
# In background job:

def perform(chat_id)
  chat = Chat.find(chat_id)

  chat.on_new_message do
    puts "Assistant is typing..."
  end

  # Expected to see "Assistant is typing..." in the console right after `chat.complete` executes
  chat.complete do |chunk|
    # GPT-5 has been thinking for 30 seconds. When the first chunk is received, I finally see "Assistant is typing..."
    assistant_message = chat.messages.last
    if chunk.content && assistant_message
      assistant_message.broadcast_append_chunk(chunk.content)
    end
  end
end
```
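The two callback timings can be simulated without RubyLLM at all. The `FakeChat` class below is a hypothetical stand-in (not RubyLLM's implementation) that records event order for both the documented behavior and the behavior observed above:

```ruby
# Hypothetical stand-in for RubyLLM's Chat, illustrating the two callback
# timings described in this report. FakeChat is NOT part of RubyLLM.
class FakeChat
  attr_reader :events

  def initialize(fire_before_request:)
    @fire_before_request = fire_before_request
    @events = []
  end

  def on_new_message(&block)
    @on_new_message = block
  end

  def complete
    # Documented behavior: callback runs just before the API request.
    run_callback if @fire_before_request
    @events << :api_request_started
    @events << :first_chunk_received # with GPT-5 this can arrive 30s+ later
    # Observed behavior: callback runs only once the first chunk arrives.
    run_callback unless @fire_before_request
  end

  private

  def run_callback
    @on_new_message&.call
    @events << :on_new_message_fired
  end
end

documented = FakeChat.new(fire_before_request: true)
documented.on_new_message { }
documented.complete
documented.events
# => [:on_new_message_fired, :api_request_started, :first_chunk_received]

observed = FakeChat.new(fire_before_request: false)
observed.on_new_message { }
observed.complete
observed.events
# => [:api_request_started, :first_chunk_received, :on_new_message_fired]
```

With the observed ordering, everything the callback does (like creating the empty assistant message) is stalled behind the model's entire thinking phase.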

Expected behavior

`chat.complete` calls the `on_new_message` callback before the API request, so a new empty assistant message is created immediately.

What actually happened

`chat.complete` calls the `on_new_message` callback only after the first chunk has been received from the API, potentially resulting in a very long delay (30+ seconds) before the message is created.

Environment

  • Ruby version: 3.4.5
  • RubyLLM version: 1.6.4
  • Provider: OpenAI
  • Model: GPT-5
  • OS: macOS
