
OOM in GroupedHashAggregateStream::group_aggregate_batch() #13831

@avantgardnerio

Description


Describe the bug

When accumulating large text fields under a GROUP BY, group_aggregate_batch() can OOM despite ostensibly tracking its allocations with the MemoryPool.

Query:

select truncated_time, count(*) AS cnt
from (
    select
        truncated_time, k8s_deployment_name, message
    from (
        SELECT
            priorityclass,
            timestamp,
            date_trunc('day', timestamp) AS truncated_time,
            k8s_deployment_name,
            message
        FROM agg_oom
        where priorityclass != 'low'
    )
    group by truncated_time, k8s_deployment_name, message
) group by truncated_time

This was run against 8 parquet files of roughly 50 MB each, where the message column can contain strings of up to 8192 bytes. When profiled, this call was by far the largest consumer of memory:

(profiler screenshot omitted: group_aggregate_batch() dominating memory usage)

With logging enabled, we can see that it fails while interning:

converting 3 rows
interning 8192 rows with 1486954 bytes
interned 8192 rows, now I'm 13054176 bytes
resizing to 14103171
resizing to 14103171
reserving 28206342 extra bytes
converting 3 rows
interning 8192 rows with 1350859 bytes
memory allocation of 25690112 bytes failed
Aborted (core dumped)

To Reproduce

1. Set up a test with the following runtime configuration (a fuller sketch of such a reproducer follows after this list):

        let memory_limit = 125_000_000;
        let memory_fraction = 1.0;
        let rt_config = RuntimeConfig::new()
            .with_memory_limit(memory_limit, memory_fraction);

2. Set ulimit -v 1152000.

3. Query some parquet files containing long strings.
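
Below is a minimal sketch of what such a reproducer could look like, assuming the DataFusion Rust API; the parquet path is a placeholder, and constructor names such as RuntimeEnv::new and SessionContext::new_with_config_rt may differ between DataFusion versions:

    use std::sync::Arc;
    use datafusion::execution::runtime_env::{RuntimeConfig, RuntimeEnv};
    use datafusion::prelude::*;

    #[tokio::main]
    async fn main() -> datafusion::error::Result<()> {
        // 125 MB memory pool, all of which the query is allowed to use.
        let memory_limit = 125_000_000;
        let memory_fraction = 1.0;
        let rt_config = RuntimeConfig::new().with_memory_limit(memory_limit, memory_fraction);
        let runtime = Arc::new(RuntimeEnv::new(rt_config)?);

        let ctx = SessionContext::new_with_config_rt(SessionConfig::new(), runtime);
        // Placeholder path: point this at the ~50 MB parquet files with long message strings.
        ctx.register_parquet("agg_oom", "/path/to/agg_oom/", ParquetReadOptions::default())
            .await?;

        // The aggregation query from the bug report.
        let sql = "
            select truncated_time, count(*) AS cnt
            from (
                select truncated_time, k8s_deployment_name, message
                from (
                    SELECT priorityclass, timestamp,
                           date_trunc('day', timestamp) AS truncated_time,
                           k8s_deployment_name, message
                    FROM agg_oom
                    where priorityclass != 'low'
                )
                group by truncated_time, k8s_deployment_name, message
            ) group by truncated_time";
        ctx.sql(sql).await?.show().await?;
        Ok(())
    }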

Expected behavior

group_aggregate_batch() should not rely on the assumption expressed in this comment:

            // Here we can ignore `insufficient_capacity_err` because we will spill later,
            // but at least one batch should fit in the memory

It should instead account for the fact that adding 1 row to a buffer of a million rows doesn't allocate space for 1,000,001 entries, but rather for 2,000,000, because the underlying Vec grows by doubling its capacity when it resizes.
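
To illustrate the doubling behavior (this is plain Rust standard-library behavior, not DataFusion code):

    fn main() {
        // Fill a Vec exactly to its capacity.
        let mut v: Vec<u8> = vec![0u8; 1_000_000];
        println!("capacity before push: {}", v.capacity()); // 1_000_000

        // Pushing one more element forces a reallocation; Vec roughly doubles
        // its capacity, so the new allocation is about 2,000,000 bytes, not 1,000,001.
        v.push(0);
        println!("capacity after push:  {}", v.capacity()); // typically 2_000_000
    }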

Additional context

Proposed solution:

Add the following line:

            self.reservation.try_resize(self.reservation.size() * 2)?;

immediately above the existing call to:

            self.group_values
                .intern(group_values, &mut self.current_group_indices)?;
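
For clarity, this is how the two lines would read together; the names (self.reservation, self.group_values, self.current_group_indices) are taken from the snippets above, and this is a sketch of the intent rather than a verbatim patch:

            // Grow the reservation to twice its current size before interning, so
            // that a doubling resize inside intern() is already accounted for by
            // the MemoryPool instead of failing with an unchecked allocation.
            self.reservation.try_resize(self.reservation.size() * 2)?;

            // Existing call that interns the incoming group values.
            self.group_values
                .intern(group_values, &mut self.current_group_indices)?;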
