Add a size limit for heap allocations when reading #2019

@asfimport

Description

G1GC allocates humongous objects directly in the old generation to avoid unnecessary copies, which means that these allocations aren't garbage collected until a full GC runs. An object is humongous if its size is 50% of the region size or more. Region size is at most 32MB (see the table for region size from heap size), so any single allocation of 16MB or more is humongous regardless of heap size.
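To make the threshold concrete, here is a minimal sketch of how the region size and the humongous cutoff relate. It assumes the commonly documented G1 default heuristic (aim for roughly 2048 regions, round down to a power of two, clamp between 1MB and the 32MB cap mentioned above) and uses the "50% of the region size or more" definition from this description. The class and method names are hypothetical, and actual JVM behavior depends on the version and on flags such as `-XX:G1HeapRegionSize`.

```java
// Illustrative only: approximates G1's default region sizing, not the JVM's exact code.
class G1HumongousSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;   // assumed lower bound
    static final long MAX_REGION = 32 * MB;  // upper bound stated in the description
    static final long TARGET_REGIONS = 2048; // assumed target region count

    // Estimate the region size G1 would pick for a given heap size.
    static long estimateRegionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(raw); // round down to a power of two
        return Math.min(powerOfTwo, MAX_REGION);
    }

    // An allocation is treated as humongous at 50% of the region size or more.
    static boolean isHumongous(long allocationBytes, long regionBytes) {
        return allocationBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * MB;      // 8GB heap
        long region = estimateRegionSize(heap);
        long columnChunkBuffer = 64 * MB; // hypothetical buffer for a group of column chunks
        System.out.println("region size (MB): " + region / MB);
        System.out.println("64MB buffer humongous: " + isHumongous(columnChunkBuffer, region));
    }
}
```

Under these assumptions an 8GB heap gets 4MB regions, so any buffer of 2MB or more is allocated as a humongous object.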

Parquet currently allocates one large buffer for each contiguous group of column chunks, and in many cases that buffer is not garbage collected until a full GC. Adding a limit on allocation size should allow users to split a row group's data across multiple smaller buffers, so that each buffer can be collected once it has been read.
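A rough sketch of the idea, not Parquet's actual reader code: instead of one `byte[]` for the whole contiguous range, the read is split into buffers no larger than a configurable limit. The names `BoundedReadSketch`, `planBufferSizes`, `readInChunks`, and `maxAllocationSize` are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits one contiguous read into size-limited buffers.
class BoundedReadSketch {

    // Compute the buffer sizes needed to cover totalLength bytes without
    // any single allocation exceeding maxAllocationSize.
    static List<Integer> planBufferSizes(long totalLength, int maxAllocationSize) {
        if (maxAllocationSize <= 0) {
            throw new IllegalArgumentException("maxAllocationSize must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        long remaining = totalLength;
        while (remaining > 0) {
            int size = (int) Math.min(remaining, (long) maxAllocationSize);
            sizes.add(size);
            remaining -= size;
        }
        return sizes;
    }

    // Read totalLength bytes from the stream into several bounded buffers.
    static List<ByteBuffer> readInChunks(InputStream in, long totalLength, int maxAllocationSize)
            throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<ByteBuffer> buffers = new ArrayList<>();
        for (int size : planBufferSizes(totalLength, maxAllocationSize)) {
            byte[] chunk = new byte[size]; // each allocation stays under the limit
            data.readFully(chunk);         // throws EOFException if the stream is too short
            buffers.add(ByteBuffer.wrap(chunk));
        }
        return buffers;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeColumnChunks = new byte[10]; // stand-in for a contiguous group of column chunks
        List<ByteBuffer> buffers =
                readInChunks(new ByteArrayInputStream(fakeColumnChunks), fakeColumnChunks.length, 4);
        System.out.println("buffers allocated: " + buffers.size());
    }
}
```

If the limit is kept below half the G1 region size, none of the individual buffers is humongous, and a consumer can drop its reference to each buffer as soon as the pages it backs have been read.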

Reporter: Ryan Blue / @rdblue
Assignee: Ryan Blue / @rdblue

Related issues:

PRs and other links:

Note: This issue was originally created as PARQUET-787. Please see the migration documentation for further details.
