Description
G1GC allocates humongous objects directly in the old generation to avoid unnecessary copying, which means that these allocations are not garbage collected until a full GC runs. An object is humongous when it is 50% of the region size or larger. Region size is at most 32MB (see the table for region size from heap size).
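To make the threshold concrete, here is a minimal sketch of the arithmetic, assuming the 50% rule and the 32MB maximum region size stated above. The class and method names are hypothetical and are not part of the JVM or of Parquet.

```java
// Hypothetical helper illustrating the humongous-object threshold described above.
final class HumongousCheck {
    // Assumption from the description: an object is humongous when it is
    // at least half the size of a G1 region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long regionBytes = 32L * 1024 * 1024; // the 32MB maximum cited above
        System.out.println("Humongous threshold: " + (regionBytes / 2) + " bytes");
        System.out.println(isHumongous(8L * 1024 * 1024, regionBytes));  // 8MB buffer: false
        System.out.println(isHumongous(64L * 1024 * 1024, regionBytes)); // 64MB buffer: true
    }
}
```

With the largest region size, any single buffer of 16MB or more would be treated as humongous under this rule; smaller heaps use smaller regions and so have a lower threshold.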
Parquet currently allocates one huge buffer for each contiguous group of column chunks, and in many cases that buffer is not garbage collected until a full GC. Adding a limit on the allocation size should allow users to split a row group across multiple buffers, so that each buffer can be collected once it has been read.
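The proposed limit could look roughly like the sketch below, which splits one large request into several buffers that are each no larger than a cap. This is only an illustration of the idea: `CappedAllocation`, `allocate`, and `maxAllocationBytes` are hypothetical names, not Parquet's actual API or configuration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: back one contiguous read with several capped buffers
// instead of a single huge allocation.
final class CappedAllocation {
    static List<ByteBuffer> allocate(long totalBytes, int maxAllocationBytes) {
        if (totalBytes < 0 || maxAllocationBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and the cap positive");
        }
        List<ByteBuffer> buffers = new ArrayList<>();
        long remaining = totalBytes;
        while (remaining > 0) {
            // Each buffer is at most maxAllocationBytes; the last one holds the remainder.
            int size = (int) Math.min(remaining, maxAllocationBytes);
            buffers.add(ByteBuffer.allocate(size));
            remaining -= size;
        }
        return buffers;
    }

    public static void main(String[] args) {
        // A 20MB group of column chunks with an 8MB cap yields buffers of 8MB, 8MB, and 4MB.
        for (ByteBuffer buffer : allocate(20L * 1024 * 1024, 8 * 1024 * 1024)) {
            System.out.println(buffer.capacity());
        }
    }
}
```

If the cap is set below half the G1 region size, none of the individual buffers is humongous, and each can become unreachable as soon as its data has been consumed.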
Reporter: Ryan Blue / @rdblue
Assignee: Ryan Blue / @rdblue
Related issues:
- Release Parquet Java 1.10 (blocks)
Note: This issue was originally created as PARQUET-787. Please see the migration documentation for further details.