Closed
Description
We explored making the Parquet file size configurable by exposing the upload interval through the environment variable `P_STORAGE_UPLOAD_INTERVAL`, but this approach has several problems:
- The amount of data held in the staging area is now variable and can span hours or days, which makes the query code much more complicated.
- The gain in Parquet file size is small, because log volume is difficult to predict.
A better approach would be to add a separate compaction engine that compacts historical data into larger, more compressed Parquet files. We'll take that up separately. For now, we need to revert the changes in #616 and remove the `P_STORAGE_UPLOAD_INTERVAL` option completely.
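The compaction engine is out of scope for this issue. As a rough illustration of the idea only, the Python sketch below plans compaction batches for historical files while leaving recent data alone, so the staging area and upload interval stay fixed. Everything in it is hypothetical: the `ParquetFile` record, `plan_compaction`, grouping by calendar day, the 24-hour age threshold, and the target batch size are assumptions, and the actual merging of Parquet files is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ParquetFile:
    """Hypothetical metadata for one uploaded Parquet file."""
    path: str
    window_start: datetime  # start of the upload window the file covers
    size_bytes: int


def plan_compaction(files, target_bytes, now, min_age_hours=24):
    """Group files older than min_age_hours into per-day batches of at most target_bytes.

    Only historical files are considered; recent files are left untouched.
    """
    by_day = defaultdict(list)
    for f in files:
        age_hours = (now - f.window_start).total_seconds() / 3600
        if age_hours >= min_age_hours:
            by_day[f.window_start.date()].append(f)

    batches = []
    for day in sorted(by_day):
        batch, batch_size = [], 0
        for f in sorted(by_day[day], key=lambda f: f.window_start):
            # Close the current batch if adding this file would exceed the target size.
            if batch and batch_size + f.size_bytes > target_bytes:
                batches.append(batch)
                batch, batch_size = [], 0
            batch.append(f)
            batch_size += f.size_bytes
        if batch:
            batches.append(batch)

    # A batch with a single file gains nothing from compaction, so drop it.
    return [b for b in batches if len(b) > 1]
```

Keeping compaction as a separate pass over old data, rather than stretching the upload interval, means queries over recent data never have to reason about a variable-length staging window.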
Metadata
Labels: enhancement (New feature or request)