Closed
Labels: bug (Something isn't working)
Description
Describe the bug
The Top-K query optimization appears to take effect only when a custom allocator (mimalloc/snmalloc) is in use. In principle, the choice of allocator should not affect whether the optimization applies.
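For readers unfamiliar with the term, the general idea behind a Top-K optimization is that `ORDER BY ... LIMIT k` does not need to sort or buffer the whole input: it can stream the rows and retain only the `k` best seen so far, so memory stays proportional to `k` rather than to the table size. The sketch below illustrates that idea on plain integers. It is not DataFusion's actual operator, and `top_k_desc` is a hypothetical helper name introduced only for this illustration.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: returns the `k` largest values in descending order
/// while holding at most `k` values in memory at any time.
fn top_k_desc<I>(values: I, k: usize) -> Vec<i64>
where
    I: IntoIterator<Item = i64>,
{
    if k == 0 {
        return Vec::new();
    }
    // `Reverse` turns the max-heap into a min-heap, so the smallest
    // retained value is always at the top and is cheap to evict.
    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::with_capacity(k);
    for v in values {
        if heap.len() < k {
            heap.push(Reverse(v));
        } else if let Some(mut smallest) = heap.peek_mut() {
            // Replace the current minimum only if the new value beats it;
            // the heap is re-balanced when `smallest` is dropped.
            if v > smallest.0 {
                *smallest = Reverse(v);
            }
        }
    }
    // At most `k` elements remain; sort them into descending order.
    let mut out: Vec<i64> = heap.into_iter().map(|Reverse(v)| v).collect();
    out.sort_unstable_by(|a, b| b.cmp(a));
    out
}
```

With `k = 1`, as in the `ORDER BY flow_id DESC LIMIT 1` query used in the reproduction, a strategy like this keeps a single value regardless of input size, which is why peak memory close to the full file size would suggest the optimization is not being applied.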
To Reproduce
- Grab and build bytehound: https://github.com/koute/bytehound
- Prepare some large-ish Parquet file, e.g.
https://seafowl-public.s3.eu-west-1.amazonaws.com/tutorial/trase-supply-chains.parquet:
$ du -h ~/supply-chains.parquet
146M /home/ubuntu/supply-chains.parquet
- Remove the custom allocator and build
diff --git a/datafusion-cli/src/main.rs b/datafusion-cli/src/main.rs
index aea499d60..a92957730 100644
--- a/datafusion-cli/src/main.rs
+++ b/datafusion-cli/src/main.rs
@@ -24,13 +24,13 @@ use datafusion_cli::catalog::DynamicFileCatalog;
use datafusion_cli::{
exec, print_format::PrintFormat, print_options::PrintOptions, DATAFUSION_CLI_VERSION,
};
-use mimalloc::MiMalloc;
+// use mimalloc::MiMalloc;
use std::env;
use std::path::Path;
use std::sync::Arc;
-#[global_allocator]
-static GLOBAL: MiMalloc = MiMalloc;
+// #[global_allocator]
+// static GLOBAL: MiMalloc = MiMalloc;
#[derive(Debug, Parser, PartialEq)]
#[clap(author, version, about, long_about= None)]
- Profile a Top-K query
$ LD_PRELOAD=~/bytehound/target/release/libbytehound.so ./target/debug/datafusion-cli
DataFusion CLI v28.0.0
❯ CREATE EXTERNAL TABLE supply_chains STORED AS PARQUET LOCATION '/home/ubuntu/supply-chains.parquet';
0 rows in set. Query took 0.445 seconds.
❯ SELECT * FROM supply_chains ORDER BY flow_id DESC LIMIT 1;
...
Expected behavior
With the custom allocator present, the memory profile I see looks like this:
[image: memory profile with the custom allocator]

Additional context
No response