When run on Linux, this program reports an elapsed time of 57 ms, but when run on Hermit it takes 13 s.
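fn main() {
    let t1 = std::time::Instant::now();
    // Spawn 128 threads that each allocate and immediately free small boxes.
    let threads: Vec<_> = (0..128)
        .map(|_tid| {
            std::thread::spawn(|| {
                // Note: the literal below is 10_001_000, as in the measured runs.
                for _aid in 0..1_000_1000 {
                    // 64-byte heap allocation, dropped right away.
                    Box::new([0u64; 8]);
                }
            })
        })
        .collect();
    for t in threads {
        t.join().unwrap();
    }
    dbg!(t1.elapsed());
}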
I am measuring this on a processor with 16 cores and 32 threads. I am running QEMU with -smp 32.
I have looked into it a bit and found that the default global allocator forwards calls to the kernel allocator, which is Talc wrapped in a mutex. Every allocation and deallocation from every thread therefore goes through the same lock, so poor multithreaded performance is to be expected. Given how large the slowdown is, I think either this should not be the default allocator, or there should be prominently placed guidance on how to get better allocation performance. Unfortunately, I could get neither jemalloc nor mimalloc to run on Hermit out of the box.
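To illustrate the contention pattern described above without Hermit, here is a minimal sketch that can be run on Linux. It is not Hermit's allocator and does not use Talc: `LockedSystem` is a hypothetical stand-in that puts the standard `System` allocator behind one global spin lock, and `alloc_churn` is a hypothetical helper that repeats the allocation loop from the program above with configurable thread and iteration counts. Removing the `#[global_allocator]` static gives the unlocked baseline for comparison.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Instant;

/// Hypothetical stand-in for a single-lock allocator:
/// every alloc/dealloc call must first take one global spin lock.
struct LockedSystem {
    locked: AtomicBool,
}

impl LockedSystem {
    fn lock(&self) {
        // Spin until the flag flips from false to true.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

unsafe impl GlobalAlloc for LockedSystem {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.lock();
        let ptr = unsafe { System.alloc(layout) };
        self.unlock();
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.lock();
        unsafe { System.dealloc(ptr, layout) };
        self.unlock();
    }
}

// Route all heap traffic of this program through the locked wrapper.
#[global_allocator]
static GLOBAL: LockedSystem = LockedSystem {
    locked: AtomicBool::new(false),
};

/// Spawns `threads` threads that each allocate and free `iters` small boxes.
/// Returns the total number of allocations performed by the loops.
fn alloc_churn(threads: usize, iters: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                for _ in 0..iters {
                    // black_box keeps the optimizer from removing the allocation.
                    black_box(Box::new([0u64; 8]));
                }
                iters
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let start = Instant::now();
    let total = alloc_churn(8, 100_000);
    println!("{total} allocations in {:?}", start.elapsed());
}
```

The thread and iteration counts in `main` are deliberately smaller than in the original program, since a pure spin lock degrades badly once there are more threads than cores. Absolute timings from this sketch say nothing about Hermit itself; it only shows how a single lock in front of the allocator serializes otherwise independent threads.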