Poor multi-threaded allocation performance

When run on Linux, this program reports a time of 57 ms, but when run on Hermit it takes 13 s.
```
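fn main() {
    let t1 = std::time::Instant::now();
    // Spawn 128 threads that each allocate and immediately free small boxes.
    let threads: Vec<_> = (0..128)
        .map(|_tid| {
            std::thread::spawn(|| {
                for _aid in 0..1_000_1000 {
                    // 64-byte heap allocation, dropped right away.
                    drop(Box::new([0u64; 8]));
                }
            })
        })
        .collect();
    // Wait for every thread before reading the elapsed time.
    for t in threads {
        t.join().unwrap();
    }
    dbg!(t1.elapsed());
}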
```

I am measuring this on a processor with 16 cores and 32 threads, running QEMU with `-smp 32`.

I looked into it a bit and found that the default global allocator forwards calls to the kernel allocator, which is Talc wrapped in a mutex, so poor multi-threaded performance is to be expected: every allocation and deallocation from every thread contends for the same lock. Given this huge slowdown, I think this should either not be the default allocator, or there should be prominently placed guidance on how to obtain better allocation performance. Unfortunately, I could get neither jemalloc nor mimalloc to run on Hermit out of the box.
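
To illustrate the contention pattern on any host, here is a minimal sketch of a global allocator that serializes every call behind one lock. It is not Hermit's actual implementation: `LockedAlloc`, `alloc_calls`, and `run_workload` are hypothetical names, a simple spin lock stands in for the mutex, and the system allocator stands in for Talc.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical allocator: every call takes one global spin lock, mimicking
/// (but not reproducing) a single heap wrapped in a mutex.
pub struct LockedAlloc {
    locked: AtomicBool,
    calls: AtomicUsize,
}

impl LockedAlloc {
    fn lock(&self) {
        // Spin until we flip the flag from false to true.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

unsafe impl GlobalAlloc for LockedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.lock();
        self.calls.fetch_add(1, Ordering::Relaxed);
        // Assumption: the system allocator stands in for the real heap.
        let ptr = unsafe { System.alloc(layout) };
        self.unlock();
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.lock();
        unsafe { System.dealloc(ptr, layout) };
        self.unlock();
    }
}

#[global_allocator]
static GLOBAL: LockedAlloc = LockedAlloc {
    locked: AtomicBool::new(false),
    calls: AtomicUsize::new(0),
};

/// Number of `alloc` calls that have gone through the global lock so far.
pub fn alloc_calls() -> usize {
    GLOBAL.calls.load(Ordering::Relaxed)
}

/// Runs the allocation loop from the report with configurable sizes and
/// returns the wall-clock time it took.
pub fn run_workload(threads: usize, allocs_per_thread: usize) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            std::thread::spawn(move || {
                for _ in 0..allocs_per_thread {
                    // black_box keeps the allocation from being optimized out.
                    black_box(Box::new([0u64; 8]));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed()
}
```

Calling `run_workload` from `main` with and without the `#[global_allocator]` item should give a rough feel for how much a single lock costs as the thread count grows. The absolute numbers will differ from those measured on Hermit.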

Poor multi-threaded allocation performance #1984