Poor multi-threaded allocation performance #1984

@m-mueller678

This program reports a runtime of 57 ms when run on Linux, but takes 13 s when run on Hermit.

fn main() {
    let t1 = std::time::Instant::now();
    let threads: Vec<_> = (0..128)
        .map(|_tid| {
            std::thread::spawn(|| {
                for _aid in 0..1_000_1000 {
                    Box::new([0u64; 8]);
                }
            })
        })
        .collect();
    for t in threads {
        t.join().unwrap();
    }
    dbg!(t1.elapsed());
}

I am measuring this on a processor with 16 cores and 32 hardware threads, running Hermit under QEMU with -smp 32.
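For anyone trying to reproduce this, here is a parameterized variant of the benchmark as a rough sketch. The function name `alloc_benchmark` and its parameters are made up for illustration. It passes each box through `std::hint::black_box`, on the assumption (not verified here) that an optimized build might otherwise remove an allocation whose result is never used, which would skew the comparison.

```rust
use std::hint::black_box;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: spawns `threads` threads that each allocate and
/// immediately free `allocs_per_thread` 64-byte boxes, and returns the
/// wall-clock time for the whole run.
fn alloc_benchmark(threads: usize, allocs_per_thread: usize) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                for _ in 0..allocs_per_thread {
                    // black_box makes the box observable to the optimizer,
                    // so the allocation is not elided. The box is dropped
                    // (and freed) at the end of the statement.
                    black_box(Box::new([0u64; 8]));
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    start.elapsed()
}
```

Calling `alloc_benchmark(128, 1_000_000)` from `main` and printing the result gives a workload of the same shape as the program above. Sweeping the thread count from 1 upward should show whether the time grows with contention rather than with the total number of allocations.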

I looked into it and found that the default global allocator forwards calls to the kernel allocator, which is Talc wrapped in a mutex, so poor multi-threaded performance is to be expected. Given the size of the slowdown, I think either this should not be the default allocator, or there should be prominently placed guidance on how to obtain better allocation performance. Unfortunately, I could not get either jemalloc or mimalloc to run on Hermit out of the box.
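To illustrate why a single lock around the allocator hurts here, below is a minimal sketch of a global allocator that serializes every call behind one lock. It is not Hermit's or Talc's actual code: `LockedAlloc`, `lock_acquisitions`, and the spin lock are all hypothetical stand-ins, and the real work is delegated to `std::alloc::System`. The point is only that every `alloc` and `dealloc` from every thread has to pass through the same lock.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical allocator: one global spin lock in front of `System`.
struct LockedAlloc {
    locked: AtomicBool,
    acquisitions: AtomicUsize,
}

impl LockedAlloc {
    const fn new() -> Self {
        LockedAlloc {
            locked: AtomicBool::new(false),
            acquisitions: AtomicUsize::new(0),
        }
    }

    /// Spins until the lock is free, then takes it.
    fn lock(&self) {
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        self.acquisitions.fetch_add(1, Ordering::Relaxed);
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

unsafe impl GlobalAlloc for LockedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.lock(); // all threads contend here
        let ptr = unsafe { System.alloc(layout) };
        self.unlock();
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.lock(); // frees take the same lock as allocations
        unsafe { System.dealloc(ptr, layout) };
        self.unlock();
    }
}

#[global_allocator]
static GLOBAL: LockedAlloc = LockedAlloc::new();

/// Number of times the global lock has been taken so far.
fn lock_acquisitions() -> usize {
    GLOBAL.acquisitions.load(Ordering::Relaxed)
}
```

With this in place, each `Box::new` followed by a drop costs two lock acquisitions, so 128 threads that do nothing but allocate spend most of their time waiting on each other. Allocators designed for multi-threaded use typically avoid this with per-thread or per-core caches, which is presumably what jemalloc or mimalloc would have provided here.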
