Skip to content

When using multiple benchmarks earlier ones affect the ones coming later #166

@harendra-kumar

Description

@harendra-kumar

I have the following benchmarks in a group:

        bgroup "map"
          [ bench "machines" $ whnf drainM (M.mapping (+1))
          , bench "streaming" $ whnf drainS (S.map (+1))
          , bench "pipes" $ whnf drainP (P.map (+1))
          , bench "conduit" $ whnf drainC (C.map (+1))
          , bench "list-transformer" $ whnf drainL (lift . return . (+1))
          ]

The last two benchmarks take significantly more time when I run all these benchmarks in one go using stack bench --benchmark-arguments "-m glob ops/map/*".

$ stack bench --benchmark-arguments "-m glob ops/map/*"

benchmarking ops/map/machines
time                 30.23 ms   (29.22 ms .. 31.04 ms)

benchmarking ops/map/streaming
time                 17.91 ms   (17.48 ms .. 18.37 ms)

benchmarking ops/map/pipes
time                 29.30 ms   (28.12 ms .. 30.03 ms)

benchmarking ops/map/conduit
time                 36.69 ms   (35.73 ms .. 37.58 ms)

benchmarking ops/map/list-transformer
time                 84.06 ms   (75.02 ms .. 90.34 ms)

However when I run individual benchmarks the results are different:

$ stack bench --benchmark-arguments "-m glob ops/map/conduit"

benchmarking ops/map/conduit
time                 31.64 ms   (31.30 ms .. 31.86 ms)

$ stack bench --benchmark-arguments "-m glob ops/map/list-transformer"

benchmarking ops/map/list-transformer
time                 68.67 ms   (66.84 ms .. 70.96 ms)

To reproduce the issue just run those commands in this repo.

I cannot figure what the problem is here. I tried using "env" to run the benchmarks and putting a "threadDelay" for a few seconds and a "performGC" in it but nothing helps.

I am now resorting to always running each benchmark individually in a separate process. Maybe we can have support for running each benchmark in a separate process in criterion itself to guarantee isolation of benchmarks, as I have seen this sort of problem too often. Now I am always skeptical of the results produced by criterion.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions