When running multiple benchmarks, earlier ones affect the later ones

I have the following benchmarks in a group:

```haskell
        bgroup "map"
          [ bench "machines" $ whnf drainM (M.mapping (+1))
          , bench "streaming" $ whnf drainS (S.map (+1))
          , bench "pipes" $ whnf drainP (P.map (+1))
          , bench "conduit" $ whnf drainC (C.map (+1))
          , bench "list-transformer" $ whnf drainL (lift . return . (+1))
          ]
```

The last two benchmarks take significantly longer when I run all of these benchmarks in one go with ``stack bench --benchmark-arguments "-m glob ops/map/*"``:

```
$ stack bench --benchmark-arguments "-m glob ops/map/*"

benchmarking ops/map/machines
time                 30.23 ms   (29.22 ms .. 31.04 ms)

benchmarking ops/map/streaming
time                 17.91 ms   (17.48 ms .. 18.37 ms)

benchmarking ops/map/pipes
time                 29.30 ms   (28.12 ms .. 30.03 ms)

benchmarking ops/map/conduit
time                 36.69 ms   (35.73 ms .. 37.58 ms)

benchmarking ops/map/list-transformer
time                 84.06 ms   (75.02 ms .. 90.34 ms)
```

However, when I run the benchmarks individually, the results are different:

```
$ stack bench --benchmark-arguments "-m glob ops/map/conduit"

benchmarking ops/map/conduit
time                 31.64 ms   (31.30 ms .. 31.86 ms)

$ stack bench --benchmark-arguments "-m glob ops/map/list-transformer"

benchmarking ops/map/list-transformer
time                 68.67 ms   (66.84 ms .. 70.96 ms)
```

To reproduce the issue, run those commands in [this repo](https://github.com/harendra-kumar/streaming-benchmarks/tree/a5a329fd10550366de33e53aeb7417c2ae608a8e).

I cannot figure out what the problem is here. I tried running the benchmarks inside `env`, with a `threadDelay` of a few seconds and a `performGC` in the setup action, but nothing helps.

I am now resorting to always running each benchmark individually in a separate process. Maybe criterion itself could support running each benchmark in a separate process to guarantee isolation, as I have seen this sort of problem too often. As it stands, I am always skeptical of the results produced by criterion.
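
For reference, the per-process workaround can be sketched as a small shell loop that issues one `stack bench` invocation per benchmark, so each one gets a fresh process. This is only a sketch of the workaround, not a criterion feature: the names `bench_args`, `run_isolated`, and the `RUN` variable are made up for illustration, and the benchmark names are copied from the `map` group above. By default it only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Benchmark names copied from the "map" group in this issue.
benchmarks="machines streaming pipes conduit list-transformer"

# Hypothetical helper: build the criterion match arguments for one benchmark.
bench_args() {
    printf '%s' "-m glob ops/map/$1"
}

# Hypothetical helper: one stack invocation (and so one benchmark process)
# per benchmark. Prints the commands unless RUN=1 is set.
run_isolated() {
    for name in $benchmarks; do
        if [ "${RUN:-0}" = "1" ]; then
            stack bench --benchmark-arguments "$(bench_args "$name")"
        else
            echo "stack bench --benchmark-arguments \"$(bench_args "$name")\""
        fi
    done
}

run_isolated
```

Saved as, say, `isolated-bench.sh` (a hypothetical file name), `RUN=1 sh isolated-bench.sh` would run the five benchmarks one after another, each in its own process. Process isolation does not remove machine-level effects such as thermal throttling or other load, so the ordering may still matter to some degree.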
