Inconsistent MemoryCache stats #108333

@verdie-g

Description

The cumulative statistics returned by MemoryCache.GetCurrentStatistics() can briefly decrease when a thread that used the cache dies.

I have not hit this problem in production, but I would like to use a similar pattern in my library, so I want to double-check whether the pattern is valid.

Reproduction Steps

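  1. A thread that used the cache dies.
  2. The thread and its thread-local Stats object become unreachable and are found by the GC.
  3. The WeakReference to that Stats object in MemoryCache._allStats now returns null. A short weak reference is cleared before the object's finalizer runs.
  4. Until the finalizer runs, the thread's stats are neither reachable through _allStats nor included in the dead-thread accumulators, so GetCurrentStatistics() leaves them out.
  5. When the Stats finalizer runs, the thread's counter values are added to the accumulators and the totals recover.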
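The steps above can be modeled deterministically. The sketch below is a simplified, hypothetical model, not MemoryCache's actual code: the names (accumulatedMisses, allStats, workerStats) are illustrative, a null slot in the list stands in for a cleared weak reference, and the GC's two actions (clearing the short weak reference, then running the finalizer) are replaced by explicit statements so the window between them is visible.

```csharp
// Deterministic model of the race described in the steps above.
// Hypothetical names; the GC's two actions are simulated by explicit statements.
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

long accumulatedMisses = 0;                      // counts folded in by finalizers of dead threads
var allStats = new List<StrongBox<long>?>();     // stands in for _allStats; a null slot = cleared weak reference

// A worker thread's Stats object, registered while the thread is alive.
var workerStats = new StrongBox<long>(0);
allStats.Add(workerStats);
workerStats.Value = 10_000;                      // the worker records 10,000 misses

// Reader: accumulator plus every Stats object still reachable through the list.
long TotalMisses()
{
    long total = accumulatedMisses;
    foreach (var stats in allStats)
    {
        if (stats is not null) total += stats.Value;
    }
    return total;
}

long before = TotalMisses();

// Steps 1-3: the thread dies and the GC clears the short weak reference,
// but the finalizer has not run yet.
allStats[0] = null;
long duringWindow = TotalMisses();               // step 4: the 10,000 misses are counted nowhere

// Step 5: the finalizer folds the orphaned counts into the accumulator.
accumulatedMisses += workerStats.Value;
long after = TotalMisses();

Console.WriteLine($"before={before} window={duringWindow} after={after}");
// before=10000 window=0 after=10000
```

Any reader that samples during the window sees the total drop, which is what the integrity checker in the repro observes.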

The following program reproduces the problem:

using Microsoft.Extensions.Caching.Memory;

MemoryCache cache = new(new MemoryCacheOptions { TrackStatistics = true });

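// Starts a worker that records 10,000 cache misses, then starts its own replacement
// before exiting, so worker threads (and their thread-local stats) keep dying.
void RunThread()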
{
    Thread t = new(() =>
    {
        for (int j = 0; j < 10_000; j += 1)
        {
            _ = cache.Get(""); // always a miss: nothing is ever added to the cache
        }

        RunThread();
    })
    {
        Name = "Cache Worker",
    };
    t.Start();
}

// One worker per core, leaving one core for the integrity checker.
for (int i = 0; i < Environment.ProcessorCount - 1; i += 1)
{
    RunThread();
}

// Polls the statistics in a tight loop and reports every decrease of TotalMisses.
Thread integrityThread = new(() =>
{
    long lastValue = -1;
    while (true)
    {
        long newValue = cache.GetCurrentStatistics()!.TotalMisses;
        if (newValue < lastValue)
        {
            Console.WriteLine($"{DateTime.Now:HH:mm:ss.fff} ERROR: total misses decreased from {lastValue} to {newValue} (-{lastValue - newValue})");
        }

        lastValue = newValue;
    }
})
{
    Name = "Stats Integrity Checker"
};
integrityThread.Start();
integrityThread.Join();

Running it prints:

12:33:56.685 ERROR: total misses decreased from 15228455 to 8638696 (-6589759)
12:33:57.135 ERROR: total misses decreased from 36542054 to 9262055 (-27279999)
12:33:57.614 ERROR: total misses decreased from 71470000 to 62226058 (-9243942)
12:33:57.635 ERROR: total misses decreased from 68743304 to 49932686 (-18810618)
12:33:58.084 ERROR: total misses decreased from 99334439 to 90553748 (-8780691)
12:33:58.178 ERROR: total misses decreased from 105862703 to 96402589 (-9460114)
12:33:58.224 ERROR: total misses decreased from 109891914 to 84536179 (-25355735)
12:33:58.719 ERROR: total misses decreased from 144071687 to 134984002 (-9087685)
12:33:58.755 ERROR: total misses decreased from 146512384 to 117584531 (-28927853)
12:33:59.307 ERROR: total misses decreased from 182709797 to 173511034 (-9198763)
12:33:59.318 ERROR: total misses decreased from 179256370 to 150150238 (-29106132)
12:33:59.810 ERROR: total misses decreased from 217835394 to 208725505 (-9109889)
12:33:59.831 ERROR: total misses decreased from 217951354 to 192014222 (-25937132)

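This matters for monitoring: Prometheus counters are expected to be monotonically non-decreasing, and any decrease is interpreted as a counter reset. rate() and increase() then treat the new value as if the counter had restarted from zero, which produces a large spurious spike in the computed rate.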
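To put a number on the impact, the sketch below applies counter-reset handling to two consecutive samples from the output above. It is a simplified model of how Prometheus's increase()/rate() treat a decrease (the whole new value is counted as increase since a presumed reset); it ignores window extrapolation and is not Prometheus code. The third sample is hypothetical: it assumes the finalizer restored the missing counts and no new misses happened.

```csharp
// Simplified model of Prometheus counter-reset handling in increase()/rate():
// any decrease is treated as a reset to zero. Window extrapolation is ignored.
using System;

long Increase(long[] series)
{
    long increase = 0;
    for (int i = 1; i < series.Length; i++)
    {
        long delta = series[i] - series[i - 1];
        // On a decrease, assume the counter restarted from zero, so the whole new value is new.
        increase += delta >= 0 ? delta : series[i];
    }
    return increase;
}

// The first two samples come from the repro output (217951354 followed by 192014222).
// The third is hypothetical: the missing counts are back and there were no new misses.
long[] samples = { 217_951_354, 192_014_222, 217_951_354 };
long spurious = Increase(samples);

Console.WriteLine(spurious); // 217951354, although the true increase is 0
```

A single transient dip therefore shows up as an increase equal to the entire counter value.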

Expected behavior

The cumulative MemoryCache statistics (TotalHits and TotalMisses) should never decrease.
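One way to get that guarantee for a similar pattern is sketched below, under my own assumptions; it is not how MemoryCache is implemented, and all names (gate, cells, Increment, Total, Compact) are hypothetical. Instead of weak references and a finalizer, each thread's counter cell stays strongly referenced together with its owning Thread, and a cell is folded into the accumulator only after its owner has exited, under the same lock that readers take. A dead thread can no longer write to its cell, and the fold and removal happen atomically from a reader's point of view, so each count is always visible in exactly one place.

```csharp
// Hypothetical alternative to the weak-reference + finalizer pattern; not MemoryCache code.
// Per-thread cells stay strongly referenced and are folded into the accumulator only
// after their owner thread has exited, under the same lock that readers take.
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;

object gate = new();
long accumulated = 0;                                         // counts from cells of exited threads
var cells = new List<(Thread Owner, StrongBox<long> Cell)>(); // strong references, never silently cleared

// Each thread lazily creates and registers its own cell on first use.
var local = new ThreadLocal<StrongBox<long>>(() =>
{
    var cell = new StrongBox<long>(0);
    lock (gate) cells.Add((Thread.CurrentThread, cell));
    return cell;
});

// Writer: uncontended atomic increment of the calling thread's own cell.
void Increment() => Interlocked.Increment(ref local.Value!.Value);

// Reader: accumulator plus all registered cells, read under the lock.
long Total()
{
    lock (gate)
    {
        long total = accumulated;
        foreach (var (_, cell) in cells) total += Interlocked.Read(ref cell.Value);
        return total;
    }
}

// Moves the counts of exited threads into the accumulator and drops their cells.
void Compact()
{
    lock (gate)
    {
        for (int i = cells.Count - 1; i >= 0; i--)
        {
            var (owner, cell) = cells[i];
            if (owner.IsAlive) continue;
            accumulated += Interlocked.Read(ref cell.Value);
            cells.RemoveAt(i);
        }
    }
}

// Usage: four threads record 10,000 increments each, then exit.
var workers = new List<Thread>();
for (int t = 0; t < 4; t++)
{
    var worker = new Thread(() => { for (int j = 0; j < 10_000; j++) Increment(); });
    workers.Add(worker);
    worker.Start();
}
foreach (var worker in workers) worker.Join();

long beforeCompact = Total();
Compact();
long afterCompact = Total();
Console.WriteLine($"{beforeCompact} {afterCompact}"); // 40000 40000
```

The trade-offs are a lock on the read path and dead cells (plus their Thread objects) lingering until the next Compact() call, which could run from the reader or a timer.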

Actual behavior

TotalMisses can temporarily decrease, by millions of misses in the run above, while the counts of dead threads are missing from the totals.

Configuration

.NET: net9.0
OS: macOS Sequoia 15.3.2
Architecture: ARM64
