
Getting java.lang.OutOfMemoryError: Direct buffer memory #705

@vleushin

Description

Lettuce version: 5.0.2.RELEASE
Reproducible in Linux (Kubernetes), Windows (my local machine), likely everywhere

I've started testing a Redis cluster in Kubernetes. So far, not bad: all failover scenarios worked fine, but there was one big problem: memory leaks. It was not evident to me at first (because it was a direct memory leak and I was looking at heap charts), but I think I've tracked down two cases.
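Since heap charts don't show this, direct buffer usage has to be watched separately. A minimal sketch using the standard java.lang.management API (nothing Lettuce-specific; it covers the JDK's ByteBuffer.allocateDirect path, which the "Direct buffer memory" message points to):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectMemoryProbe {
    public static void main(String[] args) {
        // The "direct" buffer pool tracks ByteBuffer.allocateDirect usage,
        // i.e. the memory capped by -XX:MaxDirectMemorySize.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                System.out.printf("direct buffers: count=%d, used=%d bytes, capacity=%d bytes%n",
                        pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
            }
        }
    }
}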

The first one is topology refresh. I have a single-node Redis cluster (redis-cluster) in Docker Compose for local testing, with these options:

ClusterTopologyRefreshOptions.builder()
    .enablePeriodicRefresh(Duration.ofSeconds(2)) // anything will do, but small value will lead to exception faster
    .enableAllAdaptiveRefreshTriggers()
    .build()
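The full client setup isn't quoted above; a minimal sketch of how these options are typically wired into a RedisClusterClient (the redis-cluster:7000 URI matches the log below, the rest is assumed):

import java.time.Duration;

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(Duration.ofSeconds(2)) // small interval reproduces the OOM faster
        .enableAllAdaptiveRefreshTriggers()
        .build();

RedisClusterClient clusterClient = RedisClusterClient.create(RedisURI.create("redis-cluster", 7000));
clusterClient.setOptions(ClusterClientOptions.builder()
        .topologyRefreshOptions(refreshOptions)
        .build());

StatefulRedisClusterConnection<String, String> connection = clusterClient.connect();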

With a small direct memory size, e.g. -XX:MaxDirectMemorySize=100M or 200M, I can get the OOM exception within 1-2 minutes. The exception looks like this:

2018-02-17 13:33:17.243 [WARN] [lettuce-eventExecutorLoop-63-8] [i.l.c.c.t.ClusterTopologyRefresh] - Cannot retrieve partition view from RedisURI [host='redis-cluster', port=7000], error: java.util.concurrent.ExecutionException: java.lang.NullPointerException
2018-02-17 13:33:17.243 [WARN] [lettuce-nioEventLoop-65-3] [i.l.c.p.CommandHandler] - null Unexpected exception during request: java.lang.NullPointerException
java.lang.NullPointerException: null
	at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:500) ~[lettuce-core-5.0.2.RELEASE.jar:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1414) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:945) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:141) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886) [netty-common-4.1.21.Final.jar:4.1.21.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.21.Final.jar:4.1.21.Final]
	at java.lang.Thread.run(Thread.java:844) [?:?]

It looks like Netty has run out of direct memory.
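One way to see where the direct memory is going is Netty's allocator metrics; a sketch, assuming the default pooled allocator is in use (which it should be unless io.netty.allocator.type is overridden):

import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;

// Snapshot of what Netty's default pooled allocator currently holds.
PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();
System.out.printf("netty pooled: direct used=%d bytes, heap used=%d bytes, direct arenas=%d%n",
        metric.usedDirectMemory(), metric.usedHeapMemory(), metric.numDirectArenas());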

About the second case I'm less sure; I did not do extensive testing, but I think the two are connected. I have a 7-node Redis cluster in our Kubernetes environment. We killed one master to see whether it would fail over. It did, the topology refreshed, and everything seemed OK. But in the background Lettuce kept pinging/trying to connect to the dead node (only visible with Lettuce debug logging turned on), and direct memory quickly dried up and the node died.

Any thoughts?
