Skip to content

Performance: Encoding of keys/values in CommandArgs when using a codec that implements ToByteBufEncoder #2610

@shikharid

Description

@shikharid

Feature Request

Remove unwanted allocation when encoding CommandArgs when using a codec of type ToByteBufEncoder

Is your feature request related to a problem? Please describe

I was analysing cpu/memory for a jvm server that uses Redis heavily (order of 10s of thousands hmget/sec on ~10 connections) and noticed a good amount of CPU usage of the VM (~30% of total and ~70% of eventloops cpu time) comes from CommandArgs.KeyArgument.encode

Screenshot 2024-01-24 at 9 35 39 PM

All the connections use a ByteArrayCodec. Majority of the time is basically Netty's bytebuf gc/pooling logic.
The reason seems to be our specific usecase where every hmget call has 100s of keys and this gets called for each key

I looked through the implementation and it felt like these allocs can be entirely avoided for the special case where user does all req/resp in byte arrays (like folks who use the ByteArrayCodec) or for users who can exactly estimate the size of key/val

Two disclaimers:

  • The jfr sample used to analyse this is small (5 mins during peak load)
  • I have not invested extensive time with lettuce code, so might have been foolish to assume my suggestion below works

Describe the solution you'd like

Relevant code is in CommandArgs.KeyArgument.encode()and CommandArgs.ValueArgument.encode()

            if (codec instanceof ToByteBufEncoder) {

                ToByteBufEncoder<K, V> toByteBufEncoder = (ToByteBufEncoder<K, V>) codec;
                ByteBuf temporaryBuffer = target.alloc().buffer(toByteBufEncoder.estimateSize(val) + 6);

                try {
                    toByteBufEncoder.encodeValue(val, temporaryBuffer);
                    ByteBufferArgument.writeByteBuf(target, temporaryBuffer);
                } finally {
                    temporaryBuffer.release();
                }

                return;
            }

The solution will be replacing above with something like:

          if (codec instanceof ToByteBufEncoder) {
             ToByteBufEncoder<K, V> toByteBufEncoder = (ToByteBufEncoder<K, V>) codec;
              // Below lines are basically what ByteBufferArgument.writeByteBuf does
              target.writeByte('$');
              IntegerArgument.writeInteger(target, toByteBufEncoder.estimateSize(val));
              target.writeBytes(CRLF);
              toByteBufEncoder.encodeValue(val, target);
              target.writeBytes(CRLF); 
              return;
            }

Now this has a caveat because of which I'm creating this issue, It assumes estimatedSize is not "estimated" but "exact".

Maybe we can give user the control by adding another method in ToByteBufEncoder which can tell us if the codec can predict exact sizes.
In ByteArrayCodec we always can.

This ensures no additional allocs and essentially makes it garbage free.
Happy to contribute and impl the solution.

Describe alternatives you've considered

  • Use a threadlocal on heap bytebuf as the temporary buffer, since we know that its lifecycle is method scoped (resizing as needed and discarding bytes occasionaly if it encodes a very large key)
  • Somehow encode keys/vals in "batches" so as to allocate and copy to target once. Seems tricky to do

Teachability, Documentation, Adoption, Migration Strategy

NA

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions