What is the proper use of deadline and retry for slow connections? #12396

randeepbydesign · 2025-10-01T15:23:38Z


I feel like the gRPC documentation for retry/hedging policies is excellent. And yet, both when running the Java examples and in my own testing, I have had a hard time getting this to work for a specific case.

As a client, I integrate with a very robust gRPC server. The P99 latency is under 300ms, but it's the P99.9 I am worried about. These calls can take a while, up to 20 seconds, and that can cause a problem in our service if this chunk of data is requested frequently.

I sought to remedy this with deadlines and retry. My idea was to set an aggressive deadline and, if it is exceeded, retry.
Setting the deadline was easy; however, that is a "global" deadline: it covers the whole call, retries included. Consider a use case:

1. A stub is configured with retry enabled.
2. Request A is made with a deadline of 1s and will turn out to have a max latency of 2s.
3. The deadline passes and the entire call fails.

What I want is:

1. A stub is configured with retry enabled.
2. Request A is made with a deadline of 1s and will turn out to have a max latency of 2s.
3. The deadline passes and the stub retries the call.
4. The new call resolves quickly.

Alternatively, a hedged call could go out once a threshold is exceeded, and we would take whichever of the two resolves first.

Are such configuration strategies possible out of the box? If not, what is the recommended pattern for implementing this? Currently I set a deadline per call and wrap the operation in a retry supplier (using Resilience4j).
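
For readers who want to see the shape of that workaround, here is a minimal sketch of a per-attempt timeout with retry, written against plain `CompletableFuture` rather than gRPC or Resilience4j. `PerAttemptRetry` and its parameters are hypothetical names made up for illustration, and the sketch assumes Java 9+ for `orTimeout`. In a real gRPC client, each attempt would presumably be created with its own deadline (for example via `stub.withDeadlineAfter(...)`) so that the abandoned attempt is also cancelled remotely.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of gRPC or Resilience4j.
final class PerAttemptRetry {
    private PerAttemptRetry() {}

    // Starts a fresh attempt each time the previous one exceeds its own timeout.
    // Failures other than a timeout are rethrown immediately without retrying.
    static <T> T call(Supplier<CompletableFuture<T>> attempt,
                      int maxAttempts,
                      long perAttemptTimeoutMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        CompletionException lastTimeout = null;
        for (int i = 0; i < maxAttempts; i++) {
            CompletableFuture<T> future = attempt.get();
            try {
                // orTimeout fails the local future with TimeoutException;
                // it does not by itself cancel any remote work.
                return future.orTimeout(perAttemptTimeoutMillis, TimeUnit.MILLISECONDS).join();
            } catch (CompletionException e) {
                if (!(e.getCause() instanceof TimeoutException)) {
                    throw e;
                }
                lastTimeout = e;
            }
        }
        throw lastTimeout; // every attempt timed out
    }
}
```

Note that the attempts here run one after another, so the worst case is `maxAttempts` times the per-attempt timeout.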


ejona86 · 2025-10-01T16:50:22Z

Maintainer

There is not a per-attempt timeout available. Hedging works better for this case. The "threshold" you mention would just be the "hedgingDelay" in the configuration. But you seem to know about hedging, so I don't quite know why it didn't seem like what you're looking for. Maybe take another look at it.
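
As a rough illustration of where "hedgingDelay" lives, the sketch below builds a service config map containing a `hedgingPolicy`. The class name `HedgingConfigSketch`, the service name, and the chosen values are assumptions for illustration, not taken from this thread. In grpc-java such a map would typically be passed to `ManagedChannelBuilder.defaultServiceConfig(...)`, which expects numeric values as `Double`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; only the map layout mirrors the gRPC service config.
final class HedgingConfigSketch {
    private HedgingConfigSketch() {}

    // Builds a service config that hedges every method of the given service.
    static Map<String, Object> build(String service, String hedgingDelay, int maxAttempts) {
        Map<String, Object> hedgingPolicy = Map.of(
                // grpc-java expects numbers in the config map as Double.
                "maxAttempts", Double.valueOf(maxAttempts),
                // Delay before the next hedged attempt is sent, e.g. "0.3s".
                "hedgingDelay", hedgingDelay);
        Map<String, Object> methodConfig = Map.of(
                "name", List.of(Map.of("service", service)),
                "hedgingPolicy", hedgingPolicy);
        return Map.of("methodConfig", List.of(methodConfig));
    }
}
```

A channel could then be configured along the lines of `ManagedChannelBuilder.forTarget(target).defaultServiceConfig(HedgingConfigSketch.build("helloworld.Greeter", "0.3s", 2))`. The call's deadline still bounds the whole call, hedged attempts included.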

3 replies

randeepbydesign Oct 3, 2025
Author

Yeah, I didn't want to muddy the waters too much in one discussion. But the tl;dr is that I was not able to confirm that hedging works the way I expect it to. I arrived at that conclusion after testing the grpc-java project referenced in the official documentation, and I will start a new thread to clarify things there.

Thanks!

ejona86 Oct 3, 2025
Maintainer

Okay, then I guess the answer to this discussion is "retry is not for dealing with slow connections." It is for increasing reliability, not reducing latency. Hedging is for reducing latency.
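
To make the latency point concrete, here is a small client-side sketch of the hedging idea using only `CompletableFuture` (Java 9+ for `delayedExecutor`). `HedgeSketch` is a hypothetical name and this is not how gRPC implements hedging internally. It is simplified so that whichever attempt finishes first wins, whether it succeeded or failed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical illustration of hedging; not gRPC's implementation.
final class HedgeSketch {
    private HedgeSketch() {}

    // Starts one attempt right away and a second one after hedgingDelayMillis
    // if no result has arrived by then. The first attempt to finish wins.
    static <T> CompletableFuture<T> hedged(Supplier<CompletableFuture<T>> attempt,
                                           long hedgingDelayMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt.get().whenComplete((value, error) -> settle(result, value, error));
        CompletableFuture.delayedExecutor(hedgingDelayMillis, TimeUnit.MILLISECONDS)
                .execute(() -> {
                    // Skip the hedge when the first attempt has already finished.
                    if (!result.isDone()) {
                        attempt.get().whenComplete((value, error) -> settle(result, value, error));
                    }
                });
        return result;
    }

    // Completes the shared result; calls after the first completion are no-ops.
    private static <T> void settle(CompletableFuture<T> result, T value, Throwable error) {
        if (error != null) {
            result.completeExceptionally(error);
        } else {
            result.complete(value);
        }
    }
}
```

Unlike a retry after a timeout, the slow first attempt is not abandoned: both attempts stay in flight, so the caller waits at most the hedging delay plus the latency of the faster attempt.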

Answer selected by randeepbydesign

randeepbydesign Oct 3, 2025
Author

thank you @ejona86
