What is the proper use of deadline and retry for slow connections? #12396
I feel like the gRPC documentation for retry/hedging policies is excellent. And yet, both when running the Java examples and in my own testing, I have been having a hard time getting this to work for a specific case. As a client, I integrate with a very robust gRPC server. The P99 latency is under 300 ms, but it's the P99.9 I am worried about. These calls can take a while, up to 20 seconds, and that can cause a problem in our service if this chunk of data is being requested frequently. I sought to remedy this with deadlines and retry. My idea was to set an aggressive deadline and, if it is exceeded, retry.
What I want is:
Alternatively, a hedged call could go out once a threshold is exceeded, and we would see which of the two resolves first. Are such configuration strategies possible out of the box? If not, what is the recommended pattern for implementing this? Currently I set a deadline per call and wrap the operation in a retry supplier (using Resilience4j).
Replies: 1 comment 3 replies
There is no per-attempt timeout available. Hedging works better for this case. The "threshold" you mention would just be the "hedgingDelay" in the configuration. But you seem to know about hedging, so I don't quite know why it didn't seem like what you're looking for. Maybe take another look at it.
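For reference, a minimal sketch of what such a hedging configuration might look like when built as a plain Java map, which is the shape grpc-java accepts via `ManagedChannelBuilder.defaultServiceConfig(Map)`. The service name `example.DataService` is hypothetical, and the values (2 attempts, a 0.3s delay chosen to sit near the P99 mentioned above) are illustrative assumptions rather than recommendations.

```java
import java.util.List;
import java.util.Map;

class HedgingConfigSketch {
    // Builds a service config map with a hedgingPolicy for every method of one service.
    // Numbers are passed as Double because grpc-java expects JSON-style values in this map.
    static Map<String, Object> hedgingServiceConfig(String service,
                                                    double maxAttempts,
                                                    String hedgingDelay) {
        Map<String, Object> name = Map.of("service", service);
        Map<String, Object> hedgingPolicy = Map.of(
                "maxAttempts", maxAttempts,    // total attempts, including the original call
                "hedgingDelay", hedgingDelay); // wait this long before sending the next attempt
        Map<String, Object> methodConfig = Map.of(
                "name", List.of(name),
                "hedgingPolicy", hedgingPolicy);
        return Map.of("methodConfig", List.of(methodConfig));
    }

    public static void main(String[] args) {
        // "example.DataService" is a placeholder; use the fully qualified name of your service.
        Map<String, Object> config = hedgingServiceConfig("example.DataService", 2, "0.3s");
        // With grpc-java on the classpath, this map would be handed to
        // ManagedChannelBuilder.defaultServiceConfig(config); depending on the
        // grpc-java version, enableRetry() may also need to be called on the builder.
        System.out.println(config);
    }
}
```

The overall call deadline still applies across all attempts, so it would be set to the longest you are willing to wait in total, not to the per-attempt threshold.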
Okay, then I guess the answer to this discussion is "retry is not for dealing with slow connections." It is for increasing reliability, not reducing latency. Hedging is for reducing latency.
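To make that distinction concrete, here is a library-free sketch of the hedging idea using only `CompletableFuture`. It is not how gRPC implements hedging, just an illustration under simplifying assumptions: at most one extra attempt, the first attempt to finish (successfully or not) decides the outcome, and the losing attempt is not cancelled. The names `HedgeSketch`, `hedged`, and `forward` are made up for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class HedgeSketch {
    // Starts one attempt; if nothing has completed after delayMs, starts a second.
    // The returned future completes with whichever attempt finishes first.
    static <T> CompletableFuture<T> hedged(Supplier<CompletableFuture<T>> call,
                                           long delayMs,
                                           ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        forward(call.get(), result);
        ScheduledFuture<?> timer = scheduler.schedule(() -> {
            if (!result.isDone()) {
                forward(call.get(), result); // the hedged attempt
            }
        }, delayMs, TimeUnit.MILLISECONDS);
        // Once a result is in, there is no need to send the hedge.
        result.whenComplete((value, error) -> timer.cancel(false));
        return result;
    }

    // Copies the outcome of an attempt into the shared result; later outcomes are ignored.
    private static <T> void forward(CompletableFuture<T> attempt, CompletableFuture<T> result) {
        attempt.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else {
                result.completeExceptionally(error);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();
        // Simulated call: the first attempt hangs forever, the second returns immediately.
        Supplier<CompletableFuture<String>> call = () ->
                attempts.incrementAndGet() == 1
                        ? new CompletableFuture<String>()
                        : CompletableFuture.completedFuture("hedge won");
        System.out.println(hedged(call, 50, scheduler).get(5, TimeUnit.SECONDS)); // prints "hedge won"
        scheduler.shutdown();
    }
}
```

A retry, by contrast, would only start the second attempt after the first one had failed, which does nothing for a call that is slow but has not yet failed.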