-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Feat/improve success rate of new connections #3508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/improve success rate of new connections #3508
Conversation
…l are likely done in async flow
@cyningsun thank you for opening this, will check it with @htemelski-redis and write back. |
err = ctx.Err() | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it call p.freeTurn
here as well? Just took a brief look over the PR, haven't done detailed review yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After starting the go-routine, freeTurn() is delegated to this new go-routine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, understood, but I am wondering how this will be different from the current approach, since the turn will be available only when the connection is either created or fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're correct that turn remains the same: Only becomes available after a connection creation attempt concludes (either successfully or with a definitive failure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think now I understand the idea.
If for whatever reason we fail after the async dialer has started, we will error out to the command, but the potential opened tcp connection will try to be stored back in the pool (if there is place). Can't we optimize this a bit more. Two things we don't know but can help us here:
- Are the other dialers executing at the moment? If so, why don't we provide the initialized connection to someone else that is currently waiting for its dialer to complete?
- Is there place in the pool? This doesn't need to be exactly correct, but if there isn't place in the pool when initially starting the dialer, we can assume there won't be when we complete it, so we can fail early and don't wait for the tcp connection to be created.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Regarding your first point (connection sharing): I had similar thoughts about implementing a connection sharing or dial coalescing mechanism to prevent duplicate work. It's a fantastic idea. However, one concern that came to my mind by the time I started this PR is about fairness and potential starvation. If we put the successful connection into a channel for all waiting dialers to consume, the lack of priority in Go channels might mean that the earliest request isn't necessarily the first to get the connection. To address this fairness concern, I believe we could implement using a queue instead of a simple channel. This would ensure that waiters are served in a first-come-first-served manner, preventing any starvation issues. I will implement this approach in the next iteration of the PR.
- Regarding your second point (early failure): In current pool design, the number of turns equals the PoolSize, and acquiring a turn is a prerequisite for starting a dialer. This means that by the time a dialer is started, we've already secured a place for the future connection in the pool. So, logically, a successfully created connection should always have a slot to be stored. I'm concerned I might be missing something here — could you please help me understand if my reasoning is correct or if there's a gap in my understanding of the current implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyningsun to the second point:
In this PR and in the case that is implemented here you are correct, if we have a turn, there should be a slot in the pool to keep the connection. However, if the MaxIdleConns
is set the connection won't be kept in the pool.
go-redis/internal/pool/pool.go
Lines 643 to 665 in cb3af08
if p.cfg.MaxIdleConns == 0 || p.idleConnsLen.Load() < p.cfg.MaxIdleConns { | |
// unusable conns are expected to become usable at some point (background process is reconnecting them) | |
// put them at the opposite end of the queue | |
if !cn.IsUsable() { | |
if p.cfg.PoolFIFO { | |
p.connsMu.Lock() | |
p.idleConns = append(p.idleConns, cn) | |
p.connsMu.Unlock() | |
} else { | |
p.connsMu.Lock() | |
p.idleConns = append([]*Conn{cn}, p.idleConns...) | |
p.connsMu.Unlock() | |
} | |
} else { | |
p.connsMu.Lock() | |
p.idleConns = append(p.idleConns, cn) | |
p.connsMu.Unlock() | |
} | |
p.idleConnsLen.Add(1) | |
} else { | |
p.removeConnWithLock(cn) | |
shouldCloseConn = true | |
} |
What we can do, to not waste this connection, is to keep it for some time without checking the
MaxIdleConns
on put, but rather cleanup the idles in checkMinIdle
(which should be triggered on popIdle
?
Overall, if we have reached creation of a new connection it should be the case that there are no usable idles at this time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ndyakov Ah, thank you for the clarification! I will work on integrating this strategy into the implementation.
@cyningsun sorry, this was automatically closed when I merged the PR it was based on. Feel free to open it against master. |
@ndyakov Got it, thanks for the update! I'll work on getting a new version prepared and open a PR against master when it's ready. |
This is a draft PR to resolve #3497, based on #3418 as requested by @ndyakov. It continues the discussion from #3502