
Conversation

cyningsun
Contributor

This is a draft PR to resolve #3497, based on #3418 as requested by @ndyakov. It continues the discussion from #3502

@cyningsun cyningsun marked this pull request as draft September 9, 2025 09:08
@cyningsun cyningsun changed the title [Draft] Feat/improve success rate of new connections Feat/improve success rate of new connections Sep 9, 2025
@ndyakov
Member

ndyakov commented Sep 9, 2025

@cyningsun thank you for opening this, will check it with @htemelski-redis and write back.

Comment on lines +601 to +602
err = ctx.Err()
return nil, err
Member


Shouldn't it call p.freeTurn here as well? Just took a brief look over the PR, haven't done a detailed review yet.

Contributor Author


After the goroutine is started, responsibility for calling freeTurn() is delegated to that new goroutine
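The flow above might be sketched like this (a minimal illustration with hypothetical names such as `waitTurn`, `freeTurn`, and `newConnAsync`; not the actual go-redis pool code):

```go
// Sketch of the delegation: the caller acquires a turn, then the
// dialing goroutine (not the caller) releases it when the attempt
// concludes, so the caller can return early without leaking the turn.
package main

import "fmt"

type pool struct {
	queue chan struct{} // turn tokens; capacity == PoolSize
}

func (p *pool) waitTurn() { p.queue <- struct{}{} }
func (p *pool) freeTurn() { <-p.queue }

// newConnAsync starts the dial in a goroutine; freeTurn is called by
// that goroutine once the dial succeeds or fails definitively.
func (p *pool) newConnAsync(dial func() (string, error)) <-chan string {
	result := make(chan string, 1)
	go func() {
		defer p.freeTurn() // the turn is released here, not by the caller
		conn, err := dial()
		if err != nil {
			close(result)
			return
		}
		result <- conn
	}()
	return result
}

func main() {
	p := &pool{queue: make(chan struct{}, 2)}
	p.waitTurn()
	res := p.newConnAsync(func() (string, error) { return "conn-1", nil })
	fmt.Println(<-res)
}
```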

Member


Ok, understood, but I am wondering how this differs from the current approach, since the turn will become available only when the connection is either created or the dial fails?

Contributor Author


Yes, you're correct that the turn behaves the same: it only becomes available after a connection creation attempt concludes (either successfully or with a definitive failure).

Member


Ok, I think now I understand the idea.
If for whatever reason we fail after the async dialer has started, we will return an error to the command, but the potentially opened TCP connection will be stored back in the pool (if there is room). Can't we optimize this a bit more? There are two things we don't know, but that could help us here:

  • Are the other dialers executing at the moment? If so, why don't we provide the initialized connection to someone else who is currently waiting for their dialer to complete?
  • Is there room in the pool? This doesn't need to be exactly correct, but if there isn't room in the pool when we initially start the dialer, we can assume there won't be when it completes, so we can fail early and avoid waiting for the TCP connection to be created.

WDYT?

Contributor Author


  • Regarding your first point (connection sharing): I had similar thoughts about implementing a connection-sharing or dial-coalescing mechanism to prevent duplicate work. It's a fantastic idea. However, one concern that came to mind when I started this PR is fairness and potential starvation. If we put the successful connection into a channel for all waiting dialers to consume, the lack of priority in Go channels means the earliest request isn't necessarily the first to get the connection. To address this fairness concern, I believe we could use a queue instead of a simple channel. This would ensure that waiters are served in first-come-first-served order, preventing starvation. I will implement this approach in the next iteration of the PR.
  • Regarding your second point (early failure): In the current pool design, the number of turns equals the PoolSize, and acquiring a turn is a prerequisite for starting a dialer. This means that by the time a dialer is started, we've already secured a place in the pool for the future connection. So, logically, a successfully created connection should always have a slot to be stored in. I'm concerned I might be missing something here; could you help me understand whether my reasoning is correct, or whether there's a gap in my understanding of the current implementation?
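The fairness point above might be sketched as follows (hypothetical names, not the actual PR code): each waiter gets a private channel registered in a FIFO queue, so a completed dial is handed to the oldest waiter rather than to whichever goroutine wins a race on a shared channel:

```go
// waiterQueue hands completed connections to waiters in FIFO order.
// A single shared channel would let any waiter win the race; the
// explicit queue guarantees the earliest waiter is served first.
package main

import (
	"container/list"
	"fmt"
	"sync"
)

type waiterQueue struct {
	mu      sync.Mutex
	waiters *list.List // of chan string
}

func newWaiterQueue() *waiterQueue {
	return &waiterQueue{waiters: list.New()}
}

// wait registers a waiter and returns its private channel.
func (q *waiterQueue) wait() <-chan string {
	ch := make(chan string, 1)
	q.mu.Lock()
	q.waiters.PushBack(ch)
	q.mu.Unlock()
	return ch
}

// deliver gives conn to the oldest waiter, if any; it reports
// whether the connection was handed off.
func (q *waiterQueue) deliver(conn string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	front := q.waiters.Front()
	if front == nil {
		return false
	}
	q.waiters.Remove(front)
	front.Value.(chan string) <- conn
	return true
}

func main() {
	q := newWaiterQueue()
	first := q.wait()
	second := q.wait()
	q.deliver("conn-A")
	q.deliver("conn-B")
	fmt.Println(<-first, <-second) // FIFO: first waiter gets conn-A
}
```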

Member


@cyningsun to the second point:
In this PR, and in the case implemented here, you are correct: if we have a turn, there should be a slot in the pool to keep the connection. However, if MaxIdleConns is set and the limit has been reached, the connection won't be kept in the pool.

if p.cfg.MaxIdleConns == 0 || p.idleConnsLen.Load() < p.cfg.MaxIdleConns {
	// unusable conns are expected to become usable at some point (background process is reconnecting them)
	// put them at the opposite end of the queue
	if !cn.IsUsable() {
		if p.cfg.PoolFIFO {
			p.connsMu.Lock()
			p.idleConns = append(p.idleConns, cn)
			p.connsMu.Unlock()
		} else {
			p.connsMu.Lock()
			p.idleConns = append([]*Conn{cn}, p.idleConns...)
			p.connsMu.Unlock()
		}
	} else {
		p.connsMu.Lock()
		p.idleConns = append(p.idleConns, cn)
		p.connsMu.Unlock()
	}
	p.idleConnsLen.Add(1)
} else {
	p.removeConnWithLock(cn)
	shouldCloseConn = true
}

What we can do, so as not to waste this connection, is to keep it for some time without checking MaxIdleConns on put, and instead clean up the idles in checkMinIdle (which should be triggered on popIdle?).

Overall, if we have reached the creation of a new connection, it should be the case that there are no usable idles at that time.
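A rough sketch of this idea (illustrative names only, not the go-redis internals): put keeps the connection unconditionally so a freshly dialed connection is never wasted at return time, and a separate trim pass enforces MaxIdleConns lazily:

```go
// Deferred MaxIdleConns enforcement: accept on put, trim later.
package main

import (
	"fmt"
	"sync"
)

type conn struct{ id int }

type pool struct {
	mu           sync.Mutex
	idleConns    []*conn
	maxIdleConns int
}

// put keeps the connection unconditionally, so a just-created
// connection is not discarded merely because the idle limit is hit.
func (p *pool) put(c *conn) {
	p.mu.Lock()
	p.idleConns = append(p.idleConns, c)
	p.mu.Unlock()
}

// trimIdle enforces maxIdleConns lazily (e.g. from popIdle or a
// background pass), dropping the oldest excess idles; a real pool
// would also close them.
func (p *pool) trimIdle() (closed int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for len(p.idleConns) > p.maxIdleConns {
		p.idleConns = p.idleConns[1:]
		closed++
	}
	return closed
}

func main() {
	p := &pool{maxIdleConns: 2}
	for i := 0; i < 4; i++ {
		p.put(&conn{id: i})
	}
	fmt.Println(p.trimIdle(), len(p.idleConns)) // 2 closed, 2 remain
}
```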

Contributor Author


@ndyakov Ah, thank you for the clarification! I will work on integrating this strategy into the implementation.

@ndyakov ndyakov deleted the branch redis:ndyakov/CAE-1088-resp3-notification-handlers September 10, 2025 19:18
@ndyakov ndyakov closed this Sep 10, 2025
@ndyakov
Member

ndyakov commented Sep 10, 2025

@cyningsun sorry, this was automatically closed when I merged the PR it was based on. Feel free to open it against master.

@cyningsun
Contributor Author

@ndyakov Got it, thanks for the update! I'll work on getting a new version prepared and open a PR against master when it's ready.
