Fix a locking bug that could cause a double-unlock #26971
Conversation
Tagging No New Tests; we probably can't reproduce the conditions required for the double-unlock reliably (busy error when removing the exec session directory). But don't merge just yet, I think there's some cleanup necessary in
[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.
1 similar comment
Alright, the exec session patch breaks it, because we can't retrieve the exec session PID to stop it early if there's a timeout. Inconvenient. I'll think about this more in the morning.
```go
defer func() {
	// cleanupExecBundle MUST be called with the parent container locked.
	if !unlock {
		c.lock.Lock()
```
You cannot ever lock without calling syncContainer(); otherwise the function will call c.save() with an outdated state, which will cause really bad problems.
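A minimal sketch of the lock-then-sync pattern being asked for; the `Container` receiver and the `c.lock` / `c.syncContainer()` / `c.save()` names are taken from this thread, and the body is illustrative only:

```go
// Sketch only: re-acquiring the container lock must be followed by a state
// sync, or a later c.save() would persist whatever stale copy of c.state
// we were still holding.
func (c *Container) relockAndMutate() error {
	c.lock.Lock()
	defer c.lock.Unlock()

	// Refresh c.state from the database before touching it.
	if err := c.syncContainer(); err != nil {
		return err
	}

	// ... safe to mutate c.state and eventually call c.save() here ...
	return nil
}
```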
could we also add the !c.batched condition? Otherwise the block here depends on unlock being false only when !c.batched, and that could change without us noticing that the dependency has broken.
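Presumably the suggestion is something along these lines (a sketch only: the `unlock` flag and the `cleanupExecBundle` call are taken from the diff above, while the error handling and exact signatures are assumptions):

```go
defer func() {
	// Guard on both conditions explicitly instead of relying on `unlock`
	// only ever being false when the call is not batched.
	if !c.batched && !unlock {
		c.lock.Lock()
		// Re-sync before any further state mutation (see the comment above).
		if err := c.syncContainer(); err != nil {
			logrus.Errorf("Syncing container %s for exec cleanup: %v", c.ID(), err)
		}
	}
	if err := c.cleanupExecBundle(sessionID); err != nil {
		logrus.Errorf("Removing exec bundle for container %s: %v", c.ID(), err)
	}
}()
```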
Why do we need this ignore-ENOTEMPTY-and-EBUSY handling at all? I see this was added in e252b3b, which mentions NFS. Our VMs do not run on NFS, yet we still landed in that code path. Doesn't ENOTEMPTY mean something is writing into the directory while we delete all files from it - how would that happen? And EBUSY is for mount points, which definitely should not be the case here. Do we actually gain anything by unlocking instead of busy-looping while staying locked? How often would such an ENOTEMPTY/EBUSY error actually happen, and how long does it take to get resolved? cc @giuseppe
That is/was needed because if any program (like conmon) has an open file in that directory, you can't really delete it on NFS. NFS will rename it to some weird `.nfsXXXX` placeholder file instead.
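For context, the retry-while-unlocked behaviour being questioned looks roughly like the sketch below. It is not the code from e252b3b; the `haveLock` flag, attempt count, and sleep interval are invented for illustration, and only the overall shape (yield the container lock while the directory is still busy) reflects the discussion:

```go
import (
	"errors"
	"fmt"
	"os"
	"time"

	"golang.org/x/sys/unix"
)

// removeBusyDir retries removal of an exec-session directory that may still
// be held open (for example by conmon, or as an NFS .nfsXXXX silly-rename
// file). If haveLock is true, the container lock is yielded during each
// backoff so other operations on the container are not blocked for the
// whole wait.
func (c *Container) removeBusyDir(path string, haveLock bool) error {
	const attempts = 50
	for i := 0; i < attempts; i++ {
		err := os.RemoveAll(path)
		if err == nil {
			return nil
		}
		if !errors.Is(err, unix.ENOTEMPTY) && !errors.Is(err, unix.EBUSY) {
			return err
		}
		if haveLock {
			c.lock.Unlock()
		}
		time.Sleep(100 * time.Millisecond)
		if haveLock {
			c.lock.Lock()
		}
	}
	return fmt.Errorf("removing exec session dir %s: still busy after %d attempts", path, attempts)
}
```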
```go
if c.state.ExecSessions == nil {
	c.state.ExecSessions = make(map[string]*ExecSession)
}
c.state.ExecSessions[session.ID()] = session
defer delete(c.state.ExecSessions, session.ID())
```
do we leak the session if podman crashes in healthCheckExec?
I'm attempting to completely remove the exec session from the database for healthchecks, so that ought to completely remove the risk
how will the GC work for these sessions?
Hm. This does ensure the database is clean, but we'd potentially leak the directory used for the exec session. I wonder if we can do away with that entirely; it's not like healthchecks have high requirements...
Regardless, I'm going to drop the "remove session from DB" patch from here, this is proving to be much more complicated than I hoped.
Am I missing something? I thought the point was already that we do not commit the session to the db (ad9839a)?
But we're still mutating c.state.State(). And then we sync that back to the database, potentially, in several places.
(Not intentionally, I mean - any c.syncContainer() after we mutate state is a potential save point.)
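In other words, the hazard looks roughly like this (an illustration, not actual podman code; the surrounding call sites are hypothetical):

```go
// The session is added "in memory only" and dropped again on return...
c.state.ExecSessions[session.ID()] = session
defer delete(c.state.ExecSessions, session.ID())

// ...but if anything in between synchronizes state, for example:
if err := c.syncContainer(); err != nil { // may call c.save() internally
	return err
}
// ...then the serialized c.state, ExecSessions included, can land in the
// database anyway, and the "temporary" session has been committed after all.
```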
Sure, that was the reason why it was added, but that does not explain how the hell we hit that code path in our CI VMs, which AFAIK use a normal ext4 fs.
The `cleanupExecBundle` function was only meant to be called on a locked container, as it does some state mutation operations. It also has a timed wait (if the directory is busy and can't be removed yet, give it a few milliseconds) in which it deliberately yields the lock to not block the container for that time.

The `healthCheckExec()` function calls `cleanupExecBundle` out of a `defer` block. This is registered after the `defer c.lock.Unlock()`, so (defers being LIFO) it fires before the unlock when the function returns, and we're normally fine - the container is still locked when our defer runs. The problem is that `healthCheckExec()` also unlocks the container during the expensive exec operation, and can actually fail and return while not holding the lock - meaning our `defer` can fire on an unlocked container, leading to a potential double unlock in `cleanupExecBundle`.

We could, potentially, re-lock the container after the exec occurs, but we're actually waiting for a `select` to trigger to end the function, so that's not a good solution. Instead, just re-lock (if necessary) in the defer, before invoking `cleanupExecBundle()`. The `defer c.lock.Unlock()` will fire right after and unlock after us.

Fixes containers#26968

Signed-off-by: Matthew Heon <[email protected]>
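A condensed sketch of the ordering the commit message describes. The function shape, the `containerUnlocked` flag, and the session-ID plumbing are illustrative assumptions; the real `healthCheckExec()` is larger, and its `unlock` flag has the opposite sense of the flag used here:

```go
func (c *Container) healthCheckExec(sessionID string /* ... */) error {
	c.lock.Lock()
	defer c.lock.Unlock() // registered first, so it fires LAST

	containerUnlocked := false

	defer func() {
		// The fix: if an error path returned while the container was
		// unlocked, re-take the lock before cleanupExecBundle, which must
		// run on a locked container (and may itself unlock/relock while a
		// busy directory is being removed).
		if containerUnlocked {
			c.lock.Lock()
			// (plus a state re-sync; see the review comments above)
		}
		if err := c.cleanupExecBundle(sessionID); err != nil {
			logrus.Errorf("Cleaning up exec bundle for container %s: %v", c.ID(), err)
		}
		// The outer `defer c.lock.Unlock()` now always fires on a locked
		// container instead of double-unlocking.
	}()

	// The expensive exec runs with the container unlocked; some error and
	// timeout paths can return before the lock is re-acquired.
	c.lock.Unlock()
	containerUnlocked = true

	// ... start the exec session, wait on a select for completion/timeout ...

	return nil
}
```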
Comments addressed, one flake rerun. I'm going to look at the session bits separately - that is definitely a larger set of changes.
giuseppe left a comment
LGTM
Luap99 left a comment
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: giuseppe, Luap99, mheon. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Merged a589f51 into containers:main
Does this PR introduce a user-facing change?