Upgrade podman to fix critical container issues #1126
From the documentation:

> 'podman image prune' removes all dangling images from local storage.
> With the all option, all unused images are deleted (i.e., images not
> in use by any container).
>
> The image prune command does not prune cache images that only use
> layers that are necessary for other images.

So, when the container script is called in the cleanup phase of the lifetime of a container, we can use the '--all' option to ensure we also remove this container's loaded image. In the case this happens before a reboot of the system, there will be no old version of the image loaded to /var/lib/containers after boot.

Issue #1098

Signed-off-by: Joachim Wiberg <[email protected]>
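As a rough illustration of the cleanup step described in this commit message, the sketch below builds the prune command with and without the '--all' option. The real container script is not shown in this PR page, so the function names `prune_command` and `cleanup_images` are hypothetical; only the `podman image prune` subcommand and its `--all` and `--force` options are taken from podman itself.

```python
import subprocess


def prune_command(all_unused):
    """Build the 'podman image prune' command line (hypothetical helper)."""
    cmd = ["podman", "image", "prune", "--force"]  # --force: no interactive prompt
    if all_unused:
        cmd.append("--all")  # also remove images not in use by any container
    return cmd


def cleanup_images():
    """Hypothetical cleanup hook: prune every unused image, return exit code."""
    return subprocess.run(prune_command(all_unused=True), check=False).returncode
```

Without `--all`, only dangling images are removed, which is why the image of the container being cleaned up would otherwise survive in /var/lib/containers.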
Signed-off-by: Joachim Wiberg <[email protected]>
jovatn
left a comment
Only checked README updates, and they look great!
Found a minor typo, that's all.
As Infix matures as an operating system it is quickly becoming more and more useful also for end-device use-cases. The README should reflect this change in focus. Signed-off-by: Joachim Wiberg <[email protected]>
Highlights:

- fixes to systemd and s6 type services
- bare-bones libsystemd replacement with #include <systemd/sd-daemon.h>
- new reload:script mimicking systemd ExecReload, and
- new stop:script mimicking systemd ExecStop
- exit status/signal info when a process dies
- service kill:SEC now supports up to 300 sec.
- the /tmp/norespawn trick now also covers service_retry()
- the sysv 'stop' command process environment is now the same as for 'start'
- State machine ordering issue: enter new config generation after services disabled in previous generation have been stopped

Full changelog at:

- <https://github.com/troglobit/finit/releases/tag/4.13>
- <https://github.com/troglobit/finit/releases/tag/4.14>

Fixes #1123

Signed-off-by: Joachim Wiberg <[email protected]>
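For readers unfamiliar with what a header like sd-daemon.h provides, the sketch below shows the generic systemd readiness notification protocol: a datagram such as READY=1 sent to the UNIX socket named in the NOTIFY_SOCKET environment variable. It is an assumption that Finit's bare-bones replacement follows this same protocol; the changelog line above does not spell it out, and the Python helper `sd_notify` here is illustrative only.

```python
import os
import socket


def sd_notify(state):
    """Send a state string such as 'READY=1' to $NOTIFY_SOCKET.

    Returns False when no notification socket is configured.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True
```

A service would typically call `sd_notify("READY=1")` once it has finished its own startup.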
This major upgrade, along with the upgrade to Finit v4.14, is what is needed to fix #1123, which was caused by some odd futex locking bug in Podman that left lingering issues in /var/lib/containers state files.

The root cause was fixed already in v4.7.x, but since CNI is supported up to and including 4.9.5, going with a later release seemed prudent.

Full changelogs at:

- <https://github.com/containers/podman/releases/tag/v4.5.1>
- <https://github.com/containers/podman/releases/tag/v4.6.0>
- <https://github.com/containers/podman/releases/tag/v4.6.1>
- <https://github.com/containers/podman/releases/tag/v4.6.2>
- <https://github.com/containers/podman/releases/tag/v4.7.0>
- <https://github.com/containers/podman/releases/tag/v4.7.1>
- <https://github.com/containers/podman/releases/tag/v4.7.2>
- <https://github.com/containers/podman/releases/tag/v4.8.0>
- <https://github.com/containers/podman/releases/tag/v4.8.1>
- <https://github.com/containers/podman/releases/tag/v4.8.2>
- <https://github.com/containers/podman/releases/tag/v4.8.3>
- <https://github.com/containers/podman/releases/tag/v4.9.0>
- <https://github.com/containers/podman/releases/tag/v4.9.1>
- <https://github.com/containers/podman/releases/tag/v4.9.2>
- <https://github.com/containers/podman/releases/tag/v4.9.3>
- <https://github.com/containers/podman/releases/tag/v4.9.4>
- <https://github.com/containers/podman/releases/tag/v4.9.5>

Fixes #1123

Signed-off-by: Joachim Wiberg <[email protected]>
Great work overall, you are the 🥇
wkz
left a comment
Great work! 🥇
Signed-off-by: Joachim Wiberg <[email protected]>
The extended kill delay (10 sec) is sometimes not enough for complex system containers. Also, podman sometimes takes the opportunity to do housekeeping tasks when stopping a container. So, allow for up to a 30 sec. grace period before we send SIGKILL.

With the latest image prune extension, set a 60 sec. timeout for the cleanup task, in case podman gets stuck. This is to prevent any future mishaps.

Signed-off-by: Joachim Wiberg <[email protected]>
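The two timeouts in this commit message can be pictured with the generic sketch below. In Infix the grace period is presumably enforced by Finit's service kill delay rather than by code like this, so both helper names (`stop_with_grace`, `run_cleanup`) and their structure are assumptions made purely for illustration; only the 30 and 60 second values come from the commit message.

```python
import subprocess


def stop_with_grace(proc, grace=30.0):
    """Send SIGTERM, then SIGKILL if the process outlives the grace period."""
    proc.terminate()
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # grace period expired, force the process down
        proc.wait()
        return "killed"


def run_cleanup(cmd, timeout=60.0):
    """Run a cleanup command; return its exit code, or None if it got stuck."""
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising the timeout
        return None
```

The cleanup timeout only bounds how long a stuck podman can block the wrapper; it does not address why podman got stuck.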
When a container's image is on an inaccessible remote server, the container wrapper script waits in the background for any network changes to retry download of the image.

This change avoids the dangerous previous construct, and is also easier to read: time out after 60 seconds unless ip monitor reads at least one event before that.

Fixes #1124

Signed-off-by: Joachim Wiberg <[email protected]>
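The "time out unless at least one event is read" pattern can be sketched as follows. The actual wrapper script is not part of this page, so this is only an approximation of the idea; `wait_for_event` is a hypothetical helper, and it assumes the monitored command writes complete lines to stdout, as `ip monitor` does.

```python
import select
import subprocess


def wait_for_event(cmd, timeout):
    """Return True if cmd prints at least one line within timeout seconds."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    try:
        ready, _, _ = select.select([proc.stdout], [], [], timeout)
        # readline() returns b"" on EOF, i.e. the command exited silently
        return bool(ready) and proc.stdout.readline() != b""
    finally:
        # Always reap the monitor so no stray process is left behind
        proc.kill()
        proc.wait()
        proc.stdout.close()
```

A caller would use something like `wait_for_event(["ip", "monitor"], 60)` and retry the image download afterwards, whether the call returned because of an event or because of the timeout.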
- the port-mapping plugin supports iptables or nftables
- the firewall plugin supports only iptables or firewalld

Enforce use of the iptables wrapper for nftables, for now, in both plugins. This all needs to be refactored to run podman with "unmanaged" networks in the future.

Related to issue #1125

Signed-off-by: Joachim Wiberg <[email protected]>
- Drop redundant comments
- Drop redundant imports
- PEP-8 fixes

Signed-off-by: Joachim Wiberg <[email protected]>
Another minor change was added late to this PR, issue #1127, discussed with and approved by @mattiaswal
d68230f to 46cd249
Signed-off-by: Joachim Wiberg <[email protected]>
Usually the CNI bridge plugin "takes care" of enabling IPv4 forwarding on all interfaces, see issue #1125, but when the container tests are run in a different order from the infix_containers.yaml, Infix may reset the IPv4 forwarding on this critical interface.

This change is both future-proof and also ensures the test works as it was intended, even if tests are run out-of-order.

Signed-off-by: Joachim Wiberg <[email protected]>
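One way to make a test set IPv4 forwarding explicitly, rather than rely on the CNI bridge plugin, is to include the standard ietf-ip forwarding leaf in the interface configuration. The sketch below builds such a payload; whether the test in this PR does exactly this is an assumption, and both the helper `ipv4_forwarding_config` and the interface name used in the usage note are hypothetical.

```python
import json


def ipv4_forwarding_config(ifname):
    """Build an ietf-interfaces payload enabling IPv4 forwarding on ifname."""
    return {
        "ietf-interfaces:interfaces": {
            "interface": [{
                "name": ifname,
                # 'forwarding' is a boolean leaf in the ietf-ip ipv4 container
                "ietf-ip:ipv4": {"forwarding": True},
            }]
        }
    }


def as_json(ifname):
    """Render the payload as indented JSON text."""
    return json.dumps(ipv4_forwarding_config(ifname), indent=2)
```

Calling `as_json("e0")`, with "e0" standing in for whichever interface the test uses, yields the JSON to merge into the device configuration.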
Regression test for issue #1123 Signed-off-by: Joachim Wiberg <[email protected]>
Signed-off-by: Joachim Wiberg <[email protected]>
For a heavily loaded system, 10 seconds/retries is not enough time to expect containers to have started up, particularly after the recent changes to prune before and after a container is started.

Signed-off-by: Joachim Wiberg <[email protected]>
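The retry budget mentioned here can be pictured as a simple polling helper like the one below. The helper name `until`, its signature, and the default values are assumptions for illustration; the commit message does not state the new retry count, so none is suggested here.

```python
import time


def until(check, attempts=10, interval=1.0):
    """Poll check() up to 'attempts' times, sleeping 'interval' sec between.

    Returns True as soon as check() succeeds, False when retries run out.
    """
    for _ in range(attempts):
        if check():
            return True
        time.sleep(interval)
    return False
```

With one-second intervals the number of attempts is also the worst-case wait in seconds, which is why "seconds" and "retries" are interchangeable in the commit message.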
Fixes #1127 Signed-off-by: Joachim Wiberg <[email protected]>
Description
This PR addresses a set of container issues discovered while troubleshooting #1105. It turns out that disabling or removing a container may, under certain circumstances, cause podman to deadlock and leave persistent file locks in /var/lib/containers:
ip monitor processes #1124
The fix is to upgrade podman to v4.9.5 (the last release before CNI support was removed), and also to upgrade Finit to v4.14, so that all services can properly complete before the next "configuration generation" starts. See the commit messages for more information.
A regression test, container_enabled, has been added to ensure this particular issue never creeps back in. For improved test coverage, another test for verifying environment variables, container_environment, was also added.
Note
Also included in this PR is an updated logo and slightly refreshed README that's worth checking out 😃