tl;dr - I have some changes, but I'm not sure whether they're appropriate to upstream because they're fairly specific to our issues. Please advise.
I'm working with some systems that are particularly I/O bound, and we're seeing a lot of dropped CAN messages in the logs we're taking with `candump`.
We run `candump` on startup, logging to a file. Every hour, we kill it, rotate the file, and start it again. We've opted for this approach over something like `logrotate` because an external tool can't guarantee we won't rotate in the middle of a write to the file (unless the kernel can?).
Regardless of whether this is an appropriate rotation strategy, we're seeing a lot of random dropped messages in the middle of logs. We've determined that this happens because we occasionally saturate the disk I/O on the system: `candump` blocks writing to the file for a couple of seconds, can't read from the socket in the meantime, and the socket overflows and drops frames.
We're looking to solve this by spawning a new thread and doing all logging from there, buffering internally in an auto-expanding circular queue. This approach fixes the dropped-frames issue, but it currently puts no bound on the amount of memory that `candump` will use at runtime. In practice, memory use is bounded by how long disk I/O blocks: once the buffer has expanded to a size that accommodates our average bus load for a fairly long block, it's pretty stable.
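To make the idea concrete, here is a minimal sketch of a growable circular queue drained by a writer thread. It is an illustration only, not the actual patch: `struct log_queue`, `queue_push`, `queue_pop`, `queue_close`, `writer_main`, and `LINE_MAX_LEN` are all hypothetical names, and it assumes each frame has already been formatted into a text line before it is queued.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_MAX_LEN 128 /* hypothetical cap on one formatted log line */

struct log_queue {
    char (*lines)[LINE_MAX_LEN]; /* ring storage */
    size_t cap, head, count;     /* head = index of the oldest entry */
    int closed;                  /* set once the reader has finished */
    pthread_mutex_t mu;
    pthread_cond_t nonempty;
};

static int queue_init(struct log_queue *q, size_t cap)
{
    assert(cap > 0);
    q->lines = malloc(cap * sizeof(*q->lines));
    if (!q->lines)
        return -1;
    q->cap = cap;
    q->head = q->count = 0;
    q->closed = 0;
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    return 0;
}

/* Called from the socket-reading thread; never waits on disk I/O. */
static int queue_push(struct log_queue *q, const char *line)
{
    pthread_mutex_lock(&q->mu);
    if (q->count == q->cap) { /* full: double the storage and unwrap the ring */
        size_t ncap = q->cap * 2;
        char (*n)[LINE_MAX_LEN] = malloc(ncap * sizeof(*n));
        if (!n) {
            pthread_mutex_unlock(&q->mu);
            return -1;
        }
        for (size_t i = 0; i < q->count; i++)
            memcpy(n[i], q->lines[(q->head + i) % q->cap], LINE_MAX_LEN);
        free(q->lines);
        q->lines = n;
        q->cap = ncap;
        q->head = 0;
    }
    char *slot = q->lines[(q->head + q->count) % q->cap];
    strncpy(slot, line, LINE_MAX_LEN - 1);
    slot[LINE_MAX_LEN - 1] = '\0';
    q->count++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->mu);
    return 0;
}

/* Blocks until a line is available; returns 0 once closed and drained. */
static int queue_pop(struct log_queue *q, char *out)
{
    pthread_mutex_lock(&q->mu);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->nonempty, &q->mu);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->mu);
        return 0;
    }
    memcpy(out, q->lines[q->head], LINE_MAX_LEN);
    q->head = (q->head + 1) % q->cap;
    q->count--;
    pthread_mutex_unlock(&q->mu);
    return 1;
}

static void queue_close(struct log_queue *q)
{
    pthread_mutex_lock(&q->mu);
    q->closed = 1;
    pthread_cond_broadcast(&q->nonempty);
    pthread_mutex_unlock(&q->mu);
}

struct writer_args { struct log_queue *q; FILE *out; };

/* Writer thread: the only place that is allowed to block on the disk. */
static void *writer_main(void *arg)
{
    struct writer_args *w = arg;
    char line[LINE_MAX_LEN];
    while (queue_pop(w->q, line))
        fputs(line, w->out);
    fflush(w->out);
    return NULL;
}
```

In this sketch the reader loop would call `queue_push()` after formatting each frame, the writer would be started with `pthread_create(&tid, NULL, writer_main, &args)`, and shutdown would call `queue_close()` followed by `pthread_join()`. The queue only ever grows, which matches the unbounded-memory caveat above; a maximum capacity could be added in `queue_push()` if a hard limit is preferred.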
At the same time, we've moved file rotation into `candump`, in that side thread; this guarantees we won't drop any frames between ending one log and starting a new one. We could also accomplish this by starting the new log before stopping the old one, generating a brief period of overlap.
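As a rough illustration of why rotating inside the writer thread is safe, the sketch below switches files only between whole lines, so a frame is never split across two logs and nothing is lost during the switch. Everything here is hypothetical (`struct rotating_log`, `rotating_log_open`, `rotating_log_write`, and the `prefix-N.log` naming); it is not the file naming `candump` itself uses, and it assumes the caller passes in the current time.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

struct rotating_log {
    FILE *fp;           /* current log file, NULL before the first write */
    time_t opened_at;   /* when the current file was opened */
    time_t interval;    /* seconds between rotations, e.g. 3600 */
    const char *prefix; /* hypothetical file name prefix */
    unsigned seq;       /* hypothetical sequence number for file names */
};

/* Open the next file first, then close the old one. */
static int rotating_log_open(struct rotating_log *r, time_t now)
{
    char name[256];
    snprintf(name, sizeof name, "%s-%u.log", r->prefix, r->seq);
    FILE *next = fopen(name, "w");
    if (!next)
        return -1; /* the old file, if any, stays open and usable */
    if (r->fp)
        fclose(r->fp);
    r->fp = next;
    r->opened_at = now;
    r->seq++;
    return 0;
}

/* Called by the writer thread once per dequeued line. */
static int rotating_log_write(struct rotating_log *r, const char *line,
                              time_t now)
{
    assert(r->interval > 0);
    if (!r->fp || now - r->opened_at >= r->interval) {
        /* On failure, keep writing to the old file if there is one. */
        if (rotating_log_open(r, now) != 0 && !r->fp)
            return -1;
    }
    return fputs(line, r->fp) == EOF ? -1 : 0;
}
```

Because the reading thread keeps queueing frames while the writer is busy opening and closing files, the rotation itself cannot cause the socket to overflow.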
So my question is really just: should we put in a PR for some or all of these changes? I could see arguments against pulling in `pthread` as a dependency or making `candump` a heavier-weight tool. Adding file rotation internally also breaks the single-tool, single-function philosophy, but I don't know how much this project strives to accomplish that.