-
Notifications
You must be signed in to change notification settings - Fork 530
ftrace: ClearPerCpuTrace only for offline CPUs #2260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
LG but I'll take Ryan take the lead on this one. |
// Each range is either a single CPU or a range of CPUs, e.g. "0-3,5,7-9". | ||
// Source: https://docs.kernel.org/admin-guide/cputopology.html | ||
std::vector<uint32_t> offline_cpus_tmp; | ||
for (base::StringSplitter ss(offline_cpus_str, ','); ss.Next();) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this, but will note that Primiano some concerns that we're introducing too much complexity for an edge case, and that we should keep it simple.
One option would be to just check that there's at least one offline cpu core before doing the per-cpu fallback, instead of trying to parse the exact mask. That would move the edge case to "many-core machine + offlined cores + very sensitive to time spent in ClearTrace()", but also avoid whatever potential string parsing errors we're not seeing here :--).
Consider the much simpler to read & maintain callsite in that scenario:
if (!HasOfflineCpus()) {
return;
for (size_t cpu = 0, num_cpus = NumberOfCpus(); cpu < num_cpus; cpu++) {
ClearPerCpuTrace(cpu);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning for current approach is:
-
For no offline cpu: the string should be empty (if not some wider issue in kernel or in base functions(TrimWhitespace/StringSplitter)), so effectively no parsing logic should happen.
-
For some offline cpus(though rare): switching to
just check that there's at least one offline cpu core before doing the per-cpu fallback, instead of trying to parse the exact mask.
would effectively mean no trace for scenario with high NumberOfCpus and short trace_duration.
From local runs we observe:
[242.828] tracefs.cc:466 Tracefs::ClearTrace started for NumberOfCpus=128
[251.148] tracefs.cc:474 Tracefs::ClearTrace done
[416.948] tracefs.cc:468 Tracefs::ClearTrace started for NumberOfCpus=256
[432.652] tracefs.cc:475 Tracefs::ClearTrace done
Now if no issue with parsing, we improve on this and in case any issue we fallback effectively to the same.
Open to change if you aren't convinced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so effectively no parsing logic should happen.
But it's still a bunch of code added to the class, so the ratio of code vs applicability is rather low. Plus the "rare" codepath will be less exercised in practice and therefore more likely to have issues/bitrot.
But to reiterate, I'm personally fine with this patch. In fact we can probably later move the string parsing to src/kernel_utils/, as it comes up in a bunch of procfs/sysfs places.
@@ -655,12 +715,16 @@ bool Tracefs::IsFileReadable(const std::string& path) { | |||
return access(path.c_str(), R_OK) == 0; | |||
} | |||
|
|||
bool Tracefs::ReadFile(const std::string& path, std::string* str) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's wrong with using ReadFileIntoString for the offlined cores file? I'd prefer not to add any extra shims for the mock-based tests. (Plus I think we should be moving towards a FakeTracefs for majority of tests anyway.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's wrong with using ReadFileIntoString for the offlined cores file?
We want to know if base::readfile fails so we can invoke the fallback of clearing all CPUs. Currently ReadFileIntoString
in case of failure returns empty string which is also a possible return value when no cpu is offline.
do you mean MockTracefs
by FakeTracefs ?
Now the reason for adding the wrapper function Tracefs::ReadFile
for base::ReadFile
is purely for testing purposes following the recommendation go/gmockcook#mocking-free-functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see your point of needing to distinguish the error case, and wanting to inject that in the test.
It's weird to have a mocking testing shim for both ReadFileIntoString and ReadFile, if the earlier is a minimal wrapper around the latter. Though I can see how it's easier to mock the earlier.
do you mean MockTracefs by FakeTracefs?
I was talking about a hypothetical fake that we should finally write. A significant chunk of ftrace tests do not assert interactions, and therefore would be easier to maintain with a fake. But let's put that aside for this patch, it's not directly relevant.
Clearing `/sys/kernel/tracing/trace` should clear all per-CPU trace data for online CPUs. For offline CPUs, trace data might still be pending in their `trace_pipe_raw`. To optimize the clearing process and minimize time taken, only clear the trace for these offline CPUs instead of iterating over all CPUs.
8ab25a4
to
c3c236a
Compare
// Parses the list of offline CPUs from "/sys/devices/system/cpu/offline" and | ||
// returns them as a vector if successful, or std::nullopt if any failure in | ||
// reading or parsing. | ||
std::optional<std::vector<uint32_t>> GetOfflineCpus(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this protected since it's effectively a utility helper function that is only here because tracefs is the single caller atm, and the only reason you're exposing it in the interface is for tests?
@@ -143,5 +152,39 @@ TEST(TracefsTest, ReadBufferSizeInPages) { | |||
EXPECT_THAT(ftrace.GetCpuBufferSizeInPages(), 1); | |||
} | |||
|
|||
TEST(TracefsTest, GetOfflineCpus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be useful to have a test showing that the per-cpu buffer clearing is working as expected when calling into ClearTrace(), since that's the logic we actually care about.
Clearing
/sys/kernel/tracing/trace
should clear all per-CPU trace data for online CPUs (src). For offline CPUs, trace data might still be pending in theirtrace_pipe_raw
. To minimize time taken, only clear the trace for these offline CPUs instead of for all.Flow:
Fixes: #519
Test: Tested locally by manually turning off some cpus on both linux and android