Skip to content

Conversation

@apparentlymart
Copy link
Contributor

@apparentlymart apparentlymart commented Jul 15, 2025

Description

In callers like HashiCorp Terraform, the plugin logger is completely disabled (hclog.Off) by default, but previously go-plugin would nonetheless waste time trying to parse log lines to decide what log level to use when calling into the logger, creating memory allocation garbage and spending time in the JSON parser only to immediately discard the results.

The hclog.Logger API exposes information about the log level so that callers can skip doing expensive preparation work unless it will actually be used. This now uses that to totally skip all of the log preparation when the logger is completely disabled, and to skip some of the work if the log line's level is not sufficient to match the logger's level.

Modern Terraform plugins tend to generate quite a lot of log messages and it seems that at least some of them generate it by default even when none of the logging environment variables are set. I'm not sure if that's intentional, but nonetheless this PR reduces the overhead of processing and discarding those unsolicited log lines enough to make a measurable difference for some providers.

The second commit removes a level of indirection in the internal modelling of additional key/value pairs in the JSON logs, so that instead of performing one allocation per JSON property these allocations are amortized by gradually growing the backing array of the slice. This is a far less significant improvement than the first commit, but it's also a pretty easy and non-invasive change that saves at least a few memory allocations per log line.

The benchmark added in the first commit demonstrates both the most happy path of having the logger disabled entirely, and the slightly-less-happy path of having the logger enabled but filtering out certain levels so it does still need to perform JSON parsing to check whether the log line needs to be logged.

BenchmarkClientLogging
BenchmarkClientLogging/off
BenchmarkClientLogging/off-16            3234806               362.5 ns/op             1 B/op          1 allocs/op
BenchmarkClientLogging/error
BenchmarkClientLogging/error-16           367056              2915 ns/op            1096 B/op         36 allocs/op
BenchmarkClientLogging/trace
BenchmarkClientLogging/trace-16           230836              4418 ns/op            1617 B/op         48 allocs/op

I believe that the "off" case above represents what Terraform's behavior would be in the default case where the TF_LOG/etc environment variables are not set. The "error" case represents TF_LOG=error, and "trace" represents TF_LOG=trace. As far as I know, Terraform currently behaves like this "trace" case regardless of what the environment variables are set to.

How Has This Been Tested?

Aside from the new benchmark mentioned above, I also ran the existing test suite and it passed without any modifications.

In callers like HashiCorp Terraform, the plugin logger is completely
disabled (hclog.Off) by default, but previously go-plugin would nonetheless
waste time trying to parse log lines to decide what log level to use when
calling into the logger, creating memory allocation garbage and spending
time in the JSON parser only to immediately discard the results.

The hclog.Logger API exposes information about the log level so that
callers can skip doing expensive preparation work unless it will actually
be used. We'll now use that to totally skip all of the log preparation
when the logger is completely disabled, and to skip _some_ of the work
if the log line's level is not sufficient to match the logger's level.
@apparentlymart
Copy link
Contributor Author

Hi @jbardin! I'm sorry for the very rude nudge, but I think you've typically been interested in the way go-plugin logging interacts with Terraform (based on the changes made to this function in the past) and so maybe you'd find this change interesting.

@jbardin
Copy link
Member

jbardin commented Jul 16, 2025

Thanks for the heads up @apparentlymart. This all makes sense to me, so I'll give it a more thorough review shortly. I also noticed that someone committed #292, which breaks Terraform's plugin panic handling, so I'll also have to fix that before a new release is tagged.

@jbardin jbardin self-assigned this Jul 16, 2025
@apparentlymart
Copy link
Contributor Author

apparentlymart commented Jul 16, 2025

If you're going to be potentially making other changes in this area anyway...

While I was working on this I noticed that a plugin can make its client crash by writing a JSON log line where @level, @timestamp, or @message is not a JSON string, because the parsing code for those does unchecked type assertions:

go-plugin/log_entry.go

Lines 47 to 65 in a938e9a

// Parse hclog-specific objects
if v, ok := raw["@message"]; ok {
entry.Message = v.(string)
delete(raw, "@message")
}
if v, ok := raw["@level"]; ok {
entry.Level = v.(string)
delete(raw, "@level")
}
if v, ok := raw["@timestamp"]; ok {
t, err := time.Parse("2006-01-02T15:04:05.000000Z07:00", v.(string))
if err != nil {
return nil, err
}
entry.Timestamp = t
delete(raw, "@timestamp")
}

Since all commonly-used plugins are presumably currently generating their JSON logs also using go-hclog I don't think this is likely to cause a big problem in the short term, but maybe worth making this more robust in case more folks try to reimplement the go-plugin protocol (such that it is) in other languages and might accidentally end up generating something this code can't handle well.

(I didn't do anything about that in this PR just because I wanted to focus on one concern at a time.)

@apparentlymart
Copy link
Contributor Author

When I looked at this again today I noticed I neglected to push a final set of changes to correct a test failure I'd caused; the latest version of this which I just forced-pushed is what I had intended to submit. Sorry for the noise.

Previously the JSON log line parsing code would allocate separately for
each additional key/value pair in a log line. Using a flat slice of
logEntryKV instead of indirection through pointers amortizes that memory
allocation cost a little better, for a marginal decrease in allocations
per log line.

The resulting logEntryKV objects are always passed and used all together
anyway, so there is no clear advantage to splitting each into a separate
memory allocation.
Copy link
Member

@jbardin jbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT!

@jbardin
Copy link
Member

jbardin commented Aug 5, 2025

@hashicorp/team-ip-compliance, waiting on review please

@sonamtenzin2 sonamtenzin2 merged commit b02adb3 into hashicorp:main Aug 7, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants