-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A Proposal For Terraform Logging #695
Comments
This proposal looks really nice @paddycarver, thanks for putting it up ❤️ As a provider maintainer, this covers all the use cases I've hit and wraps it all up in a nice interface for use. Not only that, I think the configuration guideline you've outlined hits the mark on flexibility for all parties involved to avoid fixing the problem for some but making it more for others so big kudos there. I only have single point for discussion (which may be moot after some of these improvements roll out) but a request to be able to filter on the "layer" of log entry generated. For instance, providing a way to filter in/out HTTP traffic as a separate stream while still seeing the {core,sdk,provider} logging would be handy. Right now, I use a third party proxy to grab this data when it's needed as I don't often need the logging to include all but a subset of the provider communication. Depending on where you'd take this, the HTTP filtering could be a top level environment variable |
Hey @jacobbednarz! So I tried to address that in the proposal, but may have not been clear or left out key elements--sorry, drafting it was like herding cats a bit. :) It sounds like this bit would address your ask:
Though perhaps the part that needs called out here is that each of these sub-loggers is intended to be configurable to be turned on/turned off with environment variables. So you should be able to set the log level of each of those sub-loggers, including turning them off, on a logger-by-logger basis using environment variables. I think that gets you what you're looking for? But if not, I'd love to hear more about what it is you're after! |
Oh awesome! I must not have fully taken that in on the first pass so thanks for the highlight 🙇 |
I have one note about this, and maybe more after I digest it more. I find that the pattern illustrated by this line leads to poor discoverability– tflog.Info(ctx, "hello, world", "includedValue", someVar) I really want that |
Hey @ryanking, thanks for the feedback. When you say
Do you have any examples of what you mean? I'm having trouble visualizing what you're getting at. I think the proposal is using a method of exposing this that's not incredibly foreign to Go, and is a common pattern for request-scoped things, but that's definitely not to say it's the only way to do it, and I'm interested in exploring alternatives that people think may be better. |
@paddycarver thinking about this more I think it would be best to do what I would want in the provider, rather than in the SDK |
@ryanking that's my line! 😁 I'd be really interested to hear what you would want, though, to see if we could make that easier or if it's worth generalizing and offering a more paved path to achieving it. No pressure if you don't want to share, but I am interested if you do. |
Thank you all for all your feedback and input on this! We're going to move forward with the design as outlined, and hopefully with have more news and updates to share with you soon. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
Hey everyone. This is going to be a long proposal; we're going to try and unpack a thorny problem and propose a solution to it all in one. But we're incredibly interested in feedback from our ecosystem on this proposal, and would love to hear more from you all.
The Problem Description
Logging in Terraform presents a problem at the moment. Logs are currently represented as a stream of text, in which the provider's logs, the SDK's logs, and Terraform's logs all get intermingled. Unfortunately, once this happens, separating them is difficult. You get information about graph walks next to information about shims next to warnings about deprecated behavior next to HTTP request logs from the provider next to log lines printed in the provider's code. Multiple RPC calls are happening and logging in parallel, and aren't distinguished from each other at all. For a lot of users, grepping through the log for unique strings is the best they can do. Reading logs to get an accurate picture of what's happening is a difficult or impossible task, even for seasoned Terraform provider developers. Worse, this situation actively discourages logging information that could be helpful in resolving bugs because subsystems can easily drown out more relevant logs if they log too verbosely.
How Logging Works Today
Terraform's logging subsystem works today as a cooperation between several different pieces of Terraform. It all starts with
go-plugin
, the library that Terraform uses to start provider processes.go-plugin
sends anything written toos.Stderr
oros.Stdout
to Terraform. This moves the output somewhere that Terraform can process it.One level up, in the SDK, we complicate the abstraction a little bit more. We call
log.SetOutput
to override the default output of thelog
package, so calls tolog.Println
get sent somewhere else. That "somewhere else" ends up being throughgo-hclog
, which transforms the unstructured log to a structured log, by inferring which level it's at. Prefixing with[TRACE]
sets the level toTRACE
, and so on. If no known prefix is matched, the log is assumed to be atINFO
level. This structured output is then written toos.Stderr
as JSON, andos.Stderr
is redirected to send it to Terraform bygo-plugin
.Finally, one more level up, the provider developer writes something like
log.Println("[DEBUG] hello world")
in their provider. When run, this coordinates with the SDK andgo-plugin
to get transformed into JSON output of a structured log message sent to Terraform:Terraform now has this JSON, and can parse it and use it appropriately. In 0.15, Terraform core will be able to separate provider logs from core logs, and will use structured logging throughout, making the filtering based on log level much more reliable.
Unfortunately, this still leaves too many subsystems interleaving their logs together, making it difficult to sort them out.
Involved Parties
As the above section hopefully makes clear, there are multiple interested parties in how logging in Terraform providers works.
go-plugin
maintainers have their own compatibility and general applicability requirements they have a responsibility to.Challenges
A number of challenges stand in the way of us shipping a better experience for debugging providers using logs:
terraform-plugin-mux
,terraform-plugin-go
,terraform-plugin-sdk
, some as-yet unreleased other tool, or a combination of them.fmt
orlog
as their first instinct for logging from providers, both because that's a standard solution within Go and because that is how providers have traditionally handled logging. Any solution that doesn't use those two packages as the entrypoint needs to address how the different approach will be discovered by users, rather than failing silently.The Proposed Solution
We've spent some time researching this problem and have a proposal for a path forward. We think it addresses the shortcomings of the current logging situation, navigates the challenges well, and is achievable within the timeframe and bandwidth that can be devoted to the project right now, given the other constraints everyone is working under and the projects that are in motion.
It is important to note that this is a proposal and design notes. What ends up being shipped may change as new constraints or goals are discovered, as limitations prove the desired user experience impossible, or as research, prototyping, and experimentation prove a designed user experience isn't as intuitive or useful as expected and an alternative that better achieves the goals is used. This is also not a guarantee that anything will be shipped.
terraform-plugin-log
The proposal centers around a new Go module,
github.com/hashicorp/terraform-plugin-log
. The package name when imported will betflog
and will provide a logging subsystem that is native to and heavily opinionated for Terraform development. The use of a new module instead of a package in the SDK allows logging to be used interraform-plugin-go
andterraform-plugin-mux
without needing to import the SDK, potentially unnecessarily. It also allows us to iterate on and version the logger separately from the SDK. This does, however, mean that discovering the module will be more difficult for provider developers.As an implementation detail,
terraform-plugin-log
will be implemented as a wrapper around thego-hclog
library and tailor it to Terraform's specific use cases. This could change in the future, but it is a useful shorthand for illustrating howterraform-plugin-log
will work on day one. The use of a structured logger is a departure from the unstructured logging of the past, but offers a richer logging experience that lets the SDK enrich log output for users. It also matches Terraform's now-consistent use of structured logs only internally.Unlike the current situation,
terraform-plugin-log
will not use a process-global logger. This has caused problems in the past: other modules overwriting that logger, multiple provider servers being started in the same process for testing and other non-standard operation modes, etc. Rather than relying on this global mutable state, provider loggers will be injected into an RPC call'scontext.Context
. Provider developers will be able to write logs as follows:This does the mean that the deprecated non-
context.Context
-aware functions will not be able to take advantage ofterraform-plugin-log
.Because we're using structured logging and passing it using
context.Context
, provider developers can set request-scoped values to be used in every log line following that point for the RPC call:The SDK can also do this on behalf of users, injecting useful variables as part of the SDK. For example, if the SDK injects a request ID at the gRPC boundary:
Now when the provider calls
tflog.Info(ctx, "hello, world")
it'll automatically includerequest_id=54f65188-0d10-425d-9427-c778edeffcf9
, too, helping provider developers understand what is happening in RPC calls, even when calls are happening in parallel:context.Context
s that do not have a logger in them (most likely to come up in unit tests or terraform-plugin-go usage, as the framework will be injecting loggers for provider developers) will by default use a null logger, which does not write any output. We could also take the same approach covered below in Encouraging Discovery to make this behavior more obvious to the user.Named Loggers
go-hclog
offers the ability to create named loggers as sub-loggers of an existing logger. These named loggers allow subsystems to have their logs partitioned. We propose to have named loggers for the different subsystems at play in the provider:http.Transport
implementation that logs HTTP requests and responses would get its own sub-logger.By separating out loggers in this manner, provider developers will be able to filter their output and separate out the subsystems that are not under investigation. This frees up the subsystems in question to log more verbosely, knowing they won't overwhelm other, more important logs.
Encouraging Discovery
The use of a separate module makes it harder for provider developers, especially those used to reaching for
log
orfmt
, to discover the package. As a solution to that, we propose to uselog.SetOutput
to override thelog
package's defaultLogger
with one that will panic when it's used, including a message pointing developers to theterraform-plugin-log
module. This is an aggressive approach, but we believe that failing loudly is better than failing silently.Not all providers may want this behavior, and its purpose is meant to be informative, not coercive. For example, many providers using the SDK may have many existing log lines and don't wish to upgrade them. Similarly, providers may wish to use their own logger implementations instead, and while this isn't recommended, that is their choice to make. To help with this, the new panic will only be introduced in new projects like
terraform-plugin-go
and any new tools that we build from here out. The SDK will disable the panic, even when combined with aterraform-plugin-go
provider usingterraform-plugin-mux
. The same method the SDK uses to disable the panic will be available to provider developers to call themselves in theirmain
function if they want to use their own loggers, and instructions on how to disable the panic will be included in the panic message itself.We will also update
terraform-provider-scaffolding
,terraform-provider-hashicups
, and the Learn guides to useterraform-plugin-log
, as well as the relevant documentation on terraform.io.Configuration
With this many subsystems logging, and each being able to be set to their own log level for filtering log output, the number of knobs start to add up. When considering that each provider in use will have their own set of these, the number of knobs starts to be overwhelming.
Fortunately, the job here isn't to surface all the complexity the system enables, it's to surface the useful complexity that the system enables.
To do so, it's worth thinking about what subsystem owns which parts of the logs:
stdout
andstderr
, so it is the gatekeeper of the final log output. But it really only cares about two things: do we want logs from core? And do we want logs from providers? The specific subsystems the provider has aren't really any of Core's concern, so its existingTF_LOG_CORE
andTF_LOG_PROVIDER
environment variables are sufficient to control what it cares about. Ideally, core would be able to control which providers can send logs, but specifying which levels apply to which providers at that point gets complicated. Nobody wants to typeTF_LOG_PROVIDER=registry.terraform.io/hashicorp/random=debug,registry.terraform.io/hashicorp/null=off
. To keep this simple, we'll limit core's configuration to just the configuration that exists in Terraform 0.15.TF_LOG_SDK_PLUGIN=trace
could be used to controlterraform-plugin-go
.TF_LOG_SDK=off
could be used to turn off all SDK logging.TF_LOG_PROVIDER_RANDOM=off
, for example, could turn the logs for therandom
provider off. There is a risk that provider developers will choose non-unique environment variables, but since the environment variable is mostly only useful to the developers of that provider, the people most impacted by collisions are the people best suited to resolve them. The SDK will plan to surface helpers that will register common environment variables to help drive consistency of this.As an illustrative table:
TF_LOG=off
TF_LOG_PROVIDER=WARN
TF_LOG_SDK=WARN
TF_LOG_PROVIDER
orTF_LOG
is set to a less verbose logging setting, the less verbose logging setting is used.TF_LOG_PROVIDER_RANDOM=TRACE
TF_LOG_PROVIDER
orTF_LOG
is set to a less verbose logging setting, the less verbose logging setting is used.TF_LOG_CORE=off
This also nicely mimics how environment variables are used today for, for example, authentication: the level of abstraction that is concerned with it controls which environment variable is used.
The other nice part about this proposal is that the behavior users have come to rely on--that
TF_LOG
is the ultimate arbiter--holds true. If a user setsTF_LOG
orTF_LOG_PROVIDER
toWARN
, it doesn't matter what the provider logger is configured to send, Terraform will still only showWARN
andERROR
messages. These new environment variables can only further limit the information being exposed, they can never expand it. And users (and provider developers) don't need to know anything about them to retain the behavior they're getting today, it's just an extra tool for developers that need that enhanced complexity.Logs as Artifacts
Provider developers often need to debug issues that happen on systems they don't have access to, with configurations and states they don't have access to. The available information is limited, and largely provider developers need to rely on logs and reproductions to understand and get to the bottom of issues. As such, the experience of capturing a log to pass to a provider developer needs to be simple and straightforward, and the experience of working with a captured log should offer as much power as debugging a provider running on your own system.
We propose to add
JSON
as an acceptable value forTF_LOG
. When set, it would capture any and all information the provider sends (effectively settingTF_LOG=trace
) but also would output the log lines in their JSON, machine-readable format. The recommendation would then be for provider developers to ask users for theTF_LOG=JSON
output, not theTF_LOG=trace
output.We could then provide a binary tool that could consume that JSON output, filter it using the same tools and environment variables as if the log lines were being generated instead of read, and output a new human-readable version of the log file filtered to the provider developer's specifications.
Design and implementation of this tooling is a future project and not considered part of shipping this improvement. The important part is that this functionality is available when the tool is ready, and until then bug reports are not less useful.
Future Directions
A number of possibilities were discussed as we researched this proposal, and while they don't fit into the current scope of the proposal, they are things we have discussed and would like to implement when the prerequisites are met or we have the bandwidth to implement them productively.
Log Filtering
A lot of people are very interested in keeping sensitive information out of log files. As part of creating structured logging and taking more control over how logging happens, we gain the ability to do some more log filtering. The idea of a function call like
tflog.RegisterFilter(ctx, func(msg string, args []interface{}) (string, []interface{}))
was proposed, which would allow provider developers to register a callback that would be called before each log line was output and modify the message or string of the line. This would allow providers to register callbacks that did things like remove API keys, resource-specific passwords, and other sensitive information from provider log output automatically. It would not impact the log output of core, but would help to address the concerns about potentially exposing sensitive information in logs.We are still interested in doing this, but given the current period of flux the SDK and Terraform provider development in general is undergoing, we wanted to introduce this in a thoughtful and ergonomic way, and didn't want to block the other improvements in this project on that implementation.
Open Telemetry
We're very interested in integrating with OpenTelemetry and considered building
terraform-plugin-log
on top of theopentelemetry-go
library, or just using theopentelemetry-go
library itself. Unfortunately, theopentelemetry-go
library does not yet support logging. As the project develops, we're interested in keeping an eye on it and seeing if we can integrate better with existing standards.Next Steps
Now that we've published this proposal, we're seeking input, comments, and feedback for our ecosystem and community. Would this help you? Are there things you need to do that we didn't take into consideration?
We're going to leave this issue open to collect comments until the end of February. On March 1st, we will close this issue. At that point, this will graduate to an internal HashiCorp RFC. At that point, the rest of HashiCorp will have a chance to offer their feedback and expertise, which may change the details of the proposal as we incorporate new information.
Once the RFC process is complete, we will begin implementing and integrating this work into our existing and future codebases, allowing provider developers to start taking advantage of it.
The text was updated successfully, but these errors were encountered: