-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype: Opt-in additional hints during validation #29661
Conversation
One aspect of the existing
I understand that this is necessary if we are to keep some validation logic on the provider side, but it is problematic enough that we don't run If you don't anticipate the linting to require providers I would vote for a separate |
Unfortunately a lot of things we'd want to do with linting will require access to the provider schemas anyway -- we basically can't do anything with the bodies of I think if we do try to solve for linting without that overhead then it'll probably also lead to us doing the work to enable a partial validate that ignores the stuff that we call into providers to get, though I'm unsure whether such a result would actually be that useful since most of the interesting stuff in a Terraform module is content Terraform Core passes largely verbatim onto providers anyway. It sounds like we ought to have a separate effort somewhere to optimize the integration point between the language server and Terraform CLI for validation purposes. I expect we can optimize that path to a fair extent if we prioritize profiling it and are then willing to change some fundamentals of how that all works, such as letting Terraform CLI stay running so it can cache schemas in memory, and perhaps changing the provider protocol to reduce the number of gRPC round-trips it takes to validate, assuming that profiling shows those two as the main overhead. (I've not measured it, so that's just a guess based on what you reported here.) Doing that should hopefully benefit both validation and linting, since they are both built on the same basic structures internally and vary only in some extra CPU-bound logic for the lint rules. |
That's fair - I just wanted to raise this early as I imagine many people will want to run this from an editor and will likely run into the same set of problems I did with the existing
I admit I have mostly anecdotal evidence + my own experience at the moment, but I do plan to gather some more data in this area, so I should have something more concrete for you soon. Back to the original proposal: I'm sure there's an issue tracking it somewhere, but I remember we had some conversations about the possibility of providers doing some "heavier" API-based validation, such as "is this name available for an S3 bucket", or "is this EC2 instance type available for me in this AWS account"? I imagine if we were to implement it somehow, such validation would also need to be separate-able from the other - more static and faster - validation. I'm in no way suggesting you to tackle this problem here in this PR, but just sharing some thoughts on possible direction of future expansion of |
Currently our main distinction between To me, the plan phase is already the correct place for checks like the ones you mentioned -- the remaining checks that get excluded from I think so far this has mainly come down to a combination of opportunity cost (we get more impact from working on the constant stream of new functionality the cloud vendors are releasing than adding and maintaining dynamic validation checks that most folks will not hit) and SDK design quirks (the way the current SDK models planning is pretty far removed from how Terraform itself thinks about planning). The second of these at least we have an opportunity to address in the new framework. The opportunity cost is harder to overcome, but hopefully at least that cost will be lower once we have an SDK designed with that functionality in mind, rather than retrofitted late in its life. With all of that said, I do appreciate you raising these ideas! I do think they are worth exploring, I'm just seeing them as something separate from the idea of linting, which I think of as a partial solution to the common problem of folks writing things that are technically valid but don't mean what they expected, often because of pragmatic compromises in our language design inherited from earlier Terraform versions. |
This follows a similar principle as Context.Plan and Context.Eval. Although we don't have any special extra options to add right now, adding a new argument to the Validate method causes disruptive changes to callers in various other files, whereas adding a new optional member to ValidateOpts will typically touch fewer files and thus allow more of our code to stay unmodified (and thus more backport-friendly) across future updates.
This is another small step in our ongoing effort to refine the exported API of the "terraform" package to be only what external callers need. The validate graph is an implementation detail of Context.Validate and need not be exposed outside of this package.
The graph builder doesn't actually use this information yet, but this will allow us to make the graph builder react to future options without causing any further changes in the validateGraphBuilder callers.
Here we establish a new mode for Context.Validate which enables some extra checks which can produce warnings that don't necessarily need to be fixed in order for a Terraform module to be accepted, but that which are indicative of common mistakes that might lead to unintended behavior or of non-idiomatic style. The idea of this mode is that users could potentially enable it if they are seeing some unexpected behavior, to potentially get some additional hints as to what might be causing it. Since linting might sometimes generate false negatives, this mode probably won't be suitable for use as a pre-merge automated check. For this first round, the lint checks are focused on the depends_on arguments within "resource" and "data" blocks, catching situations where entries in depends_on are redundant with other declarations or refer to individual resource instances when Terraform only considers whole resources. This new mode is currently not exposed in the UI anywhere, so this is all just dead code for the moment. Perhaps in future it will be exposed either as a new mode for "terraform validate" or as a new command "terraform lint", but we'll save that decision for later work.
528f23d
to
cd747a5
Compare
This is here to support a forthcoming "hint mode" for validation, which opts in to generating additional hints that will hopefully help with debugging and improving module understandability but do not imply that a module is necessarily "incorrect" or need to be fixed in all cases. Not all of our codepaths will be able to handle this severity well, because it's arriving on the scene late and so some existing codepaths won't be aware of it, but that's okay because it's constrained only to the validation codepath in particular, and only when hint mode is enabled, and so we only really need to deal with it in that particular situation.
The noun "hint" seems to better describe what this mode actually does, which is to make suggestions that hopefully improve readability overall but may not be appropriate in all situations. This retains the previous suggestions about depends_on and doesn't introduce any others, but they are now presented as hint diagnostics instead of warning diagnostics.
4cbae5f
to
768f62a
Compare
With a particular compile-time flag set (not by default) this adds an extra -hints option to "terraform validate" which makes it run Terraform Core's validate operation in hints mode. For the moment this is just here to let us easily experiment with hints in development environments. It's not yet ready to graduate to being included in real releases because we still need to get a sense of what the "rules" are for what makes sense as a hint, and we might also decide to use a different CLI argument style or other interface to access the hints mode.
I've done a bit more fiddling with this today, since I had some fragmented spare time between meetings/discussions. I ended up renaming "lint" to "hints" because that seemed more consistent with the goals I outlined here, of just offering some suggestions for the author to evaluate rather than intending this as a checklist of things that must be fixed in order for a module to be considered "good". That then led to a new presentation of the results where they get annotated explicitly as being "Hint:" rather than "Warning:", and thus they are more clearly distinct from any error and warning messages that we might produce alongside.
|
This PR has now become so conflicted as to be useless. 😖 |
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions. |
This is an initial prototype of a new "hint mode", which extends the
Validate
operation with some additional checks which can generate warnings that might not always be applicable, and thus would not necessarily be appropriate to use as a blocker for merging a PR, or similar.The idea here would be to raise situations that aren't technically invalid but that are either non-idiomatic or a common cause of unexpected/undesirable behavior. The main use-case I have in mind for this is that if Terraform is doing something surprising and you don't understand why then you might ask it for hints on your configuration to see if it generates any additional notes that might either give something new to investigate or allow eliminating certain possible causes from consideration (by removing redundant declarations in order to focus on the significant parts).
As a starting point, this includes a set of checks relating to the
depends_on
meta-argument withinresource
anddata
blocks, detecting situations where the declarations in there are either redundant with other references or over-specified. I started with these warnings because redundant declarations seem to have been a common source of distraction for people attempting to debug, where they waste time trying to re-state dependencies that Terraform can already see and can thus find themselves in an XY problem, thinking that dependencies are the problem when there's actually some other cause.Some other possible ideas for lint checks we could potentially include are:
true
or alwaysfalse
, such as comparing two values of different types with either==
or!=
. (Could resolve compare map doesn't work when we use condition #27643, by automatically explaining the situation rather than leaving it ambiguous.)"1" + 2
could warn that the"1"
ought to be written as1
. This is not problematic in itself, but this is a thing in the category of "eliminate odd stuff that is likely to distract from finding a real cause". (I would hesitate to extend this to non-literal operands because relying on implicit conversions for resource attributes and similar is pretty common and so would likely be very noisy without really explaining anything useful.)timestamp
,uuid
, andbcrypt
) in resource arguments, which will typically cause a configuration to never converge because there will always be a planned update/replace for that resource. (Could we detect that indrectly too, so that e.g. referring to a local value which is itself non-converging would also generate the warning?)jsonencode
as a more robust answer, since we commonly see folks trying to generate JSON using string templates and struggling to get all of the punctuation right. The goal of catching what appears to be slightly-invalid JSON makes this heuristic a bit trickier than it would otherwise be.any
), because that tends to help understand why something weird is going on when a caller writes an expression of the wrong type.lookup
in its two-argument form, which has long been superseded by the explicit index syntax.lookup
in its three-argument form, which (subjectively) tends to be more readable as atry
call instead, because the argument then appears just as normal traversal syntax and is thus easier to read.element
with a list of a known length and a constant index that we can prove is in bounds for the list. Better to use index syntax instead, because such a call can't possibly be relying on the wrap-around effect ofelement
.aws_instance.example.*.id[0]
, and propose writingaws_instance.example[0].id
instead to make the expression easier to understand when debugging.file
,templatefile
,filebase64
, etc refers to a file that doesn't exist and seems similar to a filename in a location that seems like it's intending to generate a file during the apply operation, such as in alocal-exec
provisioner for a resource that this object depends on? This one might be too fuzzy to detect reliably, but if we could catch it then we could give more explicit direction that these functions aren't suitable for that purpose."var.example"
would, if there's also avariable "example"
block in the module, suggest writing plainvar.example
instead.[*]
to a resource which hasfor_each
set. Although it's technically okay to do that, because the splat operator can apply to values of any type, forfor_each
resources in particular it's much more likely to be an incorrect attempt to mimic selection of multiple attributes on acount
resource. This lint rule could suggest wrapping the resource expression itself invalues(...)
in order to get what was probably the intended effect, likevalues(aws_instance.example)[*].private_ip
.I'm focusing on generic language-level lints here, rather than provider-specific lints. Provider-specific lints could be interesting too, but we'd need to figure out how to represent that in the provider wire protocol and so wouldn't want to block on that.
I've not wired this new mode up to the UI in this PR, because there are a few different ways we could expose these with some different tradeoffs, while the main point of this PR was to prototype how lint checking might integrate into Terraform Core, not how we might expose it in the UI. I'm considering two main possibilities for exposing this in the UI, though we might consider others too:
terraform validate -lint
terraform lint
In both of these cases it seems like we might want to add a new diagnostic severity to represent lint warnings, so that readers of the output can differentiate between the warnings that they probably should pay close attention to vs. the lint warnings that might not necessarily be relevant (but hopefully are more often than not). For the moment I've just classified these as normal warnings, because I wanted to keep these changes mostly confined to the main Terraform Core package.
It might also be nice to include "fix it hints" which can be machine-understood and automatically applied to the source code, e.g. in a text editor. That won't apply to all lints, but can apply to some. I wouldn't expect to do that in a first pass -- it would require extending our diagnostics model in some way -- but would be nice to do in the long run.
The three resource-dependency-related example lint warnings that this PR introduces to start with: