Run `terraform init` as part of `check` goal #18974
Conversation
the dumbest possible way of fetching providers/deps
Hey @lilatomic - thanks for handling this issue, I was about to start looking into it myself and saw you had already done so. 😄 Could you also run the script to take care of this issue too? I tried pulling in your changes to use with terraform 1.2.9 on an Intel Mac and am getting the following error:
There's a conversation here about why the tf backend doesn't run `terraform init`. Can you explain here how this change handles that issue?
```python
@dataclass(frozen=True)
class TerraformDependencies:
    fetched_deps: Tuple[Tuple[str, Digest], ...]
```
This is `(dir, Digest of .terraform and .terraform.lock.hcl for that dir)`? Worth documenting, as that is not at all intuitive.
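To make that concrete, a docstring along these lines would capture the reading suggested in this thread (the field name and types are taken from the diff above; the wording is only a suggestion, not the PR's actual documentation):

```python
from dataclasses import dataclass
from typing import Tuple

from pants.engine.fs import Digest


@dataclass(frozen=True)
class TerraformDependencies:
    """Dependencies fetched by `terraform init`.

    `fetched_deps` holds one entry per Terraform source directory: a tuple of
    (directory, Digest), where the Digest contains the `.terraform/` directory
    and `.terraform.lock.hcl` produced by running `init` for that directory.
    """

    fetched_deps: Tuple[Tuple[str, Digest], ...]
```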
Also, are these files in fact relocatable/cacheable? Are they remote-cacheable?
They're definitely cacheable. My concern is the grain of the cache and the cache key. Currently I think they'd all be considered one big digest, and the cache key would be all TF files requested. Both can be made smaller. I talk a bit more about it in the comment at the head of the MR. We can extract the list of required providers and individually download them (we can then either have terraform install from these local sources or emplace them ourselves). Terraform also supports lockfiles, which we could leverage in some way.
I'm pretty sure they could be remote-cached as well.
The combined size can be quite large; ours are around 200M (I think the big cloud Providers are most of it).
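For illustration only (this PR does not do this): extracting provider source/version pairs from `required_providers` blocks is the kind of information a finer-grained cache key could be built from. The regex-based scan below is a naive stand-in for real HCL parsing; the function name is hypothetical.

```python
import re
from pathlib import Path
from typing import Set, Tuple

# Matches `source = "..."` followed by `version = "..."` as they typically appear
# inside a required_providers block. Real code would parse HCL properly.
_PROVIDER_RE = re.compile(
    r'source\s*=\s*"(?P<source>[^"]+)"\s*\n\s*version\s*=\s*"(?P<version>[^"]+)"'
)


def naive_required_providers(tf_dir: str) -> Set[Tuple[str, str]]:
    """Collect (provider source, version constraint) pairs from the .tf files in one directory."""
    found: Set[Tuple[str, str]] = set()
    for tf_file in Path(tf_dir).glob("*.tf"):
        for match in _PROVIDER_RE.finditer(tf_file.read_text()):
            found.add((match.group("source"), match.group("version")))
    return found


# e.g. {("hashicorp/aws", "~> 4.0")} for a module pinning the AWS provider.
```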
```python
        TerraformDependencies,
        GetTerraformDependenciesRequest(source_files, tuple(files_by_directory.keys())),
    )
    # just merge them all for now. This will probably be a problem with multiple TF sources requesting different versions of the same providers
```
Isn't that a terraform-level problem though? I.e., wouldn't that be an issue even without Pants?
Pants might introduce another way for it to be a problem, in that it looks like it selects multiple terraform_modules and includes them all in the same (single) partition. Essentially the same as if Python didn't have the concept of resolves: it would be possible for different Python sources to depend on different versions of dependencies.
I think terraform does the right thing here because we use the `-chdir` argument, so even though all the sources are included, terraform only looks in the correct subdir. I'm not sure, though.
Gotcha. I suppose we could run `tf init` on each directory in a separate process? But that can be a followup investigation.
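A rough sketch of that follow-up idea, not what this PR implements: run one `TerraformProcess` per directory and key each result by its directory, so each directory gets its own cache entry. The `TerraformProcess` fields follow the diff just below; the `description` field, the request's `directories` attribute, and the `-chdir` handling here are assumptions.

```python
from pants.engine.process import ProcessResult
from pants.engine.rules import Get, MultiGet


# (In a real plugin this would be a @rule.)
async def fetch_deps_per_directory(req: GetTerraformDependenciesRequest) -> TerraformDependencies:
    results = await MultiGet(
        Get(
            ProcessResult,
            TerraformProcess(
                args=(f"-chdir={directory}", "init"),
                input_digest=req.source_files.snapshot.digest,
                # terraform writes these under the -chdir directory, so capture them there.
                output_files=(f"{directory}/.terraform.lock.hcl",),
                output_directories=(f"{directory}/.terraform",),
                description=f"Run `terraform init` for {directory}",  # assumed field
            ),
        )
        for directory in req.directories  # assumed name of the requested-directories field
    )
    return TerraformDependencies(
        fetched_deps=tuple(zip(req.directories, (r.output_digest for r in results)))
    )
```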
args=("init",), | ||
input_digest=req.source_files.snapshot.digest, | ||
output_files=(".terraform.lock.hcl",), | ||
output_directories=(".terraform",), |
terraform will write these relative to the `chdir`, but Pants doesn't know about `chdir`, so won't it try to capture them relative to the sandbox root, and then not find them? So don't you need to prefix the directory here?
They're prefixed inside of TerraformProcess. The logic for me was that terraform interprets all of the other file paths as relative to the chdir. Your comment reminded me that there's a better way than banging it together with pathlib; I'll look into that.
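For concreteness, the prefixing being discussed amounts to something like the following (a hypothetical helper, not the actual code inside `TerraformProcess`):

```python
import os
from typing import Tuple


def prefix_outputs(
    directory: str, output_files: Tuple[str, ...], output_directories: Tuple[str, ...]
) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Re-root terraform's outputs under the invocation directory.

    terraform writes relative to -chdir, but Pants captures outputs relative to the
    sandbox root, so the capture globs need the directory prefix.
    """
    return (
        tuple(os.path.join(directory, f) for f in output_files),
        tuple(os.path.join(directory, d) for d in output_directories),
    )


# prefix_outputs("infra/prod", (".terraform.lock.hcl",), (".terraform",))
# -> (("infra/prod/.terraform.lock.hcl",), ("infra/prod/.terraform",))
```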
Gotcha, makes sense.
I think
This MR gets around each of those by:
I think it's therefore safe to run
Should I merge this, @lilatomic?
Yeah, should be good to go. I was thinking of the "relocated_files" target, which doesn't help here. Hopefully I'll be able to put in the work for making the logic of this process smarter.
Many Terraform tools require Providers and Modules to be installed with `terraform init` for them to work properly. This includes `terraform validate`, which is the backbone of the current `check` goal for Terraform.

This is the simplest way of installing these. It makes no effort to actually understand what modules are used, and defers entirely to `terraform init`. As the providers will be downloaded as one block, this might take up a lot of space in the Pants cache (I'm not sure, though).

We could do smarter things. Terraform has separate ways of listing the required providers, saving lockfiles of them, and downloading the providers. (We could also parse the version specs from the Terraform sources themselves and make lockfiles ourselves.) That sounds like a cool thing to do, but it is definitely more complicated than bundling the dependencies into a Digest. I think it's worth getting this support working, since `check` is not functional for any Terraform sources which use Providers or Modules, which is probably most of them. This will also allow us to integrate more Terraform tools. I think the improvements to Provider fetching are unlikely to resolve real problems: Terraform deployments usually pull in all their sibling files, in contrast with Python, where siblings might not be imported; therefore, we are unlikely to see large gains from separating the downloaded Providers.

resolves #18489
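For reference, the "just merge them all for now" approach mentioned in the diff comment can be pictured as merging every per-directory digest into the single sandbox used for `terraform validate`. A sketch only, using Pants' `MergeDigests` and the types from this PR; the function name is hypothetical:

```python
from pants.engine.fs import Digest, MergeDigests
from pants.engine.rules import Get


# (In a real plugin this would be part of a @rule.)
async def merged_dependencies(fetched: TerraformDependencies) -> Digest:
    # Merge every (directory, Digest) pair into one digest for the check sandbox.
    # As the diff comment notes, this may become a problem when multiple TF sources
    # request different versions of the same providers.
    return await Get(Digest, MergeDigests(digest for _, digest in fetched.fetched_deps))
```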