-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Terraform downloads module from git source for every usage #11435
Comments
@mattlqx This has been a pain point for a while, especially when using modules a large number of times. I've looked into it in the past but I'm not sure of the best way to change it, so haven't proposed anything. The code behind it is these lines: https://github.com/hashicorp/terraform/blob/master/config/module/tree.go#L156-L177 When the tree is built it uses the path to the module in the tree, and the src of the module to create a key for each module. This is then passed to go-getter which creates the hash for the modules location on disk. The addition of the path was added in #1418 A work around has been proposed using an external file: http://bensnape.com/2016/01/14/terraform-design-patterns-the-terrafile/ I'd like to work out a nicer way, module downloads are the slowest part of running terraform for us at the moment. |
@mitchellh @phinze are there any further thoughts on the how the module lock idea mentioned in #1418 could work? |
This takes about 2 mins to fetch 30 identical modules as opposed to 2 seconds when source is local. Unfortunately this is a showstopper for us. |
any update on this ? |
also hitting this issue |
any update on this ? |
Also an issue for us. And downloading in parallel would be great as well. |
As a step towards a better solution, what if the local module directory name was a hash of the module source and version? Downloads would still be serial but subsequent downloads for identical sources and versions would be effectively no-ops for some module protocols. For git sources you could ignore the path inside the repo too (everything after the double slash), allowing for more more modules to share the same repo clone. Thoughts? Due to module invocations not supporting |
Hi all! In the next major release, due to switching to a new implementation of the configuration language, we also have a new module installer mechanism. Since the new configuration language implementation is able to produce better source location context in error messages, we've switched to using human-readable names for the module installation directories instead of the current hashes, so that paths shown in the error messages are easier to understand. As a consequence of this, the directories are named after the module names given in configuration (because these are short, filename-friendly, and easily recognizable), and so relying on naming the directories after the source address isn't an option any longer. However, we do still want to solve this inefficiency. Our favored idea right now is to make the installer internally keep a mapping table from a (source address, version) tuple to the first module installed from that source, and then if we find a second request to install the same thing we would either copy or symlink the initial directory locally, without downloading again. As @rifelpet suggested, we may also be able to use a "trimmed" version of the source address in this lookup table so that different paths under the same repository/package would be able to share a directory. Since the configuration language improvements are already a big change we decided to leave this optimization out of scope for the first pass. The new implementation was designed with this thought in mind and so it hopefully won't be too hard to retrofit it once the dust has settled on the other changes. |
are there any workarounds? this is a real pain point that's been going on for years. it's gotten a lot worse as the scaling of module reuse grows. |
I guess the trimmed version of the source address works fine, as long as it somehow includes the refspec. For users who are pulling modules from Git repos, If it doesn't include the refspec, some of us are going to have to hold off on upgrading until this optimization can take place. (Or I guess we could build deploy-tarballs-to-S3 pipelines...) |
What do you mean trimmed version of the source address works fine with a refspec? |
The "trimming" function I was alluding to would remove the portion of the address that represents a path within whatever package is being retrieved, which for git repositories would be the path into a sub-directory within the repository. As you point out, @rismoney, that would need to be done with care to preserve all of the query-string-like arguments that many of the source address schemes support, like For example:
The handling of these sub-paths is pretty complex today due to how the different parts of the installer interact. I believe the new installer I mentioned is implemented in a simpler way where the source addresses are handled in a single place, so it'll hopefully be easier to support this trimming-and-caching idea once we've finalized that. The new installer will be in the next major release, so hopefully we can do something with this after that. |
@apparentlymart thank you for the explanation, its now very clear. I had been reusing the same module ~150x within the same repo (using pathing). I was attempting a refactor to move all these modules to an external, and this problem got ugly fast. It became a millions of files (git) thing, due to cloning depth, sheer quantity, and obvious slowness. The question I have with the source being trimmed, is then, is the module name then pushed to another parameter? How do you then specify the proper module. |
nm. it's trimmed from a code perspective, not an enduser perspective. got it. |
Hi all, A while back over in #18309 we merged a change into the v0.12 development branch to re-use already-downloaded modules where possible. Although we had initially planned to leave that out of the v0.12.0 scope, once the new installer was implemented it turned out to be relatively simple to include an initial pass of this optimization behavior due to the new installer being designed with it in mind. That change is now in master ready to be included in the v0.12.0 final release, so I just verified it against the v0.12.0-alpha2 prerelease build. I used this configuration: module "consul-a" {
source = "hashicorp/consul/aws"
}
module "consul-b" {
source = "hashicorp/consul/aws"
} I'm using Terraform module registry sources here first of all just because it's a more convenient thing to quickly test. The caching mechanism works for modules downloaded from any source.
The download completed successfully as before, with very similar output as before (the slight changes in layout here are a result of the other module installer changes in this release). The primary difference here is that while there was a noticeable pause at The trace log explains why this is:
Terraform noticed that both modules had the same finally resolved module source string (in this case it was I also verified this using
For this initial pass we're only optimizing away the redundant downloads. Terraform will still create a local copy of the code for each module, since this allowed this to be implemented in a reasonably low-risk way that doesn't require changes anywhere else in the codebase. The fact that it is implemented as a detail of the installer for now is also why the Although not directly related to the issue at hand, you can also see in the output I shared above the other change I mentioned where the modules are now installed into directories named after the module names given in configuration, rather than the opaque hashes used before. The primary motivation for this is to avoid including inscrutable paths in Terraform's new, richer error messages, which will include references to particular source files where possible. I expect we will make further refinements to this behavior in later releases, such as having Terraform produce symlinks instead of copies where possible, and perhaps making the caching behavior more visible in the UI, but this first pass here meets the main request of this issue and so I'm going to close this out. Once we have more experience with this new implementation after release, we can discuss further improvements to it in new GitHub issues. Thanks for reporting this, and thanks for your patience while we laid the groundwork to make this possible. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
Terraform Version
0.8.4
Affected Resource(s)
Terraform Configuration Files
Expected Behavior
I would think that TF would be able to identify that the module source is the same on all these resources and be able to download the module source once and apply it to each usage rather than downloading it N times.
Actual Behavior
Output appears to show the module is downloaded N times instead of once.
Steps to Reproduce
Create several modules that use the same source and run a TF plan or apply operation.
The text was updated successfully, but these errors were encountered: