Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

addrs.ModuleSource string format doesn't properly handle subdirectories with query parameters #31250

Closed
mkielar opened this issue Jun 15, 2022 · 12 comments
Labels
bug config confirmed a Terraform Core team member has reproduced this issue explained a Terraform Core team member has described the root cause of this issue in code good first issue v1.2 Issues (primarily bugs) reported against v1.2 releases

Comments

@mkielar
Copy link

mkielar commented Jun 15, 2022

Terraform Version

Terraform v1.2.2
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v4.18.0
+ provider registry.terraform.io/hashicorp/random v3.3.1
+ provider registry.terraform.io/mongey/kafka v0.5.1
+ provider registry.terraform.io/scottwinkler/shell v1.7.10

Git version

git version 2.25.4

Terraform Configuration Files

  1. This works, but breaks autocomplete in IntelliJ IDEA Terraform Plugin
module "configuration_bucket" {
  source = "git::https://dev.azure.com/[MASKED]/[MASKED]/_git/infra-aws-modules//s3/bucket?ref=master"
}
  1. This makes the IntelliJ IDEA Terraform Plugin "see" the module after it's downloaded with configuration from pt.1, but makes terraform init / terraform get fail on fresh workspace.
module "configuration_bucket" {
  source = "git::https://dev.azure.com/[MASKED]/[MASKED]/_git/infra-aws-modules?ref=master//s3/bucket"
}

Debug Output

N/A / See "Actual Behavior"

Expected Behavior

Depending on the solution decided, I'd expect one of the following:

  1. Either, the Source values in .terraform/modules.json should not get modified and should have the same values as source attributes in HCL.
  2. Or, terraform get / terraform init should be able to download modules using the URL syntax stored in .terraform/modules.json / Source fields
  3. Or, the IntelliJ IDEA Terraform Plugin should get smarter (which makes this ticket irrelevant, but I'm still reporting it, because I believe the Plugin's faulty behaviour results from Terraform messing up the source URLs).

Actual Behavior

We have a private Git repository where we store a bunch of terraform modules for internal use. We then refer to them in HCL using git::https://.... syntax, and use both ref and /path/to/module. We use IntelliJ IDEA with Terraform Plugin installed to work on this.

We've noticed that the modules fetched by terraform init / terraform get do not get autocomplete working in Terraform Plugin.
Turns our this is because how terraform changes the URLs provided in the source when it stores module references in .terraform/modules.json. In the case of the configuration presented above, the revelant section of the .terraform/modules.tf looks like this (formatted for clarity):

    {
      "Key": "configuration_bucket",
      "Source": "git::https://dev.azure.com/[MASKED]/[MASKED]/_git/infra-aws-modules?ref=master//s3/bucket",
      "Dir": ".terraform/modules/configuration_bucket/s3/bucket"
    }

You can see how the Source in that JSON is not the same as the source in HCL - the ?ref=master section of the URL has been moved inside the URL. There are several issues with this:

  1. This behaviour breaks the Terraform Plugin for IntelliJ IDEA (and perhaps other IDEs as well, I did not check that).
  2. The modified URL stored in .terraform/modules.json / Source fields is not syntactically correct.

To experiment with pt. 2, I modified the source attribute of the module in my HCL to match the .terraform/modules.json / Source field:

module "configuration_bucket" {
  source = "git::https://dev.azure.com/[MASKED]/[MASKED]/_git/infra-aws-modules?ref=master//s3/bucket"
}

When I did this, two things happened:

  1. The IntelliJ IDEA Terraform Plugin, immediately saw the module and was able to provide autocomplete features.
  2. terraform init / terraform get immediatelly stopped working, throwing following errors:
Error: Failed to download module

Could not download module "configuration_bucket" (../../../../modules/[MASKED]/s3.tf:1) source code from
"git::https://dev.azure.com/[MASKED]/[MASKED]/_git/infra-aws-modules?ref=master//s3/bucket": error downloading
'https://dev.azure.com/[MASKED]/[MASKED]/_git/infra-aws-modules?ref=master%2F%2Fs3%2Fbucket': /usr/bin/git exited with 1: error: pathspec 'master//s3/bucket'
did not match any file(s) known to git

Looks like, when presented with an URL of its own making, Terraform is not able to fetch the module any more, because the git is unable to understand where the ref parameter is ending, and where the path in the repository begins.

Steps to Reproduce

  1. Configure terraform template with 1st example from "Terraform Configuration"
  2. Run terraform get
  3. Try autocomplete in IntelliJ IDEA and see it's not working
  4. Inspect the .terraform/modules.json and see the value in Source for this module is different to the value of source in your HCL
  5. Modify the value of source attribute in your HCL to the value of Source from .terraform/modules.json
  6. Delete .terraform to force terraform to reinitialize workspace
  7. Run terraform get again
  8. See how terraform get fails because Git doesn't understand the new URL.

References

@mkielar mkielar added bug new new issue not yet triaged labels Jun 15, 2022
@jbardin
Copy link
Member

jbardin commented Jun 15, 2022

Hi @mkielar,

Thanks for filing the issue. I don't think there's much we can do here to resolve the problem with IntelliJ, Terraform needs the values stored in the internal modules manifest structured the way they are for a number of reasons. Nothing within the .terraform/ directory is a public interface though, and it regularly changes as the tools evolve, so unless there's a bug in Terraform itself the storage mechanisms are unlikely to change.

Also, I think the fact that IntelliJ was able to use ?ref=master//s3/bucket is indicative of some bugs in their module discovery process, or at least they are drastically deviating from the process which Terraform uses to fetch urls. That subdir component there would be part of the query string, and encoded as ?ref=master%2F%2Fs3%2Fbucket, not matching any remote reference which is why terraform init fails.

If the IntelliJ team wants to engage directly for advice, either with us or the Terraform Language Server team, they are more than welcome!

Thanks

@jbardin jbardin closed this as completed Jun 15, 2022
@jbardin
Copy link
Member

jbardin commented Jun 15, 2022

Sorry @mkielar, I didn't quite realize that the format which seems to work with IntelliJ was what was being stored in the module manifest. That does seem to be an irregularity we'd like to resolve for consistency, so that nothing is tripped up by it in the future. Since the string stored in the manifest isn't something Terraform tries to reverse, it's not going to cause a problem with the normal init process.

One thing to still consider here is that changing the string format will change the keys in the manifest, requiring existing working directories to re-init. This may not be a problem, but could be something to consider when making this change in a patch vs a minor release.

Thanks again!

@jbardin jbardin reopened this Jun 15, 2022
@jbardin jbardin changed the title Terraform "init" / "get" messes git+http module sources addrs.ModuleSource string format doesn't properly handle subdirectories with query parameters Jun 15, 2022
@jbardin jbardin added config confirmed a Terraform Core team member has reproduced this issue and removed new new issue not yet triaged labels Jun 15, 2022
@apparentlymart
Copy link
Contributor

apparentlymart commented Jun 15, 2022

I don't think modern Terraform actually makes use of that source address information saved in the hidden module manifest, since these days we use a directory name derived from the module call path instead, and the other information saved in the JSON is sufficient for that.

So this information appears to be vestigial as far as Terraform is concerned and has apparently regressed at some point.

However, I suspect this arose from the increased use of our address normalisation logic in more places, in an effort to give better feedback to the user about how Terraform is understanding the addresses, and so my suspicion is that this incorrect normalisation is probably also appearing in the UI in some cases, such as in the progress reports from terraform init.

Either way it does seem like a bug, since Terraform is producing a misleading normalized form that reinterprets the subdirectory path as a part of the ref argument, so we should fix that.

I do want to note that this modules.json file is not intended as an integration point and is subject to change in future versions of Terraform. Indeed, it already did change several times in Terraform's life, and will likely change at least one more time as we look into bringing module dependency management to a similar design as provider dependency management. IntelliJ IDEA should not really be relying on that metadata, but the question of what it should be relying on instead (possibly a new use-case to think about and design for) is best saved for another issue, since I expect that for now we can aim to just fix the address normalisation logic alone without any changes to the remainder of this internal cache file.

@apparentlymart
Copy link
Contributor

I had some time to play with this a little directly, rather than guessing from memory, and I just wanted to correct my previous statement a little:

The terraform init UI is intentionally showing the module package addresses -- that is, the address with the subdirectory path removed -- because module packages are the atomic unit of downloading. Terraform only considers the subdirectory portion after the package is already downloaded in order to decide where to look in the package for this specific module's source code.

So with that said, I think this problem is likely more isolated to the hidden module manifest cache than I previously asserted. We don't typically need to reproduce module source strings from the normalized in-memory form in typical processing, and so it might be that the manifest file is the only place that actually needs to do it in practice. Of course that doesn't mean we shouldn't still fix it, but it's a relief that the problem seems to be more isolated than I originally expected.

@apparentlymart
Copy link
Contributor

apparentlymart commented Jun 15, 2022

Here's a reproduction case that uses a repository that's available publicly, just in case it's helpful to anyone working on this:

module "consul_consul-client-security-group-rules" {
  # (this is the git repository underneath the "hashicorp/consul/aws" registry module)
  source  = "git::https://github.com/hashicorp/terraform-aws-consul//modules/consul-client-security-group-rules?ref=v0.11.0"
}

When I run terraform init, as I mentioned in my previous comment it announces installing the package as a whole (the repository, in this case), removing the subdirectory part and then re-attaching it to the local filesystem path at the end, as expected:

Initializing modules...
Downloading git::https://github.com/hashicorp/terraform-aws-consul?ref=v0.11.0 for consul_consul-client-security-group-rules...
- consul_consul-client-security-group-rules in .terraform/modules/consul_consul-client-security-group-rules/modules/consul-client-security-group-rules

However, the Source string saved in the .terraform/modules/modules.json manifest file contains the result of stitching the package address back to the subdir path in an incorrect way that doesn't respect the query string:

{
  "Modules": [
    {
      "Key": "",
      "Source": "",
      "Dir": "."
    },
    {
      "Key": "consul_consul-client-security-group-rules",
      "Source": "git::https://github.com/hashicorp/terraform-aws-consul?ref=v0.11.0//modules/consul-client-security-group-rules",
      "Dir": ".terraform/modules/consul_consul-client-security-group-rules/modules/consul-client-security-group-rules"
    }
  ]
}

The "Source" property in the second object here should instead have had the following value:

"git::https://github.com/hashicorp/terraform-aws-consul//modules/consul-client-security-group-rules?ref=v0.11.0"

@apparentlymart
Copy link
Contributor

It seems like the problem here is that the String method of the addrs.ModuleSourceRemote type is naive and just concatenates the subdir on the end without checking first for a query string:

func (s ModuleSourceRemote) String() string {
if s.Subdir != "" {
return s.PackageAddr.String() + "//" + s.Subdir
}
return s.PackageAddr.String()
}

This is effectively the same as the String method of addrs.ModuleSourceRegistry, but that's fine for a registry address because those can't have query strings anyway. addrs.ModuleSourceRemote.String will need to be a bit more complex to detect when the PackageAddr field has a query string and insert the subdir part in surgically before the query string in order to get the correct result.

addrs.ModuleSourceRegistry.PackageAddr is an addrs.ModulePackage, which is really just a string with some extra assumptions attached -- specifically, that it's the result of go-getter's "detect" algorithm. I believe, but haven't yet confirmed, that it should be safe to assume that the first ? symbol in that string marks the start of the query string, because go-getter itself would've used a URL parser on this when doing its "detection" and thus this string should always be a valid go-getter-flavored "pseudo-URL" (go-getter also includes a method:: prefix on the front to track its decision about which installation method to use).

@apparentlymart apparentlymart added explained a Terraform Core team member has described the root cause of this issue in code good first issue labels Jun 15, 2022
@deepto98
Copy link

Can I pick this up? Can you please point me to the changes required and the files I need to look at?

@apparentlymart
Copy link
Contributor

Hi @deepto98! Thanks for the interest in working on this.

My previous two comments here were my attempt to document the expected behavior and the area of code that seems to be the cause. If you have specific questions about those details, please let me know!

@deepto98
Copy link

Okay, thanks. I'll go through them.

@fatahfattah
Copy link
Contributor

@apparentlymart I guess this can be closed due to #31636

@apparentlymart apparentlymart added the v1.2 Issues (primarily bugs) reported against v1.2 releases label Sep 16, 2022
@apparentlymart
Copy link
Contributor

Hi @fatahfattah!

Indeed, I did forget to close this after merging your PR. I'm going to close it now.

I merged your fix into the main branch so it should be included in the v1.4 release once we get there. We've only just started the v1.4 development period so it will be some time before it's available in a final release, but you should be able to see it in the 1.4.0 alpha releases once we start producing them (which typically happens only after there's something interesting in the main branch that we'd like early feedback on.)

@github-actions
Copy link
Contributor

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug config confirmed a Terraform Core team member has reproduced this issue explained a Terraform Core team member has described the root cause of this issue in code good first issue v1.2 Issues (primarily bugs) reported against v1.2 releases
Projects
None yet
Development

No branches or pull requests

5 participants