configurable output format for yamlencode #23322

Closed
hassenius opened this issue Nov 8, 2019 · 35 comments

@hassenius

Current Terraform Version

Terraform v0.12.13
+ provider.null v2.1.2

Use-cases

I want to use terraform to generate yaml formatted configuration files for an ansible based installation.

Attempted Solutions

Currently we write the output of jsonencode(${var.some_map_var}) to a file on the target system, then use remote-exec to run a Python script that parses the .json file and generates the desired config.yaml.

With a map of

some_map_var = {
  foo = ["bar", "baz"]
  dofoo = true
}

This will generate a nice yaml that Ansible can use, i.e.

foo:
- bar
- baz

dofoo: true

Having discovered the yamlencode function in 0.12 this seems like a really nice option to avoid the escape hatch of the remote-exec python script and stay truer to Terraform native end-to-end.

However, the current yamlencode function seems to produce a file like this

"foo":
- "bar"
- "baz"
"dofoo": true

where all the keys are quoted (I guess because they are strings), rather than giving us a nice UTF-8 unquoted yaml file as we get with our Python parser.
This seems to create some issues for Ansible.
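
For reference, a minimal sketch of the Terraform-native approach described above. It assumes the hashicorp/local provider is available; the resource name ansible_config and the output path are hypothetical, and local_file writes on the machine running Terraform rather than on a remote target.

```hcl
variable "some_map_var" {
  type = object({
    foo   = list(string)
    dofoo = bool
  })
  default = {
    foo   = ["bar", "baz"]
    dofoo = true
  }
}

# Writes the encoded value to disk on the machine running Terraform.
# yamlencode alone decides quoting and key order; it takes no formatting options.
resource "local_file" "ansible_config" {
  filename = "${path.module}/config.yaml"
  content  = yamlencode(var.some_map_var)
}
```

The file written this way contains the quoted form shown above, which is what the proposal below asks to make configurable.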

Proposal

Allow yamlencode (at least via a config switch) to generate YAML files that do not quote keys and values.

References

@ocervell

ocervell commented Dec 7, 2019

+1, I think we should produce some nice looking YAML :)

@rafaelmagu

Ran into this problem today. The quotations are causing weird issues with Kubernetes config maps (I have to embed a YAML into a config map key)

@zidz

zidz commented Feb 20, 2020

+1. Want to be able to create config maps from terraform maps.

@techdragon

This causes problems in a number of environments where downstream applications consume YAML but dislike the "quote everything" + "alphabetical sorted" output of the yamlencode function.

@ocervell

Yes, it would be nice to keep the original ordering of fields in the template file + remove the quotes. You could use a beautifier for that within Terraform after converting to YAML.

@avgalani

avgalani commented Jul 9, 2020

+1, encountered this issue just now

@tcdev00

tcdev00 commented Jul 17, 2020

+1, Also encountered this issue today

@maikelvl

maikelvl commented Oct 26, 2020

I have a workaround for this. It's working for me, but beware: it's kinda hacky.

Having this variable:

some_map_var = {
  foo = ["bar", "baz"]
  dofoo = true
}

Wrap it with a regex replace function:

replace(yamlencode(var.some_map_var), "/((?:^|\n)[\\s-]*)\"([\\w-]+)\":/", "$1$2:")

Results in this output:

foo:
- "bar"
- "baz"
dofoo: true
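
As a sketch of how the workaround above might be wired into a configuration (the local and output names are made up for illustration; the regex is copied unchanged from the comment):

```hcl
locals {
  some_map_var = {
    foo   = ["bar", "baz"]
    dofoo = true
  }

  # Same pattern as above: strip the quotes from mapping keys made only of
  # word characters and hyphens. String values keep their quotes.
  unquoted_keys_yaml = replace(
    yamlencode(local.some_map_var),
    "/((?:^|\n)[\\s-]*)\"([\\w-]+)\":/",
    "$1$2:"
  )
}

output "unquoted_keys_yaml" {
  value = local.unquoted_keys_yaml
}
```

Two caveats worth keeping in mind: keys containing other characters (such as . or /) stay quoted, and a key like "123" or "true" would lose its quotes too, so a YAML parser may no longer read it as a string.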

@apparentlymart
Contributor

Hi all! Sorry for the slow response here.

I was just reviewing the comments here and it seems like while some of the comments could be considered just a matter of style preference (some folks prefer the unquoted YAML style, which is fair enough), I also see several of you talking about situations where other software has refused to process the yamlencode results.

To summarize I see:

  • "This seems to create some issues for Ansible."
  • "The quotations are causing weird issues with Kubernetes config maps"
  • "a number of environments where downstream applications consume YAML but dislike the 'quote everything' + 'alphabetical sorted' output"

When we first introduced yamlencode we did try to leave some room for making subtle improvements to its output by marking it as experimental, but in practice I think it's more-or-less fixed in place now, because we don't really want to cause churn (potentially involving forced-replacement) for existing callers just for stylistic preferences.

However, I expect we would make some different tradeoffs if it turned out that what yamlencode is producing is invalid in some way, such that it can't be parsed by other valid YAML parsers. If you all can share some more concrete examples of output that yamlencode produces that specific other software won't accept then I'd love to review those in a little more detail and see if we can find a compromise that would help those applications work without creating the broad churn for existing users that I'm worried about. If you have any links to relevant documentation for that software to share alongside those concrete examples that'd be extra helpful, since the Terraform team isn't necessarily intimately familiar with the details of other software.

Since generating YAML is only an ancillary use-case for Terraform and not its primary purpose, I don't expect that we would invest in a highly-configurable yamlencode function: that'd make the function far more complex than originally intended, and it's already pretty complicated. However, I would like to see about adjusting its output so that its single available behavior is more useful by being more compatible with existing software, if we can.

Thanks!


I do want to note that there's a key difference here between a purely stylistic tradeoff like string quoting compared to the functional difference of specifying map keys in a particular order. For the latter, it's not yamlencode that's discarding the ordering but rather the Terraform language itself, because Terraform's map type is unordered.

Dealing with these various little differences between type systems is part of the game when it comes to cross-language serialization formats, so I'd hope that anyone writing a YAML parser would be pragmatic and realize that there are plenty of languages which (like Terraform) don't have order-preserving mapping types.

If not though, unfortunately I don't think we can really help much with that because the original ordering information just isn't there, and often wasn't inherent in the source data in the first place if e.g. the map was constructed dynamically using a function. If you need that level of control, you'd need to use a different strategy to generate YAML mechanically yourself, such as generating it from a template where you can dictate exactly which punctuation, whitespace, and ordering the result would have.
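
A minimal sketch of the template strategy mentioned above, using the map from the original report. The local names are hypothetical, and the template deliberately hard-codes the key order and quoting style instead of deriving them from the data.

```hcl
locals {
  some_map_var = {
    foo   = ["bar", "baz"]
    dofoo = true
  }

  # Key order and quoting are exactly what is written in the template.
  # Unquoted interpolations are only safe for values that a YAML parser
  # cannot mistake for another type (for example "yes", "no" or "123")
  # and that contain no YAML special characters.
  config_yaml = <<EOT
foo:
%{ for item in local.some_map_var.foo ~}
- ${item}
%{ endfor ~}
dofoo: ${local.some_map_var.dofoo}
EOT
}

output "config_yaml" {
  value = local.config_yaml
}
```

The trailing ~ on each directive strips the newline that follows it, so the loop emits one list item per line without blank lines in between. The trade-off is that escaping becomes the template author's responsibility, which is the work yamlencode otherwise does.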

@fitchtech

fitchtech commented Sep 4, 2021

@apparentlymart I think the point is the yamlencode function does not produce valid YAML, at all, for anything. No YAML should have the maps, keys, and lists in quotes. That is not the standard anywhere and parsers that encode or lint proper YAML syntax would have a problem with this.

For example, the AWS EKS Terraform module created by AWS uses yamlencode to render data for the aws-auth configMap in Kubernetes. This defines the mapping of AWS IAM accounts, roles, and users to Kubernetes groups and users for access control to the entire cluster.
https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/aws_auth.tf#L81

The YAML data in the configMap should look like this, as shown in their documentation. This is a standard Kubernetes manifest
https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: <ARN of instance role (not instance profile)>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

However, this is what you get as a result of using the Terraform EKS module that aggregates maps and lists in locals then uses the yamlencode function to render the YAML for the data in the configMap that is created with the Kubernetes provider configMap resource. None of these quotes should have been added, it's not required, and goes against the entire point of YAML being more human-readable. I don't think I've seen any YAML parser that puts everything in quotes like this.

  mapRoles: |
    - "groups":
      - "system:bootstrappers"
      - "system:nodes"
      "rolearn": "arn:aws:iam::{redacted}:role/eks-workers-role"
      "username": "system:node:{{EC2PrivateDNSName}}"
    - "groups":
      - "system:masters"
      "rolearn": "arn:aws:iam::{redacted}:role/AWSReservedSSO_AdministratorAccess"
      "username": "AWSReservedSSO_AdministratorAccess"
    - "groups":
      - "system:bootstrappers"
      - "system:nodes"
      "rolearn": "arn:aws:iam::{redacted}:role/eks-node-role"
      "username": "system:node:{{EC2PrivateDNSName}}"

The yamlencode function is useless unless you use replace functions to remove the quotes, which is not ideal and not always possible. I might not want to strip every " from the output, because sometimes the quotes change the type: a boolean (e.g. key: true) versus the same value cast to a string (e.g. key: "true"), and likewise the number 123456 versus the string "123456".

Another example would be to look at any Kubernetes manifest YAML or use the Helm template command to render a chart into manifest YAML. The only time you'd have quotes around the value is for things like numbers that you want to be treated as a string type. You don't even need quotes or escapes when a key name has . or / within them as long as it's before the :

kind: ConfigMap
metadata:
  creationTimestamp: "2021-08-28T04:45:12Z"
  labels:
    eks.amazonaws.com/component: coredns
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
  resourceVersion: "10831359"

The reason why this is so extremely important and why you do not want this is that, unlike JSON, whitespace matters in YAML. The number of spaces (not tabs), and therefore the indentation of maps, lists, and nested data, has to be right for the document to be valid in most cases.

@ivan046

ivan046 commented Sep 6, 2021

+1 for non-quoted yaml keys and most values

@joshsleeper

don't get me wrong, I would 100% prefer a way to only use quoted keys/values when it's required, but...

I think the point is the yamlencode function does not produce valid YAML, at all, for anything. No YAML should have the maps, keys, and lists in quotes. That is not the standard anywhere and parsers that encode or lint proper YAML syntax would have a problem with this.

@fitchtech fyi the YAML 1.1 spec does actually have examples of quoted keys being valid YAML, which they totally are. it's just not super well spelled out imo.

image

https://yaml.org/spec/1.1/

the YAML 1.2 spec has a slightly different example, but demonstrates the same validity of quoted keys.

image

https://yaml.org/spec/1.2/

@fitchtech

@joshsleeper while it may be valid, it does cause issues. Also, it does not follow proper YAML styling. Using quotes has a specific meaning in YAML, unlike JSON or HCL. For example, if I have locals { number = 12345 }, that's specifying a number data type. So I would expect the YAML equivalent to be..

number: 12345

And not this..

"number": "12345"

That's not what I declared or want as the output. It should be the same data type and only cast to string when set that way.

For example, if it were
locals { number = "12345" }
that's a string and would then expect the YAML encoded output to be..

number: "12345"

It just doesn't make sense to put all the keys, values, and maps in quotes like this. It's not useful in practical application and I always avoid it.

An easier approach with cleaner YAML is to use the templatefile function with a map-of-maps variable that inserts your YAML blocks within a template file, using a string template for each expression. Nesting that within yamldecode in locals then lets you pass it to other blocks easily, like the data block of a Kubernetes config map resource.

@joshsleeper

joshsleeper commented Sep 8, 2021

while I agree that arbitrarily quoting numbers and boolean values would be a problem, I'm not seeing such behavior in yamlencode() at this point in time?

# sample.tf
locals {
  test_yamlencode = yamlencode({
    string_key : "string_value"
    simple_number : 123
    complex_number : 1e+3
    123 : 123
    bool_key : false
    map : [
      "map_string", 456, true,
    ]
  })
}

output "test_yamlencode" {
  value = local.test_yamlencode
}
$ terraform plan

Changes to Outputs:
  + test_yamlencode = <<-EOT
        "123": 123
        "bool_key": false
        "complex_number": 1000
        "map":
        - "map_string"
        - 456
        - true
        "simple_number": 123
        "string_key": "string_value"
    EOT

You can apply this plan to save these new output values to the Terraform state, without changing any real infrastructure.

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you
run "terraform apply" now.

string, boolean, and number values passed to yamlencode(), both in and outside of a map, all seem to end up with the correct YAML types after encoding, not all arbitrarily quoted and turned into strings as you suggest. you're right though, that would def be an issue if it was doing that!

the only change I'm really seeing is it forcing key quoting (which really should be considered a style thing since all keys act like strings and it's perfectly valid according to the spec) and forcing string value quoting (which again is perfectly valid and often recommended to avoid special characters behaving oddly).

@fitchtech

@joshsleeper didn't realize it was not quoting numbers and bool type values at least. Still seems strange that all the other keys and string values are in quotes despite that being unnecessary. IMHO the only times it should be quoted in the YAML is when you want number or bool cast as string, e.g. "12345" or "true"

@apparentlymart
Contributor

Thanks for raising the question about the use of quotes, and for the efforts here to uncover whether it represents a practical problem for interoperability with other software.

yamlencode intentionally always uses quoted strings because a significant change between YAML 1.1 and YAML 1.2 was a change to the implicit tagging rules for plain scalars, and in particular YAML 1.1 leaves the interpretation of plain scalars to be defined by the application, rather than directly defining it.

Using quoted strings universally is therefore a compromise that ensures that most other parsers (of both YAML versions) will interpret the value as a string without incurring the high readability cost of writing out explicit type tags. We intend the result to follow the YAML 1.2 core schema while also being unambiguous to a YAML 1.1 parser (as far as possible, given that YAML 1.1 intentionally treats various parsing rules as application-defined).

Based on what we've seen so far, this seems like an example of a style preference rather than an interoperability problem and thus not within the scope of changes we'd consider making to yamlencode.
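
To illustrate the ambiguity described above, here is a small sketch with string values that many YAML 1.1 parsers would resolve to a non-string type if they were written as plain (unquoted) scalars. The key names are made up for the example.

```hcl
locals {
  settings = {
    country = "NO"   # a string here, but plain NO matches the YAML 1.1 boolean type
    enabled = "yes"  # likewise for plain yes
    version = "1.10" # plain 1.10 would resolve to a float
  }

  # Because yamlencode double-quotes strings, each of these values stays a
  # string for YAML 1.1 and YAML 1.2 parsers alike.
  settings_yaml = yamlencode(local.settings)
}

output "settings_yaml" {
  value = local.settings_yaml
}
```

An encoder that quoted only when necessary would have to decide which plain scalars are ambiguous for every parser a caller might use, and that is the judgement the always-quote rule avoids.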

@tuaris

tuaris commented Nov 5, 2021

I have a case where I am using Terraform + SaltStack + Consul. I have SaltStack set up to read pillar information from Consul:

consul_config root=saltstack/private/%(minion_id)s

In Terraform I would write a key named role that assigns a role to an EC2 instance I am provisioning. An instance can have multiple roles:

resource "consul_keys" "master" {
	key {
		path  = "saltstack/private/${aws_route53_record.master.fqdn}/role"
		value = yamlencode(["salt_master", "consul_server", "netdata_server"])
	}
}

This writes the key as follows:

+ key {
          + delete = false
          + flags  = 0
          + path   = "saltstack/private/master.domain.tld/role"
          + value  = <<-EOT
                - "salt_master"
                - "consul_server"
                - "netdata_server"
            EOT
        }

The key's value has the quotes when I look at Consul.

image

However it seems that Saltstack is able to handle it just fine and remove the quotes:

root@master:~# salt '*' pillar.items
master.domain.tld:
    ----------
    role:
        - salt_master
        - consul_server
        - netdata_server

Even though it seems to work for my use case, it's odd because normally you wouldn't put quotes around those items if you were defining this in a local YAML file. Most people would probably be thrown off by this behavior (I was initially).

@usernkey

usernkey commented Nov 12, 2021

I use this workaround

  set {
    name  = "config"
    value = replace(yamlencode({
      region         = "eu-west-1"
      set_timestamp  = "false"
      period_seconds = "240"
      metrics = [
        {
          aws_namespace        = "AWS/RDS"
          aws_metric_name      = "ReadLatency"
          aws_dimensions       = "[DBInstanceIdentifier]"
          aws_dimension_select = "{DBInstanceIdentifier : [db-complete-mysql-444105]}"
          aws_statistics       = "[Average]"
        },
      ]
    }), "\"", "")
  }

@herrbpl

herrbpl commented Feb 22, 2022

In addition to the things above, this causes configuration drift for the Terraform rancher_app_v2 input, which seems to format YAML in a different way; as a result, there is always configuration drift when using yamlencode output as the rancher_app_v2 values input.

@apparentlymart
Contributor

Hi @herrbpl,

In Terraform's architecture, part of the responsibility of a provider is to include rules to recognize the difference between two values that are materially different -- that is, the meaning has changed -- vs. two values that are just two different ways to write down the same information.

There are already lots of examples of providers handling this for JSON, where remote APIs will often accept JSON as input but store the data internally in some other format, re-serializing it to JSON on read and therefore potentially producing a different serialization.

Although this is the first example I've seen of a system doing this with YAML -- and surprising, because presumably that means it will also discard any comments you included in the input, thus defeating a main benefit of YAML over JSON -- I think the same architectural principle still applies: the Rancher provider ought to have a rule to detect when two values are serializations of the same data and classify that as an immaterial change, to allow the configuration and state to converge.

I'd suggest recording that as a feature request for the provider. Unfortunately since I think this is the first example of doing it for YAML in particular, rather than for e.g. JSON, it'll take some extra up-front work to write a comparison function for YAML, whereas in JSON situations there is one built into the SDK which can handle many simple situations. However, I assume the same principle will apply as for the JSON equivalent: parse both the old and the new to discard the irrelevant syntax details, and then compare them to see if there are any remaining differences beyond just syntax.
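
The comparison itself would live in the provider's own code, but the principle can be sketched in Terraform language terms with yamldecode. The from_remote document below is a hypothetical re-serialization, not output captured from Rancher.

```hcl
locals {
  # What Terraform sends: the quoted style discussed in this thread.
  from_terraform = yamlencode({
    foo   = ["bar", "baz"]
    dofoo = true
  })

  # A hypothetical re-serialization of the same data by a remote system.
  from_remote = <<EOT
foo:
  - bar
  - baz
dofoo: true
EOT

  # Comparing the text reports a difference; comparing the decoded data
  # ignores quoting, indentation and key order.
  text_differs = local.from_terraform != local.from_remote
  same_data    = yamldecode(local.from_terraform) == yamldecode(local.from_remote)
}
```

With these inputs both text_differs and same_data are expected to be true, which is the situation a provider would want to classify as an immaterial change.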

@herrbpl

herrbpl commented Feb 24, 2022

Thanks for the detailed reply. Now that I think of it, I seem to recall rancher_app and app_v2 use a string for the values input. Even an extra line feed causes drift. I'll post this to their provider tracker.

@nfi-opsguru

My 2c: providers should never* deal with YAML directly. There is very rarely a situation where JSON wouldn't be better: you can reasonably normalize JSON for most applications, thereby preventing drift without having to parse it and compare the parsed tree. And JSON is a subset of YAML these days, so all YAML-compliant apps should be able to handle it.

If at any point along the chain anything re-encodes the YAML, you're almost certainly going to lose stylistic information anyway: AFAIK there exists no YAML re-encode process that perfectly preserves stylistic info (all whitespace, all quote styles, all comments). So if your application only deals with the subset of YAML structure that is JSON-compatible, you may as well use JSON because your YAML's going to get mangled anyway. Style-preserving YAML is almost a fundamentally separate type to we-only-care-about-data YAML.

For example, take the helm_release's value field. It accepts a YAML-string. Yet the actual value getting written to the k8s HelmRelease object is a string of JSON!

	Values *apiextensionsv1.JSON `json:"values,omitempty"`

Indeed, when it loads values from state, it uses a YAML Unmarshaller that converts from JSON and therefore cannot preserve anything JSON cannot in state. So why not just store JSON in the state? It can't actually handle non-JSONable YAML, and it's not going to preserve/diff comments in the actual resource. Better yet, instead of making a JSON string, make it a map(any) and make the user decode it.

Anyway, my point is: I'm guessing YAML re-encoding stability is not actually that necessary in practice because no real API actually wants a yamlencode'd string in the first place. Happy to see a solid counterexample though!

(* Exception might be when the output is meant for human consumption and you need to preserve its exact stylistic structure, comments, etc. but I'm hard-pressed to think of an example of that in the Terraform realm.)

@nfi-opsguru

One counterexample would be cloud-init. I would argue that you could just store the shebang-style comment and the body separately, then mix them together in YAML for the user when writing to the API.
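
On the configuration side, the same separation can be sketched as follows: the settings stay ordinary Terraform data, and the #cloud-config header line is attached only when the final string is built. The local names and the settings themselves are hypothetical.

```hcl
locals {
  # Hypothetical cloud-init settings, kept as structured data.
  cloud_config = {
    package_update = true
    packages       = ["nginx"]
  }

  # The header comment is joined on at the last step, so it never has to
  # survive a YAML encode/decode round trip.
  user_data = join("\n", ["#cloud-config", yamlencode(local.cloud_config)])
}

output "user_data" {
  value = local.user_data
}
```

The resulting string could then be passed to whichever argument a compute resource uses for user data.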

@apparentlymart
Contributor

I agree that it would be weird for a provider to itself be dealing with YAML. I think the main situations for yamlencode are those like the cloud-init example you mentioned, where there is some other system at least two hops away from Terraform that is expecting YAML and the API that the provider is directly interacting with just expects an arbitrary bag of bytes to pass on to that remote system. In that case, it would not be possible for the provider to detect and handle normalization because the content of the bag of bytes is opaque to the provider. But also, it doesn't typically matter because often that system that ultimately uses the YAML doesn't get any opportunity to normalize it in a way that would reflect back in the API, and so the bag of bytes remains verbatim what the author originally submitted.

I do find the Rancher example surprising for this reason, but I'm not familiar enough with Rancher to understand the details of what's going on there. It seems like either the Rancher provider or the Rancher API are directly using the YAML but are reflecting it back in a normalized form, which is pretty unusual as I mentioned above and I've still not encountered another example of such a design.

I'd rather keep discussions about the designs of specific providers in those providers' own issue trackers though, so that their authors (who know far more about the underlying systems than I do) can be the ones to make the necessary tradeoffs. For our purposes with this issue, if a provider has behavior like discussed above where it (or the API it interacts with) accepts YAML and normalizes it then it would be the provider's responsibility to classify that normalization as normalization, so that Terraform will not report it as a meaningful change. Whether the provider should be doing that is a matter for the provider developers to consider for themselves, but the previous situation is one of the consequences they should consider when making that decision.

@nfi-opsguru

I'd rather keep discussions about the designs of specific providers in those providers' own issue trackers though, so that their authors

Yep, I was just using helm_release as an example.

I think you're right: it's up to the provider to know its resource API details and avoid drift where there isn't a meaningful change. Namely, it should not be up to the user, via normalization flags to yamlencode or otherwise, to ensure that non-functional drift doesn't occur.

So I'd say the solution to this particular issue is just a clear Terraform policy around that, that users and provider devs can be pointed at when this comes up.

That said, my advice as a provider dev is to never do API calls with raw YAML if it can be avoided.

@tomharrisonjr

Hi @apparentlymart --

I am assuming you're affiliated with Hashicorp and Terraform. Thank you for your answers and for your effort here.

The nature of this thread reveals a core truth of Terraform, namely that it is a semantically correct and pure software tool.

There are a multitude of use-cases for producing YAML (and JSON) as these are the primary data interchange mechanisms used by modern software. While you assert that it's not a primary function of the software, that cannot really be true, as a fundamental purpose of Terraform is to interoperate with other software. If it is the case that the latest version of YAML allows for quoting, that's delightful, but it's not anyone's current reality. It may be pure, but it ain't real :-)

I try to do things right as often as possible in the software I work on. But I work in reality. I hope you and other Terraformers will understand the day to day challenges those of us who do battle daily are faced with and think about ways to be right by default, and be flexible as an option. A yamlencode function that produces YAML that other systems (in my case, Buildkite) cannot consume is of pretty limited usage.

I have great respect for and appreciation of the Terraform tool and team. Thanks for listening.

@apparentlymart
Contributor

Hi @tomharrisonjr!

My request above was to share specific examples of software that doesn't implement YAML in a way that supports the format that Terraform is generating, in which case we would review whether it is either Terraform or the other software that is incorrect and adjust Terraform if appropriate.

I'm still willing to do that, and it does sound like you have a potential example to share. Can you say a little more about what's going on with Buildkite that is causing you problems? I understand that Buildkite is closed-source SaaS software and so not possible for you to describe details about its implementation, but if you can show the input you tried to send to Buildkite (with yamlencode) and the specific error or other problem you encountered when you did so then I'd be happy to review it, and see what we might do to improve compatibility here. We need to see exactly what the problem is though, so we can see exactly what minimal change is needed to achieve compatibility.

@georgikoemdzhiev

georgikoemdzhiev commented Sep 20, 2022

The produced YAML causes issues when the ordering in the Terraform code is not preserved. For example, here is an aws_imagebuilder_component that uses the S3Download action, which currently fails because the produced YAML swaps the source and destination object properties.

terraform code:

resource "aws_imagebuilder_component" "prod_scheduler_tasks" {
  data = yamlencode({
    phases = [
      {
        name = "build"
        steps = [
          {
            name   = "download-task-scripts"
            action = "S3Download"
            inputs = [
              {
                source      = "s3://${aws_s3_bucket.image_builder.id}/scheduled_task_scripts/*",
                destination = "C:\\Automation\\"
              }
            ]
            onFailure = "Abort"
          }
        ]
      }
    ]
    schemaVersion = 1.0
  })
  name        = "install-scheduled-tasks-${var.environment}"
  description = "Installs Tasks Scheduler tasks for Prod env"
  platform    = "Windows"
  version     = "0.0.1"
}

And that produced this YAML code:

resource "aws_imagebuilder_component" "prod_scheduler_tasks" {
       arn                   = "arn:aws:imagebuilder:eu-west-1:685621570121:component/install-scheduled-tasks-test/0.0.2/1"
       data                  = <<-EOT
            "phases":
            - "name": "build"
              "steps":
              - "action": "S3Download"
                "inputs":
                - "destination": "C:\\Automation\\"
                  "source": "s3://image-builder-test/scheduled_task_scripts/*"
                "name": "download-task-scripts"
                "onFailure": "Abort"
...
}

In the above, the source property needs to be above the destination in order for that "S3Download" component step to work

@apparentlymart
Contributor

Hi @georgikoemdzhiev! Thanks for sharing that.

Do you know which software is ultimately parsing and decoding that YAML document? I see that you are passing it to an AWS provider resource type, but I'm not sure whether it's the AWS provider which parses it or if it just sends that whole string to some other system which then parses it. I'd like to identify who owns the parser so we can understand the impact of this difference.

This situation is unfortunately more fundamental than just customizing the output format, because listing source before destination here requires information that Terraform doesn't have. As is the case in several other languages, maps in Terraform are not an order-preserving data type and so the order of definition of elements in a constructor is only a source code artifact and has no effect on the behavior at runtime. I don't think allowing a caller to control the serialization order for map elements will be possible with yamlencode in its current design. Instead, we'd need a new function which defines some way to describe a YAML mapping in terms of a Terraform sequence (list or tuple), because those are the two data types in Terraform that retain element order.

I wonder how this YAML structure would be described in other languages that similarly do not retain the declaration order of a constructed map. 🤔
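
To make the idea of describing a YAML mapping as a Terraform sequence more concrete, here is a rough sketch using only existing functions. It is not an existing or planned Terraform function; the key/value pair shape, the local names, and the bucket name are all hypothetical.

```hcl
locals {
  # Hypothetical ordered representation: a list of key/value pairs,
  # because lists keep their element order and maps do not.
  s3_download_inputs = [
    { key = "source", value = "s3://example-bucket/scheduled_task_scripts/*" },
    { key = "destination", value = "C:\\Automation\\" },
  ]

  # jsonencode yields a double-quoted string, which is also a valid YAML 1.2
  # double-quoted scalar, so the values are escaped without hand-written rules.
  s3_download_inputs_yaml = join("\n", [
    for pair in local.s3_download_inputs : "${pair.key}: ${jsonencode(pair.value)}"
  ])
}

output "s3_download_inputs_yaml" {
  value = local.s3_download_inputs_yaml
}
```

This only covers a flat mapping with plain keys; nesting and indentation would need more work, which hints at why a general-purpose version would have to be a separate function rather than an option on yamlencode.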

@georgikoemdzhiev

Hello, thank you for addressing my comment.

Do you know which software is ultimately parsing and decoding that YAML document?

I believe the software that parses the YAML is AWSTOE and it is used by AWS Image Builder itself but I am not sure. Looking at the Image Builder docs it certainly sounds like that is the software parsing the YAML

This is an extract from the docs:
Image Builder uses the AWS Task Orchestrator and Executor (AWSTOE) component management application to orchestrate complex workflows.

@apparentlymart
Contributor

Thanks for that information, @georgikoemdzhiev.

After following a few links I believe you are right and that in particular the specific part you raised here is the action-specific input arguments for the S3Download action.

I wasn't able to find anything in the documentation stating that this format requires the input mapping entries to appear in any particular order, though. Did the provider return an error when you tried submitting this generated YAML document? Can you share the full error message so we can confer with the AWS provider team to see where that error might be coming from? Thanks!

@georgikoemdzhiev

georgikoemdzhiev commented Sep 21, 2022

Can you share the full error message so we can confer with the AWS provider team to see where that error might be coming from?

Hi Martin, I tried to replicate the issue I was having today, but it appears that I can no longer replicate it. The issue with using the yamlencode function in my aws_imagebuilder_component occurred a while ago, but I did not open a ticket back then; I worked around the problem by using raw YAML in my aws_imagebuilder_component. I apologise for the confusion. It might as well have been an issue with something else.

@anmnz

anmnz commented Sep 22, 2022

As is the case in several other languages, maps in Terraform are not an order-preserving data type and so the order of definition of elements in a constructor is only a source code artifact and has no effect on the behavior at runtime.

One of those languages is YAML itself! The YAML spec is clear on this. Anything that requires a particular ordering of keys in a YAML mapping is not processing YAML correctly.

Link to YAML 1.2 spec: 3.2.2.1. Mapping Key Order. (The YAML 1.1 spec has almost identical text.)
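
A small sketch of what that means in practice, using Terraform's own yamldecode. The local names and the sample keys are made up for illustration. The two documents differ only in key order, so a spec-compliant decoder should produce the same value for both and the comparison is expected to be true.

```hcl
locals {
  # Same mapping, keys written in a different order.
  source_first      = yamldecode("source: a\ndestination: b\n")
  destination_first = yamldecode("destination: b\nsource: a\n")

  # Compares the decoded values, not the YAML text.
  same_document = local.source_first == local.destination_first
}

output "same_document" {
  value = local.same_document
}
```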

@apparentlymart
Contributor

Hi all!

We left this issue open for a few years to try to gather specific details on any situations where Terraform's yamlencode function is producing something that is not valid YAML per the YAML specification.

My read of the discussion above is that several participants still have the reasonable style preference to produce unquoted mapping keys, although that particular change is not something we intend to make for the reasons I described earlier. Other than that, it doesn't seem like we've found any significant cases where yamlencode is producing something that cannot be parsed by a correct YAML parser implementation.

For that reason, we're planning to move forward with treating yamlencode's current behavior as stable once we reach the forthcoming v1.4.0 release, including removal of the long-standing experimental feature warning in the docs via #31907.

I'm going to close this issue now specifically to represent that we do not intend to produce a configurable yamlencode, which was what this issue originally represented. I subsequently overloaded this issue to also be a place to track potential compatibility concerns, but with that stability change in place I'd ask that folks now prefer to report such concerns via separate issues for each situation so that we can track and prioritize them separately. As I mentioned over in the PR, our new policy for this function (in the spirit of the Terraform 1.x Compatibility Promises) is:

  • If someone finds a situation in which yamlencode is producing a result which is not valid per the YAML 1.2 specification then we will aim to fix it but will also aim to make the fix as surgical as possible so that it should not change any previously-valid output that a configuration may be depending on. Please open a new issue in this repository if you have found such an example.
  • If someone finds a situation where another piece of software has a YAML-noncompliant parser which cannot accept a yamlencode result even though it is valid per YAML 1.2 then we will prefer to treat that as a bug in the parser that should be corrected there, rather than something Terraform will change to work around. In this case, please open an issue or similar request with the developer of the parsing software to request that they change their parser to be spec-compliant.

This issue also touched on a concern which isn't really part of yamlencode's scope: providers that accept YAML documents in their arguments are responsible for ensuring that the exact YAML serialization does not matter and that only differences in the effective document being described will be considered as meaningful changes. That means that one of the following must be true for any argument which accepts a YAML document as a string:

  • The provider and the remote system it represents both treat the YAML provided by the author as an exact byte stream, echoing it back exactly as written in the input without any normalization. This is the case for user_data in aws_instance, for example: the AWS provider and AWS API just treat that argument as an opaque sequence of bytes, so there's no situation where it gets normalized and returned back as the same data but in a different exact serialization.
  • The provider or the remote system it represents stores a data structure derived from the YAML rather than storing the YAML source code itself, and then reproduces equivalent YAML when reading data back from the API. In this case the provider must provide logic to determine whether two non-equal YAML strings are semantically equal according to the non-YAML internal representation, and must obey Terraform's resource instance change lifecycle rules to avoid reporting a style-only change as if it were a meaningful change to be applied.

If you find situations where a provider seems to refresh YAML into a shape that doesn't match the input -- regardless of whether that input was produced using the yamlencode function -- then the best path is to report that to the provider developer as a bug to be fixed, so that they can change the provider to ensure that it fits into one of the two valid situations described above.


Thanks for the discussion here, and in particular to those who shared specific, potentially concerning examples for us to study. That has been helpful in confirming that the yamlencode output is broadly valid and reasonable to stabilize.

We do intend to eventually support functions contributed by provider plugins as a new extension point for the Terraform language, although that is still subject to some research and design work before it's ready to go. Once we reach that point, those who have an interest in generating a particular YAML style that doesn't match the yamlencode builtin will be free to write a provider plugin that offers a slightly different interpretation of YAML encoding, if they wish. In the meantime you can approximate such a thing using a provider with a local-compute-only data source in it, at the expense of it being less convenient to use than a built-in function.
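
One possible approximation, sketched here under assumptions rather than as a recommendation, is the external data source from the hashicorp/external provider, which runs a local program and reads its result. The script name to_yaml.py and the result key yaml are hypothetical. The script itself is not shown: it would need to read a JSON object from stdin and print a JSON object whose values are all strings.

```hcl
locals {
  document = {
    foo   = ["bar", "baz"]
    dofoo = true
  }
}

data "external" "custom_yaml" {
  # Hypothetical script that renders YAML in whatever style is wanted.
  program = ["python3", "${path.module}/to_yaml.py"]

  # Query values must be strings, so the document is passed as JSON text.
  query = {
    document = jsonencode(local.document)
  }
}

output "custom_yaml" {
  # Assumes the script returns its rendered YAML under the key "yaml".
  value = data.external.custom_yaml.result.yaml
}
```

This trades the convenience of a built-in function for a dependency on a local interpreter and script wherever Terraform runs.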

Thanks again!

@apparentlymart closed this as not planned Sep 30, 2022
@github-actions
Contributor

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators Oct 31, 2022