Global IDs for representing relationships between resource objects (object containment, name collision detection, etc) #22094
Comments
A further use-case: Detecting when
|
Related Terraform Plugin SDK issue: hashicorp/terraform-plugin-sdk#224 |
Another related use-case: Indicating connected resources that will prevent replacement
This is related to the containment relationship, but the "contained" resource is connected to multiple resources in a graph structure. This example references resources from the AWS Provider and was inspired by hashicorp/terraform-provider-aws#636. Note that there is a workaround by adding randomness to the resource name. When configuring an AWS load balancer, some of the resources involved are: The A relationship similar to |
I'm not sure this belongs exactly here, but it feels worth mentioning: Apart from internally knowing all possible relationships between resource types, or being explicit about them via

```hcl
resource "aws_ecs_service" "service" {
  name                  = var.service_name
  cluster               = aws_ecs_cluster.cluster.arn # << service needs cluster
  wait_for_steady_state = true
  task_definition       = aws_ecs_task_definition.task_definition.arn # << service needs the task definition

  load_balancer {
    container_name   = var.service_name
    container_port   = var.container_port2
    target_group_arn = aws_lb_target_group.service_target_group.arn # << service also needs the target group
  }
}
```

Obviously this is very trimmed down, and especially with modules coming into play things become less obvious (should modules be boundaries, or could one even follow dependencies through variables and locals?), but still the point stays: I'm already telling Terraform the relationship that my resources have in my case, so Terraform could use that ... for example by not deleting the target group before waiting for the service to have settled again after deleting the |
Another related use case https://discuss.hashicorp.com/t/why-did-terraform-not-recognise-that-affected-resources-have-been-affected/61530/2 (which links back to this issue) |
This is a description of a problem space and some initial sketches for how it might be solved. It's not yet a fully actionable proposal, since we need to gather more examples and do prototyping with them to figure out exactly what the problem cases are and thus how best to solve them.
For now, this issue is here mainly so it can be mentioned in other issues (e.g. in provider repositories) that describe use-cases where this mechanism might be beneficial.
As a consequence, anything here is subject to change in subsequent discussion.
Terraform currently considers each remote object to be entirely distinct from others. That includes (but is not limited to) the following incorrect assumptions:
The above assumptions are clearly false for many real-world vendor APIs, though in practice we've been able to work around most of them in one way or another. In some cases that requires special care on the part of the user though, which can be problematic if violating the assumption has a negative effect such as system downtime or Terraform becoming "stuck" and unable to make progress.
Based on real-world experience with existing APIs, it seems like Terraform could benefit from explicit modelling of relationships between resource objects that are richer than what can be inferred only from the user-provided dependency graph. In principle providers could use their knowledge about the remote system to give Terraform more information about these relationships, and then Terraform could use that information to prevent certain obviously-incorrect actions and to generate warnings about situations that are less certain.
The remainder of this issue is some notes about a possible way to achieve that, and some initial ideas about how it might be used. This initial sketch is mainly serving as a request for example use-cases to inform a next iteration of it, and not something that is currently ready to implement.
Global Object IDs
Prior to Terraform 0.12, Terraform required all resource instance objects to have an associated `id` attribute, but imposed no requirement on how providers would use it other than that it must not be an empty string. In practice, that requirement didn't really serve any purpose from Terraform Core's standpoint, and so from Terraform 0.12 onwards there is no such requirement at the Core level, though as I write this the SDK does still impose that requirement for 0.11-compatibility reasons.
However, having a more strongly-defined sense of an ID for an object -- one that is global in scope and allows Terraform Core to make certain assumptions about it -- could be a useful building block for modelling relationships between objects.
Some of the remote systems we interact with already have a sense of ids that are global to their entire system. For example, AWS has the idea of an "ARN" which can uniquely identify a particular object across the whole of AWS, including not only the service-local unique identifier but also the overall AWS account the object belongs to and (where appropriate) the service region it was created in.
We can potentially generalize this idea by allowing each Terraform provider to define its own unique id scheme. The provider itself would control that scheme but Terraform Core would make certain assumptions about it that the provider must ensure are valid:
Because the requirements for each remote system are different, Terraform Core would impose only a simple syntax requirement on these ids: they must be strings and they must start with the provider type name followed by a colon. After the colon can be any valid sequence of Unicode printable characters. If the remote system already has a suitable global ID syntax, it may be best to just use that directly in case these ids are seen by users (though ideally they should not be).
For example, any id generated by the `azurerm` provider must begin with `azurerm:` but can then be followed by any printable Unicode characters needed to fully describe the identity of an object.
In practice I suspect we might elect to allow each remote object to have potentially multiple global object IDs, as a way to handle changes in the format over time (a provider can report both the old and new forms at once) and to deal with any other unavoidable ambiguity that might arise. In that case, though, each distinct ID string should still only be associated with one object.
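The syntax requirement above is small enough to sketch as a check. This is only an illustration of the rule as described, not anything Terraform Core implements: the function name `is_valid_global_id` is hypothetical, Python's `str.isprintable()` stands in for whatever definition of "printable" would actually be adopted, and rejecting an empty remainder is an assumption the proposal does not state.

```python
def is_valid_global_id(provider_type: str, global_id: object) -> bool:
    """Check the sketched syntax: '<provider type>:' followed by printable characters.

    Hypothetical helper for illustration only.
    """
    # Global IDs must be strings.
    if not isinstance(global_id, str):
        return False

    # They must start with the provider type name followed by a colon.
    prefix = provider_type + ":"
    if not global_id.startswith(prefix):
        return False

    remainder = global_id[len(prefix):]
    # Assumption: an empty remainder identifies nothing, so reject it.
    # str.isprintable() is Python's approximation of "printable Unicode".
    return len(remainder) > 0 and remainder.isprintable()
```

Under these assumptions, `is_valid_global_id("azurerm", "azurerm:/subscriptions/example")` holds, while an id carrying another provider's prefix or an embedded newline is rejected.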
Not all objects need to have global IDs. If we were to introduce a feature like this then necessarily it would start with most existing providers not supporting it universally, and even after it's been around for a while the global ID mechanism would serve no purpose for certain object types. In particular, there's no reasonable global persistent ID for many of the transient in-state-only object types that are offered by providers like `null`, `tls`, etc.
Potential Uses for Global IDs
The following sections describe some situations we've already encountered that Global IDs might be useful for. There are likely other ways these problems could be addressed too, so this section is mainly here just as a set of examples to help us identify other problems that we might be able to address through the introduction of Global IDs.
Detecting Object Collisions
A straightforward use of Global IDs is to automatically detect and flag when two objects in the same state have the same Global ID. That suggests a user error (defining the same object twice) and ought to be resolved somehow before proceeding, or Terraform's behavior would otherwise be unpredictable.
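As a rough sketch of that check, assume the state can be reduced to a mapping from resource address to the set of Global IDs the provider reported for it (an object may report several). Both that shape and the name `find_collisions` are hypothetical and only serve to show the idea.

```python
from collections import defaultdict


def find_collisions(state: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return each Global ID claimed by more than one resource address.

    `state` maps a resource address to its set of Global IDs. This shape is an
    assumption for illustration, not Terraform's real state format.
    """
    claimed_by = defaultdict(list)
    # Sorting the addresses keeps the reported order deterministic.
    for address in sorted(state):
        for global_id in state[address]:
            claimed_by[global_id].append(address)

    # Only IDs with two or more claimants are collisions.
    return {gid: addrs for gid, addrs in claimed_by.items() if len(addrs) > 1}
```

Any non-empty result would be reported to the user as a conflict that must be resolved before proceeding.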
Another variant of this is situations where the provider has enough information available at plan time to predict one or more specific Global IDs for an object that hasn't been created yet. That would then potentially allow Terraform to detect collisions during planning and prevent them from occurring in the first place.
Terraform will not always have sufficient information to detect this at plan time (if the Global ID is derived from values that won't be known until apply time), but in that case it would degenerate to the first case above of detecting the conflict during a subsequent plan and requiring some sort of resolution. (Exactly what resolution would be possible/appropriate is an open question; perhaps Terraform would require removing all but one of the conflicting `resource` blocks but then skip creating `Delete` actions for those in the plan, assuming the user is intending the still-remaining `resource` block to be the "owner" of that previously-shared object.)
"Containment" relationship
Many remote systems have a sense of one domain object being "contained within" another, which for the sake of this section we'll define as where the container object must outlive all of the contained objects. There are two main variants of this we've seen across many systems:
Both of these situations violate Terraform's current assumptions. In the first case this can result in apply-time failures or timeouts, while the second case is more problematic in that it will tend to cause Terraform state to go out of sync with reality because Terraform cannot see that the contained objects have been deleted.
To address this, we could potentially augment the resource instance object state model so that each object can record:
While storing both directions of this relationship is redundant in the case where all objects are in the same configuration, it is possible (and, perhaps, common) for the objects to be split across two separate configurations by making use of data sources, and so the bidirectional tracking gives Terraform a fuller picture of the relationships in such cases.
The intent of these two sets is that they would be set by the provider during any changes, but also would be refreshed by the provider during a refresh operation, probably by calling an API to query the relationship.
As a specific example, consider that `aws_subnet` resources are always contained within `aws_vpc` resources: it's not possible to delete a VPC as long as at least one subnet exists. In this case it is a many-to-one relationship represented in the API as a foreign key on the subnet side, so the `aws_subnet` implementation can trivially determine the Global ID of the single VPC the subnet belongs to without any further queries (it's a transform of the `vpc_id` attribute), but the `aws_vpc` implementation would need to additionally call `DescribeSubnets` during refresh to properly populate the set of subnets that are contained within it, even if they were created in a different configuration.
Terraform Core can use this information to produce a more accurate plan whenever a container is planned for destruction. Terraform Core might see that a `Delete` action is planned for an `aws_vpc` and thus also automatically plan `Delete` actions for the associated subnets in the same configuration. If there are any contained subnets that are not known in the current workspace state, Terraform could return an error saying that these contained objects must be destroyed first, and thus leave the human operator to decide which other Terraform configuration must be changed to achieve that.
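The planning behavior described here could look roughly like the following. Everything in it is hypothetical: the function name, the exception, and the idea that the workspace state is available as a mapping from Global ID to resource address are assumptions made for the sketch, and the Global ID strings in any usage are made up.

```python
class UnmanagedContainedObjects(Exception):
    """Raised when a container still holds objects this workspace does not manage."""


def plan_container_delete(container_id, contained_ids, workspace_state):
    """Plan Delete actions for a container and everything recorded as inside it.

    container_id:    Global ID of the object planned for destruction.
    contained_ids:   set of Global IDs the provider reported as contained.
    workspace_state: mapping of Global ID -> resource address for objects
                     tracked in the current workspace (assumed shape).
    """
    # Contained objects that live in some other configuration cannot be
    # planned here, so surface them to the operator instead.
    unknown = sorted(gid for gid in contained_ids if gid not in workspace_state)
    if unknown:
        raise UnmanagedContainedObjects(
            "destroy these contained objects first: " + ", ".join(unknown)
        )

    # The container must outlive its contents, so contents are deleted first.
    plan = [("Delete", workspace_state[gid]) for gid in sorted(contained_ids)]
    plan.append(("Delete", workspace_state[container_id]))
    return plan
```

Raising an error rather than silently skipping unknown subnets matches the intent above: the operator decides which other configuration needs to change.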
The containment relationship also allows for improving Terraform's behavior in the more complex case of `DeleteThenCreate` or `CreateThenDelete` actions: this additional information might allow Terraform to understand both that it needs to replace all of the subnets when a containing VPC is replaced and that these objects are related in a way that requires a specific ordering of the destroy and create actions to produce a correct result.
Referring to Objects in the UI
The above use-cases include situations where Terraform Core must report a problem to the user that will include references to involved objects. Since the global IDs are not necessarily user-friendly, we might elect to have a mechanism to ask a provider to generate a human-friendly (but potentially slightly ambiguous) name for a given global ID.
For example, while AWS VPC objects are a per-region namespace in principle, in practice collisions between regions are very unlikely within a particular user's infrastructure and so it is common to talk about VPCs and subnets using just their region-local ids, without qualifying them with a region. The AWS provider might elect to transform a full VPC ARN into just a `vpc-abc123`-like string for display to the user, assuming that the user will have enough context to understand which region is relevant, and intentionally excluding the AWS account id because VPC IDs never overlap between two AWS accounts.
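To make the transform concrete, here is a minimal sketch. It assumes, purely for illustration, that the AWS provider's Global ID would be its `aws:` prefix followed by the object's ARN, and that the friendly name is whatever follows the last `/`. Neither the Global ID format nor the `display_name` function is defined anywhere yet.

```python
def display_name(global_id: str) -> str:
    """Derive a short, possibly ambiguous, name for showing to a user.

    Assumes the friendly part of the id is whatever follows the last "/",
    which is true of the hypothetical ARN-based format used here.
    """
    _, _, short = global_id.rpartition("/")
    # Fall back to the full Global ID when there is nothing after a "/"
    # (no slash at all, or a trailing slash).
    return short or global_id
```

With that assumed format, `display_name("aws:arn:aws:ec2:us-west-2:123456789012:vpc/vpc-abc123")` gives `vpc-abc123`, dropping both the region and the account id as described above.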
Relationships Between Providers
A key feature of Terraform is its ability to easily pass data between objects in entirely different systems. For example, the IP address of a created compute instance might be sent to a separate DNS vendor to create a DNS record.
It's not clear yet whether there are use-cases for representing Global ID-based relationships between objects in different providers. If there are then the global nature of these identifiers would make that possible, but that then imposes an additional compatibility constraint on each provider as the details of its global ID formats would be embedded in the logic of other providers.
Until we identify a specific use-case for representing a cross-provider relationship, I suggest we forbid it to start. Then if a use-case is found later we can use that real example to figure out what constraints ought to apply in that cross-provider case, rather than risking being constrained by a naive design not informed by use-cases.
Sidebar: Global Object IDs for multi-instance systems
The idea of allocating global object ids maps nicely onto hosted (SaaS, etc) systems where the namespace of objects is physically fixed to a particular vendor and no other instances are available. It's trickier for self-hosted software and other situations where the physical location of the remote system is part of its unique identifier.
For example, the `mysql` provider is configured with a hostname or IP address for the specific MySQL server to talk to. If the server has a stable, meaningful hostname then using that hostname as part of the identifier is reasonable, but in modern ephemeral environments such services often don't have stable locations and are instead located via a service discovery system, which may not be implemented via DNS lookups.
How to robustly allocate global object IDs for this class of remote system is an open question still to be resolved. A key requirement is that it be possible to move the system to another physical address without implicitly renaming all of its existing global IDs, which seems likely to involve introducing some sort of user-controlled "logical location" that is distinct from the physical location and can persist as the service moves between physical locations, but without imposing operational constraints on the service such as being at a stable hostname.