Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate singleton lists as embedded objects #387

Merged
merged 13 commits into from
May 8, 2024

Conversation

ulucinar
Copy link
Collaborator

@ulucinar ulucinar commented Apr 17, 2024

Description of your changes

Fixes #136

Terraform configuration blocks, even if they have a MaxItems constraint of 1, are (almost) always generated as lists. We now generate the lists with a MaxItems constraint of 1 as embedded objects in our MR APIs.

This also helps when updating or patching via SSA these list objects (which are now converted to embedded objects). The merging strategy implemented by SSA requires configuration for associative lists and converting the singleton lists into embedded objects removes the configuration need.

In addition to the changes in the code generation pipelines, this PR also implements the runtime conversion logic needed to convert the CRD's embedded objects back to the singleton lists when passing data to the Terraform stack. We also need to convert the singleton lists we read from the Terraform stack into embedded objects for updating spec.forProvider (late-initialization), status.atProvider, or when updating the connection details secrets.

This PR also introduces a schema traverser, which can decouple the Terraform schema traversal logic from the actions (such as code generation, inspection, or singleton-list-to-embedded-object conversion) taken while traversing the schema.

When providers consume these changes, they are forced to generate the singleton lists in their APIs as embedded objects. By default, the generated APIs should not be affected. A provider can opt in the conversion, by specifying the following Provider option in upjet's config.NewProvider:

pc := ujconfig.NewProvider(...
	ujconfig.WithSchemaTraversers(&ujconfig.SingletonListEmbedder{}),

This traverser will detect the singleton lists and mark them to be generated as embedded objects. Provider maintainers will also have the flexibility to override the resource configuration generated by this traverser with upjet's resource configuration framework. This implies that the resource configuration overrides are executed after the mutating traversers act on the resource configurations.

Although it's possible to overwrite the existing CRD API versions, in the providers, with the converted APIs, the providers consuming this change can also generate new versions of their CRD APIs with the embedded object schemas. We extend the existing multi-version CRD generation support to allow specifying the storage version and the hub version via config.Resource.SetCRDStorageVersion and config.Resource.SetCRDHubVersion, respectively. config.Resource.CRDStorageVersion and config.Resource.CRDHubVersion can be used to access the configured storage and hub versions, respectively. These have been implemented as functions instead of struct fields to enforce defaulting: If the storage version or the hub version is not specified, then the default is to use the configured API version. This is also the current behavior.

If the providers opt to generate the converted APIs as new versions, this PR also implements two common CRD API conversion functions to be invoked, at runtime, by their conversion webhooks. These conversion functions can be registered with the API conversion registry as follows:

r.Conversions = []conversion.Conversion{
	conversion.NewIdentityConversionExpandPaths(conversion.AllVersions, conversion.AllVersions, []string{"spec.forProvider", "spec.initProvider", "status.atProvider"}, r.CRDListConversionPaths()...),
	conversion.NewSingletonListConversion("v1beta1", "v1beta2", r.CRDListConversionPaths(), conversion.ToEmbeddedObject),
	conversion.NewSingletonListConversion("v1beta2", "v1beta1", r.CRDListConversionPaths(), conversion.ToSingletonList)}

The identity converter will handle copying of the same name fields from the conversion source object to the target, ignoring the converted fields (remember that the embedded objects in the new API have the same field names as the singleton lists in the old API but their types differ and thus, they cannot be handled by the identity converter).

The singleton list converter works on the converted fields at any depth and is responsible for the conversions between the singleton lists and the embedded objects.

If the providers use these converters, they can keep their old APIs (e.g., v1beta1) intact and introduce the embedded object schemas in the new API versions (e.g., v1beta2). When it encounters an old v1beta1 API with the singleton lists, the registered conversion webhook will be able to convert it to the new v1beta2 API with the embedded objects seamlessly.

I have:

  • Read and followed Upjet's contribution process.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.
  • Add unit tests
  • Add runtime conversion between singleton lists & embedded objects to the TF plugin framework external client
  • If we are not going to support this conversion in the CLI-based runtime, then we should not just convert their APIs (as their runtimes will simply be broken)

How has this code been tested

Manually tested via an affected resource (AccessPolicy. conditionalaccess, which currently has many singleton lists in its API) in crossplane-contrib/provider-upjet-azuread.

Also validated the same resource by consuming the upjet changes from this PR's feature branch but not configuring the list converter in crossplane-contrib/provider-upjet-azuread. This is to make sure that the proposed changes in the code generation pipeline & the upjet runtime are both backwards-compatible.

@ulucinar ulucinar marked this pull request as draft April 17, 2024 09:18
@ulucinar ulucinar force-pushed the embed-singleton branch 7 times, most recently from f22cff7 to 017f4f2 Compare April 18, 2024 13:38
Copy link
Member

@mergenci mergenci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work @ulucinar. Please find my comments below.

pkg/config/common.go Show resolved Hide resolved
pkg/config/conversion/conversions.go Show resolved Hide resolved
pkg/controller/external_tfpluginsdk.go Outdated Show resolved Hide resolved
// +kubebuilder:storageversion
{{ end }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Would {{- end }} be more appropriate here? Otherwise, it seems to me that double blank lines will follow.

Copy link
Member

@sergenyalcin sergenyalcin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ulucinar I left a few comments. I also tested this PRs' some functionalities in the following one: #397

Comment on lines +77 to +90
switch mode {
case ToSingletonList:
slices.Sort(paths)
case ToEmbeddedObject:
sort.Slice(paths, func(i, j int) bool {
return paths[i] > paths[j]
})
}
Copy link
Member

@sergenyalcin sergenyalcin May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this sorting because of the order of the path processing? I mean, we ensure that conversions are performed correctly even when nested structures are present. For example, in the ToEmbeddedObject case, the deeper structures are processed first. Because if we change the deeper structures before the higher-level structures, it will not affect the conversion of the higher-level structures.

What do you think about whether there is any potential risk during this approach? Do you think it's worth coding this in more conventional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relevant issue here is the fieldpath expressions use indices (e.g., [*]) for traversing lists. We generate the fieldpath expressions for the singleton lists that have been converted using the expressions that work on the original Terraform topology. And we need to convert from embedded objects to singleton lists (when passing data from the XP layer to the TF layer) and from singleton lists to embedded objects (when passing data from the TF layer to the XP layer) at runtime using these fieldpath expressions. If we do not process the field conversions in the correct order, the expressions may mismatch the temporary topology during a conversion. For example, let's assume a singleton list contains another nested singleton list, both of which we've converted and let's assume the corresponding fieldpath expressions for the conversion targets are a.b[*] and a.b[*].c[*], respectively. When converting from embedded objects to singleton lists, converting a.b (which is an embedded object in the source) before its children relieves us from having to adjust the fieldpath expressions with the a.b[*]. prefix because once a.b is converted into a singleton list, then the fieldpath expressions that target its children now match the (converted) topology. Similarly, when converting from singleton lists to embedded objects at runtime (e.g., after reading the TF state), converting a.b[*] after a.b[*].c[*] relieves us from having to adjust the fieldpath expressions that target a.b[*]'s children.

This is the motivation of processing the conversions in an order. We could as well adjust the fieldpath expressions to match the topology as we perform the conversions, but this looks simpler to me.

Comment on lines -24 to -30
gvk := dst.GetObjectKind().GroupVersionKind()
if err := runtime.DefaultUnstructuredConverter.FromUnstructured(srcMap, dst); err != nil {
return errors.Wrap(err, "cannot convert the map[string]any representation of the source object to the conversion target object")
}
// restore the original GVK for the conversion destination
dst.GetObjectKind().SetGroupVersionKind(gvk)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, we moved this block to IdentityConversion, right? What do you think about a wrong configuration (via Resource.Conversions) that may override this part? Then we will miss the identity copying.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct. Previously, conversion.RoundTrip would unconditionally do the identity conversion which copies fields with the same name & path into the target object. But in the scenario where we have converted singleton lists into embedded objects, the source and the target end up having fields with different types (a list and an object) at identical paths, which breaks the identity conversion.

Furthermore, it's possible to have, for instance, a conversion.ManagedConversion, which performs the identity conversion itself (e.g., by directly copying some metadata fields from source into the target). We previously did not have this flexibility and conversion.RoundTrip would always do the identity conversion.

It's now optional and for backwards compatibility, config.DefaultResource now initializes the config.Resource.Conversions slice with the identity conversion. In contexts where we don't want the identity conversion, then we need to replace Conversions slice.

There could in theory be clients which just replace the slice here although they need the identity conversion and will be broken when they consume these changes. If they are overriding the default Conversions slice, then it means they are not interested in using the defaults. But probably they just assume there are no defaults instead of not using these (or any future) defaults. I consider this as a bad practice. Another complication is that there could be multiple places in the provider code where the converters are added. If multiple sites replace these converters (instead of appending to already existing converters), then again it's probably unintended.

Unfortunately, I think this has roots in not following the principle of encapsulation in the resource configuration API (the configuration data fields are just exported). This requires a self-discipline at the client side.

- Terraform configuration blocks, even if they have a MaxItems
  constraint of 1, are (almost) always generated as lists. We
  now generate the lists with a MaxItems constraint of 1 as
  embedded objects in our MR APIs.
- This also helps when updating or patching via SSA the
  (previously list) objects. The merging strategy implemented
  by SSA requires configuration for associative lists and
  converting the singleton lists into embedded objects removes
  the configuration need.
- A schema traverser is introduced, which can decouple the Terraform
  schema traversal logic from the actions (such as code generation,
  inspection, or singleton-list-to-embedded-object conversion)
  taken while traversing the schema.

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
the generated CRD API version as the storage version.

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
…versions between

singleton list & embedded object API versions.

- Export conversion.Convert to be reused in embedded singleton list webhook API conversions.

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
the storage & hub API versions independently.

- The default values for both of the storage & hub versions is
  the version being generated, i.e., Resource.Version.
- Replace pipeline.ConversionHubGenerator & pipeline.ConversionSpokeGenerator
  with a common pipeline.ConversionNodeGenerator implementation
- Now the hub generator can also inspect the generated files to regenerate
  the hub versions according to the latest resource configuration and
  we have removed the assumption that the hub version is always the
  latest version generated.
- Fix duplicated GKVs issue in zz_register.go.

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
…ield paths

when the conversion.RoundTrip copies the fields, from src to dst, with the same name.

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
- Fix runtime conversion for expanded field paths of length
  greater than 1.

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
- Prevent execution of these pipelines multiple times for each available
  versions of an API group.
- Improve conversion.RoundTrip paved conversion error message

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
Copy link
Member

@sergenyalcin sergenyalcin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ulucinar LGTM!

Copy link
Member

@mergenci mergenci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your changes @ulucinar 🙏 LGTM.

@ulucinar ulucinar merged commit 03a207b into crossplane:main May 8, 2024
6 checks passed
@ulucinar ulucinar deleted the embed-singleton branch May 8, 2024 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

One-element arrays could be considered as objects
3 participants