Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid references in Pulumi package schema #2565

Closed
DSoko2 opened this issue Jun 17, 2023 · 16 comments
Closed

Invalid references in Pulumi package schema #2565

DSoko2 opened this issue Jun 17, 2023 · 16 comments
Assignees
Labels
area/providers impact/quality kind/engineering Work that is not visible to an external user kind/enhancement Improvements or new features
Milestone

Comments

@DSoko2
Copy link

DSoko2 commented Jun 17, 2023

@davidspielmann and I recently started looking deeper into Pulumi package schemas and bumped into some issues with named types/references in the schema of the aws classic provider. Talking about it at PulumiUP, @mnlumi encouraged us to share our insight. Perhaps it can help to get some of the references fixed with the upcoming 6.0 release :)

Background: Pulumi package schemas include "Named Types", which are references (a URI string) to a type specification. Our understanding of how these URIs are resolved from the $ref field is https://github.com/pulumi/pulumi/blob/master/pkg/codegen/schema/schema.go#L1409-L1416
Valid references are the built-in types pulumi.json#/Archive, pulumi.json#/Asset, pulumi.json#/Any, and pulumi.json#/Json. All other references have the format [origin]#/{resources,types}/[urn] where [origin] is the file path or URL of a schema file (empty for in-document references) and [urn] the identifier of the respective type definition under "resources" or "types" in that schema.

Expectation: All references in a package schema can be resolved. However, 168 references (25 URIs) in the schema of [email protected] cannot be resolved as defined above. These are the non-resolvable URIs we found; some of them are also noted in other GH issues (referenced).

These 77 references (15 URIs) cannot be resolved but are valid resources, i.e., they could be resolved when prefixing with #/resources/:

  • aws:apigateway/deployment:Deployment (2 occurrences)
  • aws:apigateway/restApi:RestApi (24 occurrences)
  • aws:cloudwatch/logGroup:LogGroup (2 occurrences)
  • aws:ec2/launchConfiguration:LaunchConfiguration (2 occurrences)
  • aws:ec2/placementGroup:PlacementGroup (2 occurrences)
  • aws:elasticbeanstalk/application:Application (4 occurrences)
  • aws:elasticbeanstalk/applicationVersion:ApplicationVersion (3 occurrences) Elesticbeanstalk Environment refers to type that doesn't exist #2358
  • aws:iam/group:Group (4 occurrences)
  • aws:iam/instanceProfile:InstanceProfile (4 occurrences)
  • aws:iam/role:Role (8 occurrences) Dangling type $ref in Schema #1820
  • aws:iam/user:User (4 occurrences)
  • aws:iot/policy:Policy (2 occurrences)
  • aws:lambda/function:Function (2 occurrences)
  • aws:s3/bucket:Bucket (6 occurrences)
  • aws:sns/topic:Topic (8 occurrences)

These 21 references (5 URIs) cannot be resolved but are valid types if treated case-insensitive, i.e., they could be resolved when ignoring the case and prefixing with #/types/:

These 70 references (5 URIs) cannot be resolved and we did not find a way to fix it:

  • aws:autoscaling/metrics:Metric (3 occurrences)
  • aws:ecr/lifecyclePolicyDocument:LifecyclePolicyDocument (2 occurrences)
  • aws:iam/documents:PolicyDocument (22 occurrences)
  • aws:index/aRN:ARN (42 occurrences) Seemingly dangling type reference to aRN:ARN #1682
  • aws:s3/routingRules:RoutingRule (1 occurrences)

This is the Python notebook we used for this analysis. It also applies the analyis to [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]. For most of them, all references are fine, with minimal exceptions (which, ideally, also would be fixed).
https://gist.github.com/DSoko2/4630a14c97a9469385dceed0ab07bc1f

We hope this helps to improve this provider's schema, i.e, eliminate workarounds and non-resolvable URIs. If you should have questions or we can help somehow, please let us know.

@DSoko2 DSoko2 added kind/enhancement Improvements or new features needs-triage Needs attention from the triage team labels Jun 17, 2023
@mikhailshilkov mikhailshilkov added area/providers kind/engineering Work that is not visible to an external user impact/quality and removed needs-triage Needs attention from the triage team labels Jun 19, 2023
@mikhailshilkov
Copy link
Member

@DSoko2 Thanks a lot for this comprehensive analysis! It sounds like we should start from the bottom and figure out the entirely missing references.

For my understanding, how blocked are you on this? Are you able to ignore them and proceed with other tasks, or is there anything we can do to unblock you quickly?

@DSoko2
Copy link
Author

DSoko2 commented Jun 19, 2023

For now, we are not blocked because we can hack around it with the knowledge we got through the analysis.

Still, I think fixing these issues is important: The longer these inconsistencies consist, the more workarounds people in the ecosystem may implement. It also may question the reliability of the schemas in general given the popularity of this provider.

It sounds like we should start from the bottom and figure out the entirely missing references.

Yes, this would certainly be valuable! However, fixing the other ones (First two lists in the initial post with workarounds) seems to me like it could be done rather soon and would prevent people like us from adding hacky resolution workarounds to their tools. Certainly, fixing them could still impact schema consuming implementations that solely implemented the workaround but I would argue that the default, documented way of resolving URIs should have priority. Eventually, consuming implementations with workarounds that also support the documented URI resolution shouldn't be impacted.

@mnlumi
Copy link

mnlumi commented Jun 20, 2023

@DSoko2 Thank you so much for sharing your findings and opening this issue! Much appreciated!

@t0yv0
Copy link
Member

t0yv0 commented Sep 16, 2024

Quick update, I'm checking up on how we are doing here in the 6x series of AWS. I can confirm that the following references still do not resolve to entries under .types.

    schema_test.go:38: Dangling reference: aws:alb/ipAddressType:IpAddressType (isRes=false)
    schema_test.go:38: Dangling reference: aws:alb/loadBalancerType:LoadBalancerType (isRes=false)
    schema_test.go:38: Dangling reference: aws:apigateway/deployment:Deployment (isRes=true)
    schema_test.go:38: Dangling reference: aws:apigateway/restApi:RestApi (isRes=true)
    schema_test.go:38: Dangling reference: aws:autoscaling/metrics:Metric (isRes=false)
    schema_test.go:38: Dangling reference: aws:autoscaling/notificationType:NotificationType (isRes=false)
    schema_test.go:38: Dangling reference: aws:cloudwatch/logGroup:LogGroup (isRes=true)
    schema_test.go:38: Dangling reference: aws:ec2/launchConfiguration:LaunchConfiguration (isRes=true)
    schema_test.go:38: Dangling reference: aws:ec2/placementGroup:PlacementGroup (isRes=true)
    schema_test.go:38: Dangling reference: aws:ecr/lifecyclePolicyDocument:LifecyclePolicyDocument (isRes=false)
    schema_test.go:38: Dangling reference: aws:elasticbeanstalk/application:Application (isRes=true)
    schema_test.go:38: Dangling reference: aws:elasticbeanstalk/applicationVersion:ApplicationVersion (isRes=true)
    schema_test.go:38: Dangling reference: aws:iam/Role:Role (isRes=false)
    schema_test.go:38: Dangling reference: aws:iam/documents:PolicyDocument (isRes=false)
    schema_test.go:38: Dangling reference: aws:iam/group:Group (isRes=true)
    schema_test.go:38: Dangling reference: aws:iam/instanceProfile:InstanceProfile (isRes=true)
    schema_test.go:38: Dangling reference: aws:iam/role:Role (isRes=true)
    schema_test.go:38: Dangling reference: aws:iam/user:User (isRes=true)
    schema_test.go:38: Dangling reference: aws:index/aRN:ARN (isRes=false)
    schema_test.go:38: Dangling reference: aws:index/region:Region (isRes=false)
    schema_test.go:38: Dangling reference: aws:iot/policy:Policy (isRes=true)
    schema_test.go:38: Dangling reference: aws:lambda/function:Function (isRes=true)
    schema_test.go:38: Dangling reference: aws:rds/engineType:EngineType (isRes=false)
    schema_test.go:38: Dangling reference: aws:s3/bucket:Bucket (isRes=true)
    schema_test.go:38: Dangling reference: aws:s3/routingRules:RoutingRule (isRes=false)
    schema_test.go:38: Dangling reference: aws:sns/topic:Topic (isRes=true)

@t0yv0
Copy link
Member

t0yv0 commented Sep 16, 2024

I've taken stock off the non-resource dangling references. All these originate from Node-only type overlays. There are two ways forward with these:

  1. remove Node-only overlays and support the functionality for every language; breaking change waiting for AWS v7
  2. register the overlay tokens in the schema; unfortunately this is blocked on Adding IsOverlay entries to register a dangling token results in breaking changes pulumi#17274 at the moment

I suspect our preference is (1) but I'll follow up with the team.

I will also follow up on the resource danging references next.

@t0yv0
Copy link
Member

t0yv0 commented Sep 17, 2024

Notes on resource references. Currently due to limitations in the bridge, resource references are exposed as type references.

For example, looking at "aws:cloudwatch/logSubscriptionFilter:LogSubscriptionFilter":

                "logGroup": {
                    "type": "string",
                    "oneOf": [
                        {
                            "type": "string"
                        },
                        {
                            "type": "string",
                            "$ref": "#/types/aws:cloudwatch/logGroup:LogGroup"
                        }
                    ],
                    "description": "The name of the log group to associate the subscription filter with\n",
                    "willReplaceOnChanges": true
                },

It is possible to change that to use a resource reference which still works:

                "logGroup": {
                    "type": "string",
                    "oneOf": [
                        {
                            "type": "string"
                        },
                        {
                            "type": "string",
                            "$ref": "#/resources/aws:cloudwatch/logGroup:LogGroup"
                        }
                    ],
                    "description": "The name of the log group to associate the subscription filter with\n",
                    "willReplaceOnChanges": true
                },

This change generates no diffs for Node but it does generate slight changes in other languages:

-        public Input<string> LogGroup { get; set; } = null!;
+        public InputUnion<string, Pulumi.Aws.CloudWatch.LogGroup> LogGroup { get; set; } = null!;


-                 log_group: pulumi.Input[str],
+                 log_group: pulumi.Input[Union[str, 'LogGroup']],

Java:

-    private Output<String> logGroup;
+    private Output<Either<String,LogGroup>> logGroup;


-        public Builder logGroup(String logGroup) {
+        public Builder logGroup(Either<String,LogGroup> logGroup) {
             return logGroup(Output.of(logGroup));
         }

+        /**
+         * @param logGroup The name of the log group to associate the subscription filter with
+         *
+         * @return builder
+         *
+         */
+        public Builder logGroup(String logGroup) {
+            return logGroup(Either.ofLeft(logGroup));
+        }
+
+        /**
+         * @param logGroup The name of the log group to associate the subscription filter with
+         *
+         * @return builder
+         *
+         */
+        public Builder logGroup(LogGroup logGroup) {
+            return logGroup(Either.ofRight(logGroup));
+        }
+

I think these changes are not breaking on the input side, but can be technically breaking on the output side (in the Java example above). This is sufficiently well-contained this could be something we can clean up without waiting for a major release.

@t0yv0
Copy link
Member

t0yv0 commented Sep 17, 2024

Further discussion with @mikhailshilkov we are contemplating the possibility to construct two schemas for the moment. One schema similar to the present schema will be used for the purposes of:

  1. having overlays in the registry as in https://www.pulumi.com/registry/packages/aws/api-docs/lambda/callbackfunction/
  2. having overlay references as a way to communicate to the Node SDK generator to avoiding breaking the Node SDK for the users

The second variation of the schema will have all overlay types stripped down, not have any dangling references, and accurately describe non-Node SDKs.

This scheme can unblock use cases until we are able to achieve full unification in v-next.

@mjeffryes mjeffryes modified the milestones: 0.110, 0.111 Oct 2, 2024
t0yv0 added a commit that referenced this issue Oct 7, 2024
Perform automatic rewrites over the schema to remove dangling type
references and produce a minimal schema for tool consumption that is
distinct from the real schema fed into SDK generation. The "minimal"
schema has these properties:

- it has no dangling type references
- it accurately describes the provider's SDKs for Python, Go, Java, C#
and YAML
- it approximately describes the provider's Node SDK behavior

Specifically for Node, the minimal schema may say that a string is
accepted where string + concrete type union can actually be used in
Node.
   
By default the minimal schema is not load bearing. It is possible to
activate it by setting an environment variable:

```
PULUMI_AWS_MINIMAL_SCHEMA=true
```

With this set, `pulumi-resource-aws` will be serving the minimal schema
from the `GetSchema()` gRPC method.

Also it is possible to generate SDKs based off the minimal schema.
Currently this generates exact same SDKs as the normal process for
Python, Go, .NET and Java, for example:

```
PULUMI_AWS_LIGHT_SCHEMA=true make tfgen_build_only build_python
```

Generating Node SDK off the minimal schema produces slight changes.
These may break user programs and we do not want to ship this as the
main schema just yet. We intend to reconcile them and remove the
distinct minimal schema at the next major version of the provider.

Re: #2565
@t0yv0
Copy link
Member

t0yv0 commented Oct 9, 2024

Since https://github.com/pulumi/pulumi-aws/releases/tag/v6.55.0 the provider includes a feature to serve a schema without any invalid references (see #4587). This is currently under a feature flag. We are unable to make this the official schema yet until the next major release because it involves taking breaking changes in the Node JS SDKs. Please take a look and let us know if this helps your use cases.

@zbuchheit
Copy link

@t0yv0 does this work with pulumi package get-schema or could we add support for it?

@t0yv0
Copy link
Member

t0yv0 commented Oct 10, 2024

I was expecting this to work but verifying on latest it seems that it is not working properly. I'll have closer look.

PULUMI_AWS_MINIMAL_SCHEMA=true
anton@anton-mbp-m3> pulumi package get-schema aws  | shasum   
41c5c74551c17d67580e8fb1a4033c8c8bd23fcf  -
anton@anton-mbp-m3> PULUMI_AWS_MINIMAL_SCHEMA=false                                         
anton@anton-mbp-m3> pulumi package get-schema aws  | shasum                                  
41c5c74551c17d67580e8fb1a4033c8c8bd23fcf  -

@zbuchheit
Copy link

let me know if it would be helpful if I opened a separate issue tracking that feature

@t0yv0
Copy link
Member

t0yv0 commented Oct 14, 2024

That won't be necessary. Looks like I got the tag references wrong, apologies, #4587 did not make it into the 6.55.0 release yet. We will be releasing with the next minor update (building that now #4638) and I will double check that get-schema works.

@mikhailshilkov
Copy link
Member

#4587 got shipped in 6.56.0

@t0yv0
Copy link
Member

t0yv0 commented Oct 16, 2024

We're preparing another release which will also include #4640 , I'll comment here once I've confirmed this is working.

@t0yv0
Copy link
Member

t0yv0 commented Oct 17, 2024

With the v6.56.1 release pulumi package get-schema aws respects PULUMI_AWS_MINIMAL_SCHEMA=true, please have a look and let us know if this is helpful. CC @zbuchheit

@t0yv0
Copy link
Member

t0yv0 commented Nov 12, 2024

Closing per conversation with @zbuchheit - thanks everyone.

@t0yv0 t0yv0 closed this as completed Nov 12, 2024
@mjeffryes mjeffryes modified the milestones: 0.112, 0.113 Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/providers impact/quality kind/engineering Work that is not visible to an external user kind/enhancement Improvements or new features
Projects
None yet
Development

No branches or pull requests

6 participants