Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add field&resource descriptions to CRD manifests from scraped metadata #64

Merged
merged 2 commits into from
Aug 12, 2022
Merged

Add field&resource descriptions to CRD manifests from scraped metadata #64

merged 2 commits into from
Aug 12, 2022

Conversation

sergenyalcin
Copy link
Member

@sergenyalcin sergenyalcin commented Aug 10, 2022

Description of your changes

Fixes https://github.com/upbound/official-providers/issues/297

With this PR, we add descriptions for fields and resources. These descriptions are taken from the provider metadata that is scraped from the TF registry by @ulucinar .

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

make generate in provider-aws

Validated related descriptions for fields and crds.

@sergenyalcin sergenyalcin self-assigned this Aug 10, 2022
@@ -242,8 +242,10 @@ func (r *Resource) scrapeFieldDocs(doc *html.Node, fieldXPath string) {
if r.ArgumentDocs == nil {
r.ArgumentDocs = make(map[string]string)
}
if r.ArgumentDocs[attrName] != "" && r.ArgumentDocs[attrName] != strings.TrimSpace(docStr) {
Copy link
Member Author

@sergenyalcin sergenyalcin Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this statement to not override the set value for the related field. Of course reason of this choice based on an observation and assumption.

For conflicted fields in schemas, I observed that the first ones can be more inclusive or explanatory. I did not check a lot of resource but I want to give an example:

aws_acm_certificate

When we go to the latest description in the registry (for repeated fields), we converges to observation field description or less explanatory one.

So, the best solution resolving the issue in the following comment: https://github.com/upbound/official-providers/issues/297#issuecomment-1188997453

But for now, in the scraper tool, (as far as I understand) there is not a hierarchal relationship between fields&subfields. I will also share some statistics in a different comment in this PR. So, my humble request from honorable reviewers would be to see the other comment.

As a last word, this is open for discussion. Maybe @ulucinar can has a good point for reverting this.

@@ -156,4 +156,5 @@ func (f *Field) AddToResource(g *Builder, r *resource, typeNames *TypeNames) {
}

g.comments.AddFieldComment(typeNames.ParameterTypeName, f.FieldNameCamel, f.Comment.Build())
g.comments.AddFieldComment(typeNames.ObservationTypeName, f.FieldNameCamel, f.Comment.Build())
Copy link
Member Author

@sergenyalcin sergenyalcin Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this statement to pass descriptions also to observation fields. What are your thoughts?

@sergenyalcin
Copy link
Member Author

sergenyalcin commented Aug 10, 2022

Statistics about conflicted fields:

Number of Total Scraped Description: 16311
Total Conflict: 1410
Total Value Difference for Conflicted Fields: 1080

The value in the last row is more valuable than the second. Because even if the keys of the field conflict, if the values are the same, this is not actually a problem. The problem arises if the values are not same.

In addition, it would not be entirely correct to conclude that the total number of affected fields is 1080. Because in some schemes a field repeats more than two times.

So, in the worst case, our wrong documented field rate is %6.6. So I prefer to merge this PR with this state and then address the following issue later: https://github.com/upbound/official-providers/issues/297#issuecomment-1188997453

This will be very significant progress and after this PR we will have %93.4 well documented marketplace.

@sergenyalcin sergenyalcin marked this pull request as ready for review August 10, 2022 14:45
Copy link
Member

@muvaf muvaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you pointed out, the scraped metadata is not in the correct form, i.e. tree structure, for us to do look-ups using fieldpaths. My first inclination would be to fix that first and then use it instead of relying on uniqueness of the field name unless the case is rare. And with the numbers you're showing, it's not rare but definitely not worth to block this feature. 94% is simply irresistible.

So, let's move forward and open an issue to build the tree structure ASAP. Because if there is no correct representation at this lowest level, then we can't really build more features on top of it reliably and we don't know the number for other TF providers.

@sergenyalcin Good job on extracting the statistics!

pkg/registry/meta.go Show resolved Hide resolved
pkg/registry/meta.go Outdated Show resolved Hide resolved
pkg/types/builder.go Outdated Show resolved Hide resolved
Signed-off-by: Sergen Yalçın <[email protected]>
@sergenyalcin
Copy link
Member Author

https://marketplace.upbound.io/providers/upbound/sergen-demo/v0.0.1 This is a demo repository that contains the field descriptions.

@sergenyalcin sergenyalcin merged commit 8e640f3 into crossplane:main Aug 12, 2022
@sergenyalcin sergenyalcin deleted the api-doc branch August 12, 2022 10:42
@sergenyalcin
Copy link
Member Author

https://github.com/upbound/official-providers/issues/536 follow-up issue about the tree structure for schema fields in scraped metadata

@@ -31,12 +31,12 @@ resources:
]
}
argumentDocs:
analyzer_name: '- (Required) Name of the Analyzer.'
arn: '- The Amazon Resource Name (ARN) of the Analyzer.'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ARN) here shouldn't be removed. I think we should find & replace with the following criteria:

  • Match - (...) strings.
  • The match should be the 0th element of the string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants