Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SOAR-0001] Improved mapping of identifiers #89

Merged
merged 11 commits into from
Aug 2, 2023

Conversation

denil-ct
Copy link
Contributor

@denil-ct denil-ct commented Jun 22, 2023

Motivation

Encoding would help distinguish types properly. For eg. both a b and a_b would be rendered as a_b. This change would make it render a20b and a_b respectively.

Modifications

Added a hexadecimal encoding for un-supported characters rather than having just an underscore for everything.
Removed redundant keywords.

Result

Proper differentiation between names as explained in the motivation, along with that, it also reduces the reserved keywords set by removing symbols and any keywords that prepend such symbols.

Test Plan

Tested and verified for several combinations of special characters in various positions in-between alphanumeric characters.

@czechboy0
Copy link
Contributor

czechboy0 commented Jun 22, 2023

Hey @denil-ct, thanks for breaking this out.

I'm on the fence about this.

On the plus side, it helps disambiguate special symbols, so it even helps with #21.

On the other hand, the hex code can seem a little arbitrary.

To help us see what these will look like in practice, can you add a bunch of examples to this unit test? Like 10 different ones with various special symbols in different parts of the name, to help us see the impact. https://github.com/apple/swift-openapi-generator/blob/main/Tests/OpenAPIGeneratorCoreTests/Extensions/Test_String.swift#L19

Thanks 🙏

cc @simonjbeaumont for thoughts, I'm genuinely split here.

@denil-ct
Copy link
Contributor Author

Sure, will add some. I went with hex because I was influenced by the comment stating similar to url encoding. If the numbers seem a bit arbitrary we can explore some other form as well.

@czechboy0
Copy link
Contributor

Yeah, it's a fair starting point. @simonjbeaumont recently added a special case for empty strings (_empty), and we could even have a hard-coded map for common symbols, e.g. + -> plus, # -> pound, etc, so that the hex would only happen for rarely used symbols.

One thing to keep in mind - this is an API-breaking change, so will have to be in a minor bump, as if people update the generator, they'd get different symbol names. That's okay, just making it explicit here.

@czechboy0 czechboy0 added the 🆕 semver/minor Adds new public API. label Jun 22, 2023
@simonjbeaumont
Copy link
Collaborator

@denil-ct thanks for thinking about this one.

One thing I do like about this approach is it generalises. For replacements within a string we might also benefit from some delimiting to make it clearer it's the result of mangling.

Yeah, it's a fair starting point. @simonjbeaumont recently added a special case for empty strings (_empty), and we could even have a hard-coded map for common symbols, e.g. + -> plus, # -> pound, etc, so that the hex would only happen for rarely used symbols.

We could find a middle ground. I think that the empty string probably deserves its own special case because there's nothing else to meaningfully identify it—i.e. having just _20 is pretty uninformative.

It makes me wonder if the heuristic might be to only make a "wordy" replacement when there's nothing else other than the symbol?

This one is definitely a tradeoff and anything we do will have some drawback.

Another compromise here could be to generate the hex-encoded thing always and a type alias and or function wrapper with a nice name for a few cases, e.g. +1 would generate _2b1 and an alias _plus1. We can then have something that's uniform with some opt-in syntactic sugar?

@czechboy0
Copy link
Contributor

czechboy0 commented Jun 22, 2023

Mostly on the same page with you both, but since this will have a pretty visible impact on the naming logic, could you do a proposal (see our docs for the template)? Doesn't have to be long, just describe the overall naming logic:

  • use as-is if all characters are legal for a Swift identifier, and doesn't start with a number
  • prefixed with an underscore when starts with a number
  • if it's a known keyword, add an underscore
  • each character is checked, if it's not safe in a Swift identifier, then it's looked up in a hard-coded map (list the hard-coded characters and their word replacements), and if it's not in the map, is converted to its hex code

The above with a bunch of diverse examples of inputs and outputs.

Doing the typealias would be a bit tricky due to namespacing and uniqueness, so I'd prefer if there was exactly one Swift-safe term for an input string.

Again, this will be a breaking change, so we want to make sure we understand the full implications.

@denil-ct
Copy link
Contributor Author

Sure, that sounds good. Will create one and we'll proceed from there.

@czechboy0
Copy link
Contributor

Can you put the PR into a draft state (or close it) until we have a proposal? Just to make clear that you're not waiting on us to take any action, like a review - thanks. 🙂

@denil-ct
Copy link
Contributor Author

Ah sure, no issues.

@denil-ct denil-ct marked this pull request as draft June 22, 2023 18:31
@denil-ct denil-ct changed the title Experimental string encoding support [Proposal] Implement Encoding for Property Names Jun 25, 2023
@denil-ct denil-ct changed the title [Proposal] Implement Encoding for Property Names [Proposal] Encoding for Property Names Jun 25, 2023
@denil-ct
Copy link
Contributor Author

Hey @czechboy0 @simonjbeaumont

I have updated the PR description with the proposal. Do let me know if anything is missing, and what the next steps are. Also should I pull this out of draft state?

@czechboy0
Copy link
Contributor

Thanks for writing this up, @denil-ct! Could you add the proposal as a markdown file to the Proposals directory, next to the template? That'll allow us to comment on specific lines, thanks! You can then remove it from the PR description, to avoid having to update two places as we iterate.

@denil-ct denil-ct marked this pull request as ready for review June 26, 2023 06:00
@denil-ct
Copy link
Contributor Author

denil-ct commented Jun 27, 2023

Have added the proposed changes, along with updated examples. Looks too wordy in some cases imo.

@czechboy0
Copy link
Contributor

Great! One more thing, we just recently updated the proposal process a tiny bit, see #91

So can you split the proposal into a separate PR from the code change, please? (Thanks for helping us figure out this overall process by test-driving it 🙂)

@simonjbeaumont
Copy link
Collaborator

Great! One more thing, we just recently updated the proposal process a tiny bit, see #91

So can you split the proposal into a separate PR from the code change, please? (Thanks for helping us figure out this overall process by test-driving it 🙂)

The docs you have here (about how we handle special characters) are pretty useful to folks generally beyond those reviewing the code change. @czechboy0 maybe the proposal should be replaced by an addition to the DocC?

@denil-ct
Copy link
Contributor Author

@czechboy0 @simonjbeaumont

Should I proceed with splitting up the proposal and implementation?

@simonjbeaumont
Copy link
Collaborator

@denil-ct First off, thank you for jumping in—on this and other issues!

We'll want to proceed with a bit more caution with this change compared to the other contributions, as the choices we make here—and any subsequent changes to the area—will have an impact on the generated API.

This might mean waiting for us to have tests around the API changes in generated code, so I just wanted to manage expectations on your side.

Should I proceed with splitting up the proposal and implementation?

Yes, let's first land this as a proposal and let others chime in with any feedback before we land the implementation.

When we do land the implementation, we'll want to have some good testing around this too.

@denil-ct
Copy link
Contributor Author

denil-ct commented Jun 28, 2023

I totally understand, and there is no reason to rush, as the current solution is decent enough in many cases. Let's follow the age old measure twice, cut once approach for this.

@denil-ct
Copy link
Contributor Author

Marking as draft until the proposal is approved.

@denil-ct denil-ct marked this pull request as draft June 28, 2023 17:35
@denil-ct denil-ct changed the title [Proposal] Encoding for Property Names [SOAR-0001] Implement Encoding for Property Names Jun 28, 2023
@denil-ct
Copy link
Contributor Author

@czechboy0 please add a blocked tag here as well

@czechboy0 czechboy0 added the status/blocked Waiting for another issue. label Jul 11, 2023
@czechboy0
Copy link
Contributor

Done ✅

@czechboy0 czechboy0 removed the status/blocked Waiting for another issue. label Jul 28, 2023
@czechboy0
Copy link
Contributor

@denil-ct Ok, you're unblocked now, please provide the new implementation in this method, and add a bunch of unit tests for it please.

@czechboy0 czechboy0 added 🔨 semver/patch No public API change. and removed 🆕 semver/minor Adds new public API. labels Jul 28, 2023
@czechboy0
Copy link
Contributor

Also, changing the label from semver/minor to semver/patch, because this will be hidden behind a feature flag in 0.1.x. Only once we enable the feature flag unconditionally in 0.2.x will it be a breaking change, so we'll mark semver/minor that PR, not this one.

denil-ct added 3 commits July 31, 2023 23:38
…rator into add-string-encoding

# Conflicts:
#	Sources/_OpenAPIGeneratorCore/Extensions/String.swift
@denil-ct denil-ct force-pushed the add-string-encoding branch from fdf6aa7 to 05031b1 Compare July 31, 2023 18:34
@denil-ct denil-ct marked this pull request as ready for review August 1, 2023 20:06
Copy link
Contributor

@czechboy0 czechboy0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you!

Just a reminder, it's currently disabled behind the proposal0001 feature flag, but once it lands, anyone can test it out by enabling the feature flag (in CLI or in their config file).

The feature will be enabled unconditionally in the next breaking version, i.e. 0.2.0.

@czechboy0 czechboy0 merged commit bf49be1 into apple:main Aug 2, 2023
@denil-ct denil-ct deleted the add-string-encoding branch August 2, 2023 06:08
czechboy0 pushed a commit that referenced this pull request Aug 2, 2023
### Motivation

Encoding a string would help distinguish types properly. For eg. both a
b and a_b would be rendered as a_b. This change would make it render
a_space_b and a_b respectively. Adding a proposal as its not a trivial
change, as recommended by the contribution guidelines.

### Modifications

Added proposal document - `SOAR-0001.md`

An initial draft of the proposed implementation is at
#89. Any
suggestions or improvements for the proposal are welcome.
@czechboy0 czechboy0 changed the title [SOAR-0001] Implement Encoding for Property Names [SOAR-0001] Improved mapping of identifiers Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔨 semver/patch No public API change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants