[SOAR-0001] Improved mapping of identifiers #89

Merged
merged 11 commits into from
Aug 2, 2023
69 changes: 10 additions & 59 deletions Sources/_OpenAPIGeneratorCore/Extensions/String.swift
@@ -67,8 +67,6 @@ fileprivate extension String {
}

// Only allow [a-zA-Z][a-zA-Z0-9_]*
// This is bad, is there something like percent encoding functionality but for general "allowed chars only"?

let firstCharSet: CharacterSet = .letters
let numbers: CharacterSet = .decimalDigits
let otherCharSet: CharacterSet = .alphanumerics.union(.init(charactersIn: "_"))
@@ -83,7 +81,16 @@ fileprivate extension String {
sanitizedScalars.append("_")
outScalar = scalar
} else {
outScalar = "_"
var hexString = String(scalar.value, radix: 16, uppercase: true)
if index == 0,
let firstChar = hexString.unicodeScalars.first,
!firstCharSet.contains(firstChar) {
hexString = "_\(hexString)"
}
for char in hexString.unicodeScalars {
sanitizedScalars.append(char)
}
continue
}
sanitizedScalars.append(outScalar)
}
@@ -153,62 +160,6 @@ fileprivate extension String {
"true",
"try",
"throws",
"__FILE__",
"__LINE__",
"__COLUMN__",
"__FUNCTION__",
"__DSO_HANDLE__",
"_",
"(",
")",
"{",
"}",
"[",
"]",
"<",
">",
".",
".",
",",
"...",
":",
";",
"=",
"@",
"#",
"&",
"->",
"`",
"\\",
"!",
"?",
"?",
"\"",
"\'",
"\"\"\"",
"#keyPath",
"#line",
"#selector",
"#file",
"#fileID",
"#filePath",
"#column",
"#function",
"#dsohandle",
"#assert",
"#sourceLocation",
"#warning",
"#error",
"#if",
"#else",
"#elseif",
"#endif",
"#available",
"#unavailable",
"#fileLiteral",
"#imageLiteral",
"#colorLiteral",
")",
"yield",
"String",
"Error",
@@ -0,0 +1,52 @@
# SOAR-0001

Encoding for Property Names

## Overview

- Proposal: SOAR-0001
- Author(s): [Denil](https://github.com/denil-ct)
- Status: **Awaiting Review**
- Issue: https://github.com/apple/swift-openapi-generator/issues/21
- Implementation:
    - https://github.com/apple/swift-openapi-generator/pull/89
- Affected components:
    - generator

### Introduction

The goal of this proposal is to improve how we handle unsupported characters in property names when generating code from specs. Currently, we use a block list approach, replacing offending characters with `_`, which can cause name conflicts. By encoding the offending character instead, we create unique and valid property names. This will avoid name collisions and ensure consistent code generation.

### Motivation

The current approach for handling unsupported characters in property names is not robust and can lead to unexpected and undesirable outcomes. For example, given two properties, `a_b` and `a b`, the current implementation generates the same property name, `a_b`, for both, which creates a conflict. It can also result in loss of information or meaning from the original specification. Therefore, we need a better solution that can handle any unsupported character in a consistent and reliable way, without compromising the quality and functionality of the code.
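The collision described above can be reproduced with a small sketch. This is a simplified stand-in for the previous behavior, not the generator's actual code: the function name `legacySanitized` is hypothetical, only ASCII letters and digits are treated as allowed, and the special handling of the first character is left out.

```swift
// Hypothetical, simplified model of the previous substitution:
// every scalar outside [a-zA-Z0-9_] is replaced with "_".
func legacySanitized(_ name: String) -> String {
    var scalars = String.UnicodeScalarView()
    for scalar in name.unicodeScalars {
        let isAllowed: Bool
        switch scalar {
        case "a"..."z", "A"..."Z", "0"..."9", "_":
            isAllowed = true
        default:
            isAllowed = false
        }
        if isAllowed {
            scalars.append(scalar)
        } else {
            // The original character is lost here.
            scalars.append("_")
        }
    }
    return String(scalars)
}

// Two distinct property names from the spec map to the same Swift name.
print(legacySanitized("a_b") == legacySanitized("a b"))
```

Because the space is replaced with `_`, both inputs end up as `a_b`, so two distinct properties in the spec would compete for one generated name.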

### Proposed solution

The proposed solution to the problem is to use hex encoding for any unsupported character in property names. Hex encoding is a simple and standard way of representing any character as a sequence of hexadecimal digits. For example, the asterisk (*) character is encoded as 2A, the space ( ) character is encoded as 20, and the slash (/) character is encoded as 2F. Hex encoding also has the added benefit of not introducing any additional special characters.

Some examples:

yaml | swift
-- | --
a b | a20b
a*b | a2Ab
ab_ | ab_
ab* | ab2A
/ab | _2Fab
Hu&J_?kin | Hu26J_3Fkin
message | message
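The mapping in the table can be sketched as follows. This is a minimal illustration modeled on the diff in this pull request, under some assumptions: the function name `hexEncodedIdentifier` is hypothetical, and it checks ASCII ranges only, whereas the real extension uses `CharacterSet.letters`, `.decimalDigits`, and `.alphanumerics`, which also accept non-ASCII characters.

```swift
// Hypothetical sketch: hex-encode every scalar that is not allowed
// in an identifier of the form [a-zA-Z][a-zA-Z0-9_]*.
func hexEncodedIdentifier(_ name: String) -> String {
    func isLetter(_ s: Unicode.Scalar) -> Bool {
        switch s {
        case "a"..."z", "A"..."Z": return true
        default: return false
        }
    }
    func isDigit(_ s: Unicode.Scalar) -> Bool {
        switch s {
        case "0"..."9": return true
        default: return false
        }
    }

    var result = String.UnicodeScalarView()
    for (index, scalar) in name.unicodeScalars.enumerated() {
        let isFirst = index == 0
        let isUnderscore = scalar == "_"
        // Letters are always allowed; digits and "_" only after the first position.
        let allowedLater = !isFirst && (isDigit(scalar) || isUnderscore)
        if isLetter(scalar) || allowedLater {
            result.append(scalar)
        } else if isFirst && isDigit(scalar) {
            // A leading digit is kept, but prefixed with "_".
            result.append("_")
            result.append(scalar)
        } else {
            // Replace the scalar with its uppercase hex code point.
            var hex = String(scalar.value, radix: 16, uppercase: true)
            // An identifier cannot start with a digit, so prefix "_" if needed.
            if isFirst, let first = hex.unicodeScalars.first, !isLetter(first) {
                hex = "_" + hex
            }
            result.append(contentsOf: hex.unicodeScalars)
        }
    }
    return String(result)
}

print(hexEncodedIdentifier("Hu&J_?kin"))
```

Under these assumptions, the sketch reproduces each row of the table, for example `/ab` becomes `_2Fab` because the hex string `2F` starts with a digit and receives a leading `_`.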

This means that a future version of the generator might produce different names than the current version does, so users of the generator should be ready to update their code before upgrading to that version.

### Detailed design

The implementation is quite simple, as you can see in https://github.com/apple/swift-openapi-generator/pull/89: we only changed the substitution logic, which used to replace each special character with `_`. It now replaces the special character with its hex encoding instead. Contributors should be aware of this change, review the places where they use this extension, and evaluate whether it is still suitable for them.

### API stability

This is an API-breaking change, as it will produce different symbol names than before. Other components, such as the runtime and transports, should not be affected.

### Future directions

The encoding strategy is open for further discussion. As a starting point, we have chosen the simplest encoding format, hex. One reason to revisit this choice is that hex encoding adds fairly arbitrary symbols to the property name, which is not ideal. We could take a middle-of-the-road approach in which special characters map to wordified versions. For example, `a+b` could become `aplusb`, or `a_plus_b` if we add a delimiter to mark the replaced portion.
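A rough sketch of the wordified variant with `_` delimiters might look like the following. Everything here is an assumption made for illustration: the function name `wordifiedIdentifier`, the contents of the word table, and the choice to leave unmapped characters untouched (a real implementation would still need a fallback for them, such as hex encoding).

```swift
// Hypothetical sketch of the "wordified" direction; not part of this proposal's
// implementation. The table entries are illustrative assumptions.
func wordifiedIdentifier(_ name: String) -> String {
    let words: [Unicode.Scalar: String] = [
        "+": "plus",
        "*": "asterisk",
        "/": "slash",
        " ": "space",
    ]
    var result = ""
    for scalar in name.unicodeScalars {
        if let word = words[scalar] {
            // Delimit the replaced portion with "_" on both sides.
            result += "_\(word)_"
        } else {
            // Unmapped scalars pass through unchanged in this sketch.
            result.unicodeScalars.append(scalar)
        }
    }
    return result
}

print(wordifiedIdentifier("a+b"))
```

The delimiters make the replaced portion easy to spot, at the cost of longer names and a word table that has to be maintained.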