Support for accented characters in data type "Name" #42

Closed
4 tasks
NicoDuvenage opened this issue Mar 18, 2020 · 6 comments
Comments

@NicoDuvenage
Contributor

Request:

In Section 7.2.4.1 of the API specification, the definition of the regular expression to parse a variable of the Name type states: "all Unicode32 characters are allowed". In fact, accented and non-Roman characters are rejected by the example regular expression given in Listing 14.
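To illustrate the kind of rejection described here, a minimal sketch follows. The pattern is a hypothetical ASCII-only stand-in chosen for illustration; it is not the expression from Listing 14, and the helper name `accepts` is likewise an assumption.

```python
import re

# Hypothetical ASCII-only stand-in. This is NOT the pattern from Listing 14;
# it only shows how a character class limited to A-Z behaves.
ASCII_NAME = re.compile(r"[A-Za-z0-9 .,'-]+")


def accepts(name: str) -> bool:
    """Return True when the whole string matches the ASCII-only pattern."""
    return ASCII_NAME.fullmatch(name) is not None


# Print each candidate with the verdict of the ASCII-only pattern.
for candidate in ["Nico Duvenage", "José", "Zoë", "李雷"]:
    print(candidate, accepts(candidate))
```

With a pattern of this shape, only the first candidate is accepted; the accented and non-Roman names fail because their letters fall outside the explicit `A-Za-z` range.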

Artifacts:

Decision(s):

  • Actual decision made as a result of discussion

Follow-up:

  • Alternative actions

Dependencies:

Accountability:

  • Owner:

Notes:

@NicoDuvenage NicoDuvenage changed the title from Support for accented characters in data type "Name" #56 to Support for accented characters in data type "Name" on Mar 18, 2020
@NicoDuvenage
Contributor Author

NicoDuvenage commented Mar 18, 2020

There has been some email correspondence regarding this topic lately, under the subject:

Update regex in ML interfaces (API and outward facing services) to support accents (select Unicode characters) in names (#1261)

@elnyry-sam-k
Member

Not exactly on email, but on the GitHub issue here: mojaloop/project#1261 (I suspect that email was just a GitHub notification)

@mjbrichards

Just so we have it here too: here is my current thinking on what the specification should read, copied over from the GitHub issue:

_I think I'd like to suggest that there should be a reference somewhere in the API specification (preferably in a table of such versioning references) to the Unicode release level. Within that, we should use references to the Unicode General Categories that we allow or prohibit in a field of a specific type. So we might rewrite Miller's proposal to say:

"Letters, both accented and unaccented, being chosen from all code points belonging to the Letter and Decimal_Number general categories as defined in the reference version of the Unicode specification (with link to reference.) In addition, the period (.), apostrophe ('), dash (-), comma (,) and space character are permitted. Interior spaces are allowed, but no leading or trailing spaces. For the avoidance of doubt, Names may include leading digits."

We can then allow implementers to decide how best to meet these requirements._

My view is that, as Miller says elsewhere, it is much safer to use Unicode category names than to identify the content of those categories explicitly...
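As a rough sketch of how an implementer might meet the wording proposed above, the check below works from Unicode general categories rather than explicit character lists. It assumes Python's standard `unicodedata` module; the function name `is_valid_name`, the constant `EXTRA_ALLOWED`, and the decision to reject the empty string are illustrative assumptions, not part of the specification.

```python
import unicodedata

# Punctuation explicitly permitted by the proposed wording, plus the space.
EXTRA_ALLOWED = set(".'-, ")


def is_valid_name(value: str) -> bool:
    """Check a string against the proposed Name rule (illustrative only)."""
    if not value:
        return False  # assumption: the empty string is not a valid Name
    if value[0] == " " or value[-1] == " ":
        return False  # no leading or trailing spaces
    for ch in value:
        category = unicodedata.category(ch)
        # Any Letter category (Lu, Ll, Lt, Lm, Lo) or Decimal_Number (Nd).
        if category.startswith("L") or category == "Nd":
            continue
        if ch in EXTRA_ALLOWED:
            continue
        return False
    return True


# The Unicode release this interpreter's category tables are built from.
print(unicodedata.unidata_version)
print(is_valid_name("José"), is_valid_name("3Com, Inc."), is_valid_name(" Ana"))
```

Under this reading, a name stored in decomposed form (a base letter followed by a combining accent, general category Mn) would be rejected, so an implementation along these lines would probably want to normalise input to NFC first. The result also depends on the Unicode version bundled with the runtime, which is one reason a reference release level in the specification would help.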

@mjbrichards

Presumably the problem from a DA perspective is not the code solution, it's the tests we propose to apply to check whether a given solution meets the requirement or not...

@elnyry-sam-k
Member

elnyry-sam-k commented Mar 18, 2020

@elnyry-sam-k
Member

All accented characters are now supported in implementation (example issue addressing this: mojaloop/project#2358)
