Update regex in ML interfaces (API and outward facing services) to support accents (select Unicode characters) in names #1261
Comments
@rmothilal, thanks for testing that RE. However, for the code, we should use numeric range values for accented and non-latin characters. Within the CCB, @MichaelJBRichards is hosting a thread for how to specify what characters are permitted in names by the API specification. The specification might take the form of an RE, but it is not intended that the spec be actual "code" that we can place in our system. We'll still have to choose the right form for an RE (or other implementation) that adheres to the requirements of the spec.
In REs, we should be using named character classes rather than explicit characters (as used in your example). These named character classes are referenced using a standardized syntax and can extend to the subset of the Unicode 5 (or later) glyph space required in the API specification. Code may require certain parameters to be adjusted to activate these character classes (in JavaScript, the `u` flag on the RE). The example you give on the thread is limited to only a few accented Latin characters rather than the broader Unicode space.
Here is the suggestion we are floating at present:
^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$
This uses the named character classes, which are supported in Node.js 10.0.0 and later. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
You may need to use the literal RE syntax, or else call the RegExp() constructor and pass a string version of the RE, to get an executable.
The floated suggestion does not, however, ignore leading or trailing blanks, which should be stripped from Name Format values before they are stored, compared, or matched using this RE.
|
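As a minimal sketch of the suggestion above (not the project's actual validation code): in JavaScript the `\p{...}` property escapes are only recognized when the `u` flag is set, whether the RE is written as a literal or built with the `RegExp()` constructor. The names `NAME_RE`, `NAME_RE_FROM_STRING`, and `isValidName` are hypothetical and used only for illustration.

```javascript
// Literal form: the `u` flag enables \p{L} (letters) and \p{Nd} (decimal digits).
const NAME_RE = /^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$/u;

// Constructor form: backslashes must be doubled inside a string literal.
const NAME_RE_FROM_STRING = new RegExp("^(?!\\s*$)[\\p{L}\\p{Nd} .,'-]{1,128}$", "u");

// Hypothetical helper: strip leading and trailing blanks, then match.
function isValidName(value) {
  return NAME_RE.test(value.trim());
}

console.log(isValidName("  123 Café "));          // padded but otherwise valid
console.log(isValidName("   "));                  // blank only
console.log(NAME_RE_FROM_STRING.test("O'Leary")); // constructor-built RE
```

Trimming before matching keeps the RE itself unchanged while still rejecting values that consist only of blanks.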
The RE supplied in the story by Sam works fine in JavaScript and Node JS as of 10.0.0. The one you are suggesting here is not appropriate for our code as it depends on knowledge of the code points and ranges in Unicode. It also excludes the future use of non-Latin character symbols used in e.g. Myanmar.
I think this is a work in process.
For now, we should use the RE proposal in the story, and also strip leading and trailing blanks before storing, comparing, or matching Name Format values. There are other problems with this specification for Name Format that we’ll have to address with the CCB.
— Miller
… On Mar 17, 2020, at 7:36 AM, Rajiv Mothilal ***@***.***> wrote:
This regex seems to work for javascript as the above works with other languages
^(?!\s*$)[a-zA-ZÀ-ÖØ-öø-ɏ .,'-]{1,128}$
|
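To make the difference between the two proposals concrete, here is a small comparison sketch. `RANGE_RE` is the explicit-range expression from the quoted email, written with `\u` escapes for the same code points (À-Ö, Ø-ö, ø-ɏ); `CLASS_RE` is the named-class expression from the story. Both constant names and the sample names are assumptions for illustration.

```javascript
// Explicit code-point ranges: ASCII letters plus a slice of accented Latin.
const RANGE_RE = /^(?!\s*$)[a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F .,'-]{1,128}$/;

// Named character classes: any Unicode letter or decimal digit (needs the `u` flag).
const CLASS_RE = /^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$/u;

// Accented Latin, Cyrillic, and a name containing digits.
const samples = ["Jos\u00e9", "\u0416\u0430\u043d\u043d\u0430", "123 Caf\u00e9"];

for (const name of samples) {
  console.log(name, "range:", RANGE_RE.test(name), "class:", CLASS_RE.test(name));
}
```

Only the accented Latin sample passes both expressions; the Cyrillic name and the name with digits pass only the named-class version.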
Thanks Miller. We'd like to use the RE here, updated to handle leading and trailing spaces (and to require a word character as the starting character, which you indicated elsewhere), and iterate over it with implementation and QA to reach a satisfactory conclusion. |
Thanks Sam(@elnyry). As Henrik notes in the thread on the CCB, requiring the first letter to be a word character might not be as intended. I concede that Name Format values like "123 Café" are quite reasonable. I'd start with the RE you propose, which aligns to the CCB change request, and as you say, iterate with code support until it gets the expected results. Here is Node v12:
|
Here is an example of compressing repeated runs of punctuation to a single space, and trimming leading and trailing spaces and punctuation:
> RE = /[ .,'-]+/g
/[ .,'-]+/g
> aName = "--123 O'Leary - a Café "
"--123 O'Leary - a Café "
> aName.replace(RE, ' ').trim();
'123 O Leary a Café'
We should discuss the proper canonical form for comparisons. I think this is a starting point.
|
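The same idea can be wrapped in a small function. This is only a sketch of one possible canonical form, not an agreed one; `SEPARATORS` and `canonicalName` are hypothetical names. The hyphen is placed last in the character class so that it is treated as a literal hyphen rather than as a range operator between two other characters.

```javascript
// Runs of spaces and name punctuation; the trailing hyphen is a literal.
const SEPARATORS = /[ .,'-]+/g;

// Hypothetical helper: collapse each run to one space, then trim both ends.
function canonicalName(value) {
  return value.replace(SEPARATORS, " ").trim();
}

console.log(canonicalName("--123 O'Leary - a Café ")); // 123 O Leary a Café
console.log(canonicalName("O'Leary") === canonicalName("O Leary")); // true
```

This form does not fold case or normalize Unicode, so whether those steps belong in the canonical form is still an open question for the discussion above.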
Our current solution for this issue uses the OpenAPI-Backend library (https://github.com/anttiviljami/openapi-backend). It was chosen because it allowed us to edit the Swagger used for validation by AJV (https://ajv.js.org). We needed to update the regex for use in JavaScript with the appropriate flags, as we are unable to do this with Swagger or OpenAPI > 3.0. |
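The flag problem can be illustrated without any validator library. A schema carries only the pattern source as a string, so whoever compiles it decides the flags. The sketch below uses a hypothetical schema fragment (`nameSchema`) and plain `RegExp`; it does not show how OpenAPI-Backend or AJV compile patterns internally, which depends on their version and configuration.

```javascript
// Hypothetical schema fragment: the pattern is a plain string with no flags.
const nameSchema = {
  type: "string",
  pattern: "^(?!\\s*$)[\\p{L}\\p{Nd} .,'-]{1,128}$",
};

// Without `u`, \p is an identity escape, so the class only contains the
// literal characters p { L } N d plus the listed punctuation.
const withoutFlag = new RegExp(nameSchema.pattern);

// With `u`, \p{L} and \p{Nd} are Unicode property escapes.
const withFlag = new RegExp(nameSchema.pattern, "u");

console.log(withoutFlag.test("Caf\u00e9")); // false ("Café" is rejected)
console.log(withFlag.test("Caf\u00e9"));    // true
```

The pattern compiles in both cases, so a missing flag does not raise an error; it silently rejects valid names, which is why it needs an explicit test.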
@elnyry-sam-k is this issue not closed based on fixing this issue --> #2358 (comment)? We have recently also updated the "implementation" regex to match the Mojaloop Specification to a more exacting degree for Party Names based on the fixes to the above issue. |
I think we may be able to close it now, Miguel @mdebarros; we just need to make sure that any other services or resources using names are updated to use the latest regex (for example, bulk Quotes, the legacy Simulator, and the SDK Scheme Adapter). |
mojaloop/api-snippets#105 (review) <-- once this PR is released, we can update the SDK Scheme Adapter with the latest version of the api-snippets to regenerate the swagger. That will include the updated regex. @kleyow FYI |
Goal:
As an FSP
I want to update the regular expression used in ML FSPIOP API v1.0 API Swagger for Names
so that I can support parties / end-users with accents (accented characters) in their names
As a Party (end-user) in a Mojaloop system with accent characters in my name
I want my FSP to support registering names with accents
so that I can accurately register and use my name that has accents
Note:
Issue on the mojaloop-specification repo: mojaloop/mojaloop-specification#56
Tasks:
^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$
Acceptance Criteria:
^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$
Pull Requests:
Follow-up:
Dependencies:
Accountability:
Notes:
^(?!\s*$)[\w .,'-]{1,128}$
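Assuming the `\w`-based expression in the notes above is the pattern being replaced, the sketch below shows why it cannot support accented names: in JavaScript, `\w` matches only `[A-Za-z0-9_]`. `WORD_RE` is a hypothetical name.

```javascript
// \w covers ASCII letters, digits, and underscore only.
const WORD_RE = /^(?!\s*$)[\w .,'-]{1,128}$/;

console.log(WORD_RE.test("O'Leary"));   // true: ASCII letters and an apostrophe
console.log(WORD_RE.test("Caf\u00e9")); // false: é is not a \w character
console.log(WORD_RE.test("a_b"));       // true: underscore counts as a word character
```

Besides rejecting accents, `\w` also admits the underscore, which the named-class expression does not.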