Update regex in ML interfaces (API and outward facing services) to support accents (select Unicode characters) in names #1261

elnyry-sam-k · 2020-03-10T16:09:07Z

Goal:

As an FSP
I want to update the regular expression used in ML FSPIOP API v1.0 API Swagger for Names
so that I can support parties / end-users with accents (accented characters) in their names

As a Party (end-user) in a Mojaloop system with accent characters in my name
I want my FSP to support registering names with accents
so that I can accurately register and use my name that has accents

Note:
Issue on the mojaloop-specification repo: mojaloop/mojaloop-specification#56

Tasks:

Acceptance Criteria:

Unit Tests pass
Integration Tests pass
Code Style & Coverage meets standards
Changes made to config (default.json) are broadcast to team and follow-up tasks added to update helm charts and other deployment config.
Accents in names (unicode characters) are allowed based on the regular expression ^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$

Pull Requests:

Follow-up:

N/A

Dependencies:

N/A

Accountability:

Owner: TBC
QA/Review: @elnyry

Notes:

Old regular expression: ^(?!\s*$)[\w .,'-]{1,128}$
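To make the difference between the old and the proposed expression concrete, here is a minimal sketch that runs both against a few sample names. The constant names and the sample values are illustrative only and do not come from the Mojaloop code base; the `u` flag on the new expression is what enables `\p{...}` in JavaScript.

```javascript
// Old pattern from this story: without the u flag, \w is only [A-Za-z0-9_].
const OLD_NAME_RE = /^(?!\s*$)[\w .,'-]{1,128}$/;

// Proposed pattern: \p{L} is any Unicode letter, \p{Nd} any decimal digit.
// The u flag is required for \p{...} to be read as a property escape.
const NEW_NAME_RE = /^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$/u;

// Hypothetical sample values, chosen only to show where the two patterns differ.
const samples = ['Cafe', 'Café', 'snake_case', '   '];

for (const name of samples) {
  console.log(JSON.stringify(name), {
    old: OLD_NAME_RE.test(name),
    proposed: NEW_NAME_RE.test(name),
  });
}
```

One side effect worth noting: `\w` accepts the underscore, while `\p{L}` and `\p{Nd}` do not, so a value such as `snake_case` passes the old expression but not the proposed one.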


millerabel · 2020-03-17T18:09:44Z

@rmothilal, thanks for testing that RE. However, for the code, we should not use numeric range values for accented and non-Latin characters. Within the CCB, @MichaelJBRichards is hosting a thread for how to specify what characters are permitted in names by the API specification. The specification might take the form of an RE, but it is not intended that the spec be actual "code" that we can place in our system. We'll still have to choose the right form for an RE (or other implementation) that adheres to the requirements of the spec.

In REs, we should be using named character classes rather than explicit characters (as used in your example). These named character classes are referenced using a standardized syntax and can extend to the subset of the Unicode 5 (or later) glyph space required in the API specification. Code may require certain parameters to be adjusted to activate these character classes.

The example you give on the thread is limited to only a few accented Latin characters rather than the broader Unicode space.

Here is the suggestion we are floating at present:

^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$

This uses named character classes (Unicode property escapes), which are supported in Node.js 10.0.0 and later when the RE carries the u flag.

See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp

You may need to use the literal RE syntax or else call the RegExp() constructor and pass a string version of the RE to get an executable.
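The two ways of building the RE mentioned above can be sketched as follows. The variable names are illustrative. The main trap with the constructor form is that the pattern is an ordinary string, so every backslash has to be doubled, and the `u` flag is passed as the second argument.

```javascript
// Literal syntax: backslashes are written once, the flag follows the closing slash.
const literal = /^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$/u;

// Constructor syntax: the pattern is a string, so each backslash is escaped,
// and the flag is supplied separately.
const pattern = "^(?!\\s*$)[\\p{L}\\p{Nd} .,'-]{1,128}$";
const constructed = new RegExp(pattern, 'u');

// Both forms describe the same expression.
console.log(literal.source === constructed.source);
console.log(constructed.unicode);
```

The constructor form is the one that matters when the pattern is read from a specification file or from config rather than written in code.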

The floated suggestion does not, however, ignore leading or trailing blanks. These should be stripped from Name Format values before they are stored, compared, or matched using this RE.
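A minimal sketch of stripping blanks before matching, assuming a hypothetical helper called `validateName` (not an existing Mojaloop function). It returns the trimmed value when it matches and `null` otherwise.

```javascript
const NAME_RE = /^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$/u;

// Hypothetical helper: trim first, then match, and hand back the trimmed
// value so that the same form is used for storing and comparing.
function validateName(raw) {
  const trimmed = raw.trim();
  return NAME_RE.test(trimmed) ? trimmed : null;
}

console.log(validateName('  Café  ')); // 'Café'
console.log(validateName('Café*'));    // null
console.log(validateName('   '));      // null
```

Because the space is part of the character class, the RE on its own accepts a value with surrounding blanks, which is why the trim has to happen as a separate step.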

millerabel · 2020-03-17T19:00:10Z

The RE supplied in the story by Sam works fine in JavaScript and Node.js as of 10.0.0. The one you are suggesting here is not appropriate for our code as it depends on knowledge of the code points and ranges in Unicode. It also excludes the future use of non-Latin character symbols used in e.g. Myanmar. I think this is a work in progress. For now, we should use the RE proposal in the story, and also strip leading and trailing blanks before storing, comparing, or matching Name Format values. There are other problems with this specification for Name Format that we'll have to address with the CCB.

Miller

…

On Mar 17, 2020, at 7:36 AM, Rajiv Mothilal wrote:

This regex seems to work for javascript as the above works with other languages:

^(?!\s*$)[a-zA-ZÀ-ÖØ-öø-ɏ .,'-]{1,128}$
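To illustrate the objection to explicit ranges, the sketch below compares the range-based RE with the property-based one. The constant names and sample values are illustrative; the ranges are written as `\u` escapes for the same code points (À-Ö, Ø-ö, ø-ɏ), and a Greek name stands in for "a non-Latin script".

```javascript
// Range-based RE from the quoted email: Latin letters up to U+024F only.
const RANGE_RE = /^(?!\s*$)[a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F .,'-]{1,128}$/;

// Property-based RE from the story.
const PROPERTY_RE = /^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$/u;

// Hypothetical sample values.
const samples = ['Łukasz', 'Νίκος', '123 Café'];

for (const name of samples) {
  console.log(name, {
    ranges: RANGE_RE.test(name),
    properties: PROPERTY_RE.test(name),
  });
}
```

The range-based RE accepts the accented Latin name but rejects both the Greek name and the value containing digits, whereas the property-based RE accepts all three.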

elnyry-sam-k · 2020-03-17T19:41:24Z

Thanks Miller.

We'd like to use the RE here, updated for the leading and trailing spaces (along with using a word character as the starting character, which you indicated elsewhere), and iterate on it with implementation and QA to reach a satisfactory conclusion.

millerabel · 2020-03-17T22:33:24Z

Thanks, Sam (@elnyry). As Henrik notes in the thread on the CCB, requiring the first letter to be a word character might not be what is intended. I concede that Name Format values like "123 Café" are quite reasonable. I'd start with the RE you propose, which aligns with the CCB change request, and, as you say, iterate with code support until it gets the expected results.

Here is Node v12:

> RE = /^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$/u
/^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$/u
> RE.unicode
true
> aName = 'Pacific - A café';
'Pacific - A café'
> RE.test(aName)
true
> aName += '*'
'Pacific - A café*'
> RE.test(aName)
false

% node --version
v12.16.1

millerabel · 2020-03-20T00:54:24Z

Here is an example of compressing repeated runs of punctuation and spaces to a single space, and trimming leading and trailing spaces and punctuation:

> RE=/[ .,'-]+/g
/[ .,'-]+/g
> aName = "--123 O'Leary - a Café  "
"--123 O'Leary - a Café  "
> aName.replace(RE, ' ').trim();
'123 O Leary a Café'

We should discuss the proper canonical form for comparisons. I think this is a starting point.
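As one possible starting point for that discussion, here is a sketch of a canonicalisation helper. Everything in it is an assumption rather than an agreed rule: the function name `canonicalName` is hypothetical, and the Unicode NFC normalisation and lower-casing steps go beyond what the example above does. The hyphen is placed last in the character class so that it is read literally rather than as a range.

```javascript
// Runs of the permitted separators: space, full stop, comma, apostrophe, hyphen.
const SEPARATORS_RE = /[ .,'-]+/g;

// Hypothetical helper producing a form intended for comparison only,
// not for display or storage.
function canonicalName(raw) {
  return raw
    .normalize('NFC')            // assumption: compose e + combining accent into é
    .replace(SEPARATORS_RE, ' ') // collapse separator runs to one space
    .trim()                      // drop what is left at either end
    .toLowerCase();              // assumption: comparisons ignore case
}

console.log(canonicalName("--123 O'Leary - a Café  ")); // '123 o leary a café'
console.log(canonicalName('Cafe\u0301') === canonicalName('CAFÉ')); // true
```

The NFC step matters for the validation RE as well: a decomposed accent is a combining mark, which `\p{L}` does not match, so normalising before matching avoids rejecting names that differ only in how the accent is encoded.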

rmothilal · 2020-10-09T13:18:11Z

Our current solution for this issue uses the OpenAPI-Backend library (https://github.com/anttiviljami/openapi-backend). It was chosen because it allowed us to edit the Swagger used for validation by AJV (https://ajv.js.org): we needed to update the regex for use in JavaScript with the appropriate flags, as we are unable to do this with Swagger or OpenAPI > 3.0.
We have experienced some issues with the documentation endpoints that were used to access the Swagger on the server. The hapi-swagger plugin stopped working, so we are unable to serve that anymore. I was in the process of investigating other libraries, namely XRegExp (http://xregexp.com), but I am unfortunately leaving the Mojaloop project and cannot continue with my investigation. I believe that the limitations of JavaScript's built-in regex support will not allow us to use the appropriate regex for the languages, thus XRegExp is suggested. The Unicode Scripts section of https://www.regular-expressions.info/unicode.html could serve as a guide for addressing the language script regex issues.
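The flag problem described above can be reproduced with the built-in `RegExp` alone. The sketch assumes a validator that compiles the schema's pattern string with `new RegExp(pattern)` and no flags; whether a given validator version does this should be checked against its own documentation. Variable names are illustrative.

```javascript
// The pattern as it would appear as a string in a Swagger/OpenAPI schema.
const pattern = "^(?!\\s*$)[\\p{L}\\p{Nd} .,'-]{1,128}$";

// Assumption: this is how a validator without flag support compiles it.
// Without the u flag, \p is just the letter "p", so the class contains the
// literal characters p { L } N d plus the punctuation.
const withoutFlag = new RegExp(pattern);

// With the u flag, \p{L} and \p{Nd} are Unicode property escapes.
const withFlag = new RegExp(pattern, 'u');

console.log(withoutFlag.test('Café')); // false
console.log(withFlag.test('Café'));    // true
console.log(withoutFlag.test('pLd'));  // true
```

This is why the same pattern string can pass in one toolchain and silently misbehave in another: no error is raised when the flag is missing, the expression simply matches a different set of characters.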

mdebarros · 2021-08-13T11:53:41Z

@elnyry-sam-k is this issue not closed based on fixing this issue --> #2358 (comment)?

We have recently also updated the "implementation" regex to match the Mojaloop Specification to a more exacting degree for Party Names based on the fixes to the above issue.

elnyry-sam-k · 2021-08-13T12:05:00Z

I think we may be able to close it now, Miguel @mdebarros; we just need to make sure that any other services or resources using names are updated to use the latest regex (for example, bulk Quotes, legacy Simulator, SDK scheme adapter).

mdebarros · 2021-08-19T10:01:03Z

> I think we may be able to close it now, Miguel @mdebarros; we just need to make sure that any other services or resources using names are updated to use the latest regex (for example, bulk Quotes, legacy Simulator, SDK scheme adapter).

mojaloop/api-snippets#105 (review) <-- once this PR is released, we can update the SDK Scheme Adapter with the latest version of the api-snippets to regenerate the swagger. That will include the updated regex.

@kleyow FYI

elnyry-sam-k added the story label Mar 10, 2020

elnyry-sam-k added this to the Sprint 9.4 milestone Mar 10, 2020

rmothilal self-assigned this Mar 17, 2020

elnyry-sam-k mentioned this issue Mar 18, 2020

Support for accented characters in data type "Name" mojaloop/design-authority-project#42

Closed

4 tasks

elnyry-sam-k modified the milestones: Sprint 9.4, Sprint 9.5 Mar 31, 2020

rmothilal mentioned this issue Apr 15, 2020

API Schema Validation PoC #1295

Closed

11 tasks

elnyry-sam-k removed this from the Sprint 9.5 milestone May 12, 2020

elnyry-sam-k added the oss-core This is an issue - story or epic related to a feature on a Mojaloop core service or related to it label Jul 14, 2020

kleyow mentioned this issue Jul 22, 2020

Fix pattern issues mojaloop/api-snippets#9

Merged

elnyry-sam-k unassigned rmothilal Nov 3, 2020

elnyry-sam-k added the technical-debt Label to mark issues/stories as a technical debt item to be resolved in future label Nov 3, 2020

mdebarros mentioned this issue Aug 19, 2021

chore: update regex for first, middle and last names mojaloop/api-snippets#105

Merged

elnyry-sam-k assigned kleyow and mdebarros Aug 19, 2021

elnyry-sam-k added this to the Sprint 15.2 milestone Aug 19, 2021

elnyry-sam-k removed the technical-debt Label to mark issues/stories as a technical debt item to be resolved in future label Aug 19, 2021

mdebarros closed this as completed Aug 26, 2021
