Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update regex in ML interfaces (API and outward facing services) to support accents (select Unicode characters) in names #1261

Closed
14 of 30 tasks
elnyry-sam-k opened this issue Mar 10, 2020 · 9 comments
Assignees
Labels
oss-core This is an issue - story or epic related to a feature on a Mojaloop core service or related to it story
Milestone

Comments

@elnyry-sam-k
Copy link
Member

elnyry-sam-k commented Mar 10, 2020

Goal:

As an FSP
I want to update the regular expression used in ML FSPIOP API v1.0 API Swagger for Names
so that I can support parties / end-users with accents (accented characters) in their names

As a Party (end-user) in a Mojaloop system with accent characters in my name
I want my FSP to support registering names with accents
so that I can accurately register and use my name that has accents

Note:
Issue on the mojaloop-specification repo: mojaloop/mojaloop-specification#56

Tasks:

  • Implement openapi-backend library
    • Investigate the way openapi-backend works
    • Investigate AJV schema validator
    • Build custom OpenAPIBackendValidator
    • Added custom validator to existing library
    • TBD
  • Create Validation library ml-schema-validator
    • Add Quotes/BulkQuotes
    • Add Participants
    • Add Parties
    • Add Transfers
    • Update to cater for transaction service requests
    • Create Tests
  • Update the regex for Name definitions in the ML FSPIOP API Swagger used in the outward facing services for Schema validation (to ^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$)
    • quoting-service - remove dependency on Hapi-Open-API and add routes
    • account-lookup-service - remove dependency on Hapi-Open-API and add routes
    • transaction-requests-service - remove dependency on Hapi-Open-API and add routes
    • ml-api-adapter - Update to use ml-schema-validator library
    • legacy simulator - Update to use ml-schema-validator library
    • Review if any other services need to be updated
  • (Before merging into master) Tests to be added around names with accents (Eg: í, ű, á, ó, ü, ö, ú, ő, ű, é, Côte d'Ivoire ) and other negative scenarios that include characters that shouldn't be allowed
  • Tests run on the interim version (before released)
  • Once validated, approved and released

Acceptance Criteria:

  • Unit Tests pass
  • Integration Tests pass
  • Code Style & Coverage meets standards
  • Changes made to config (default.json) are broadcast to team and follow-up tasks added to update helm charts and other deployment config.
  • Accents in names (unicode characters) are allowed based on the regular expression ^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$

Pull Requests:

Follow-up:

  • N/A

Dependencies:

  • N/A

Accountability:

  • Owner: TBC
  • QA/Review: @elnyry

Notes:

  • Old regular expression: ^(?!\s*$)[\w .,'-]{1,128}$
@elnyry-sam-k elnyry-sam-k added this to the Sprint 9.4 milestone Mar 10, 2020
@rmothilal rmothilal self-assigned this Mar 17, 2020
@millerabel
Copy link
Member

millerabel commented Mar 17, 2020

@rmothilal, thanks for testing that RE. However, for the code, we should use numeric range values for accented and non-latin characters. Within the CCB, @MichaelJBRichards is hosting a thread for how to specify what characters are permitted in names by the API specification. The specification might take the form of an RE, but it is not intended that the spec be actual "code" that we can place in our system. We'll still have to choose the right form for an RE (or other implementation) that adheres to the requirements of the spec.

In RE's, we should be using named character classes rather than explicit characters (as used in your example). These named character classes are referenced using a standardized syntax and can extend to the subset of the Unicode 5 (or later) glyph space required in the API specification. Code may require certain parameters to be adjusted to activate these character classes.

The example you give on the thread is limited to only a few accented latin characters rather than the broader Unicode space.

Here is the suggestion we are floating at present:

^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$

And this uses the named character classes which are supported in Node JS 10.0.0 and later.

See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp

You may need to use the literal RE syntax or else call the RegExp() constructor and pass a string version of the RE to get an executable.

The floated suggestion here does not however ignore leading or trailing blanks which should be peeled off Name Format values before they are stored, compared, or matched using this RE.

@millerabel
Copy link
Member

millerabel commented Mar 17, 2020 via email

@elnyry-sam-k
Copy link
Member Author

Thanks Miller.

We'd like to use the RE here, updated for the leading & trailing spaces (along with using a word character as the starting character - which you indicated elsewhere), and iterate over it with implementation & QA to reach a satisfactory conclusion..

@millerabel
Copy link
Member

Thanks Sam(@elnyry). As Henrik notes in the thread on the CCB, requiring the first letter to be a word character might not be as intended. I concede that Name Format values like "123 Café" are quite reasonable. I'd start with the RE you propose, which aligns to the CCB change request, and as you say, iterate with code support until it gets the expected results.

Here is Node v12:

> RE = /^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$/u
/^(?!\s*$)[\p{L}\p{Nd} .,'-]{1,128}$/u
> RE.unicode
true
> aName = 'Pacific - A café';
'Pacific - A café'
> RE.test(aName)
true
> aName += '*'
'Pacific - A café*'
> RE.test(aName)
false
% node --version
v12.16.1

@millerabel
Copy link
Member

Here is an example of compressing repeating strings of punctuation to a single space. And trimming leading and trailing spaces and punctuation:

> RE=/[ -.,']+/g
/[ -.,']+/g
> aName = "--123 O'Leary - a Café  "
"--123 O'Leary - a Café  "
> aName.replace(RE, ' ').trim();
'123 O Leary a Café'

We should discuss the proper canonical form for comparisons. I think this is a starting point.

@elnyry-sam-k elnyry-sam-k modified the milestones: Sprint 9.4, Sprint 9.5 Mar 31, 2020
@elnyry-sam-k elnyry-sam-k removed this from the Sprint 9.5 milestone May 12, 2020
@elnyry-sam-k elnyry-sam-k added the oss-core This is an issue - story or epic related to a feature on a Mojaloop core service or related to it label Jul 14, 2020
@rmothilal
Copy link

Our current solution for this issue uses the OpenAPI-Backend library (https://github.com/anttiviljami/openapi-backend) the reason this was chosen was that it allowed us to edit the swagger used for validation AJV (https://ajv.js.org), we needed to updated the regex for use in Javascript with the appropriate flags as we are unable to do this with Swagger or Openapi > 3.0.
We have experience some issues with regards to the documentation endpoints that were used to access the swagger on the server. The plugin hapi-swagger stopped working and therefore we are unable to get that anymore. I was in the process of investigating other libraries namely XRegExp (http://xregexp.com) but I am unfortunately leaving the Mojaloop project and cannot continue with my investigation. I believe that the limitations of Javascript built in Regex library will not allow us to use the appropriate regex for the languages thus XRegExp is suggested. Using https://www.regular-expressions.info/unicode.html could give you a guide to address the language script regex issues under the Unicode Scripts section.

@elnyry-sam-k elnyry-sam-k added the technical-debt Label to mark issues/stories as a technical debt item to be resolved in future label Nov 3, 2020
@mdebarros
Copy link
Member

mdebarros commented Aug 13, 2021

@elnyry-sam-k is this issue not closed based on fixing this issue --> #2358 (comment)?

We have recently also updated the "implementation" regex to match the Mojaloop Specification to a more exacting degree for Party Names based on the fixes to the above issue.

@elnyry-sam-k
Copy link
Member Author

I think we maybe able to close it now Miguel @mdebarros ; just need to make sure that any other services or resources using names are updated to use the latest regex expression. (for example, bulk Quotes, legacy Simulator, SDK scheme adapter)

@mdebarros
Copy link
Member

mdebarros commented Aug 19, 2021

I think we maybe able to close it now Miguel @mdebarros ; just need to make sure that any other services or resources using names are updated to use the latest regex expression. (for example, bulk Quotes, legacy Simulator, SDK scheme adapter)

mojaloop/api-snippets#105 (review) <-- once this PR is released, we can update the SDK Scheme Adapter with the latest version of the api-snippets to regenerate the swagger. That will include the updated regex.

@kleyow FYI

@elnyry-sam-k elnyry-sam-k added this to the Sprint 15.2 milestone Aug 19, 2021
@elnyry-sam-k elnyry-sam-k removed the technical-debt Label to mark issues/stories as a technical debt item to be resolved in future label Aug 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
oss-core This is an issue - story or epic related to a feature on a Mojaloop core service or related to it story
Projects
None yet
Development

No branches or pull requests

5 participants