Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scim users get distorted between construct, post, get. #1754

Merged
merged 20 commits into from
Sep 27, 2021
Merged

Conversation

fisx
Copy link
Contributor

@fisx fisx commented Sep 10, 2021

Case handling in scim is a bit of a mess.

One of our the bigger issues was this: we parse RichInfo data in scim from a json schema that contains the key/value pairs both as a json object, and for when an ordering of keys is required, an assoc assoc array. The parser constructs the union of the unordered and the ordered map. The problem is that jsonLower (a relatively new helper function that attempts to deal with the criminally insane idea that scim json needs to be case insensitive in its object attributes. jsonLower honours this requirement, but of course does not go through all the string values in richFieldType in the assoc list that correspond to the object attributes in the map. Fixed eg. here (calling the smart constructor that eliminates duplicates rather than the ADT constructor).

To make things more fun, there is this issue with lower-casing in haskell, which we fix here by running all lower-casers in sequence, and overall by using case-insensitive consistently over the various lower-casing functions in text or base.

This PR also introduces a function normalizeLikeStored that normalizes Scim users. It is used a lot in tests, but also to make sure that the spar responses are a bit more "normal".

I also add some new tests, and increase the entropy when creating scim data in /services/spar/test-integration/Util/Scim.hs (because I ran into collisions there).

Sorry, this isn't very coherent. Maybe the changes will make more sense than this attempt at summarizing them? You can skim through the commit history, but I don't recommend reading them in order for a review.

Checklist

  • The PR Title explains the impact of the change.
  • The PR description provides context as to why the change should occur and what the code contributes to that effect. This could also be a link to a JIRA ticket or a Github issue, if there is one.
  • changelog.d contains the following bits of information:
    • A file with the changelog entry in one or more suitable sub-sections. The sub-sections are marked by directories inside changelog.d.

@fisx fisx changed the title Scim users get distorted between cosntruct, post, get. Scim users get distorted between construct, post, get. Sep 10, 2021
@fisx fisx force-pushed the various-fixes-3 branch 2 times, most recently from d862722 to 7eb487a Compare September 20, 2021 14:53
@fisx fisx changed the title Scim users get distorted between construct, post, get. Scim users get distorted between construct, post, get. [skip ci] Sep 21, 2021
@fisx fisx force-pushed the various-fixes-3 branch 2 times, most recently from ef5e267 to 9eb2355 Compare September 24, 2021 07:20
@fisx fisx marked this pull request as ready for review September 24, 2021 19:25
@fisx fisx changed the title Scim users get distorted between construct, post, get. [skip ci] Scim users get distorted between construct, post, get. Sep 24, 2021
@fisx fisx requested a review from pcapriotti September 24, 2021 19:40
I don't remember why I did this, but I think the reason has
evaporated.  Now it seems quite silly.
@fisx
Copy link
Contributor Author

fisx commented Sep 24, 2021

failure on concourse:

when service can be discovered and the URLs change:             FAIL
>         test/unit/Test/Brig/Calling.hs:169:
>         servers should get overwritten
>         expected: Discovered (SFTServers (SrvEntry {srvPriority = 0, srvWeight = 0, srvTarget = SrvTarget {srvTargetDomain = "sft4.foo.example.com.", srvTargetPort = 443}} :| [SrvEntry {srvPriority = 0, srvWeight = 0, srvTarget = SrvTarget {srvTargetDomain = "sft5.foo.example.com.", srvTargetPort = 443}}]))
>          but got: Discovered (SFTServers (SrvEntry {srvPriority = 0, srvWeight = 0, srvTarget = SrvTarget {srvTargetDomain = "sft1.foo.example.com.", srvTargetPort = 443}} :| [SrvEntry {srvPriority = 0, srvWeight = 0, srvTarget = SrvTarget {srvTargetDomain = "sft2.foo.example.com.", srvTargetPort = 443}},SrvEntry {srvPriority = 0, srvWeight = 0, srvTarget = SrvTarget {srvTargetDomain = "sft3.foo.example.com.", srvTargetPort = 443}}]))

failure when running tests locally:

  test-integration/Test/Spar/APISpec.hs:769:5:
  1) WireIdPAPIV2.Spar.API, DELETE /identity-providers/:idp, zuser has no team, responds with 'insufficient permissions'
       uncaught exception: ErrorCall
       Assertions failed:
        1: 202 =/= 403

       Response was:

       Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("Transfer-Encoding","chunked"),("Date","Fri, 24 Sep 2021 20:44:39 GMT"),("Server","Warp/3.3.13"),("Content-Encoding","gzip"),("Content-Type","application/json")], responseBody = Just "{\"code\":403,\"message\":\"The given phone number has been blacklisted due to suspected abuse or a complaint.\",\"label\":\"blacklisted-phone\"}", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}
       CallStack (from HasCallStack):
         error, called at src/Bilge/Assert.hs:89:5 in bilge-0.22.0-2K7mJKDUfGc17BkrPwuOOW:Bilge.Assert
         <!!, called at src/Bilge/Assert.hs:107:19 in bilge-0.22.0-2K7mJKDUfGc17BkrPwuOOW:Bilge.Assert
         !!!, called at test-integration/Util/Core.hs:580:3 in main:Util.Core
         createRandomPhoneUser, called at test-integration/Test/Spar/APISpec.hs:684:26 in main:Test.Spar.APISpec
         testGetPutDelete, called at test-integration/Test/Spar/APISpec.hs:769:5 in main:Test.Spar.APISpec

I predict both are flakes and won't reproduce for quite a while, but I'm not entirely confident about the latter.

@fisx
Copy link
Contributor Author

fisx commented Sep 25, 2021

I saw it one more time, then couldn't reproduce it in 20 or so other local runs, or on concourse. Can't think of any connection between this PR and that test. Hm...

normalizeRichInfoAssocListInt = nubOrdOn nubber . filter ((/= mempty) . richFieldValue)
where
-- see also: https://github.com/basvandijk/case-insensitive/issues/31
nubber = Text.toLower . Text.toCaseFold . CI.foldedCase . richFieldType
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might get away with only this:

Suggested change
nubber = Text.toLower . Text.toCaseFold . CI.foldedCase . richFieldType
nubber = richFieldType

haven't tried.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Text.toCaseFold should be exactly the same as CI.foldedCase, since CI wraps a Text here, if I'm not mistaken. And we shouldn't call both toLower and toCaseFold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this will trigger that thing that I think is a bug in case-insensitive again: basvandijk/case-insensitive#31 (comment)

Copy link
Contributor

@pcapriotti pcapriotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonably clean. I'm not super confident, but it looks like we are now normalising everything at every stage, so it's more likely to be correct (especially if we decide to just ignore locale-specific issues and uncommon unicode characters like Cherokee letters), although I suspect some normalisation steps are unnecessary. I left a few comments below.

@@ -41,6 +41,23 @@ data Schema
| CustomSchema Text
deriving (Show, Eq)

fakeEnumSchema :: [Schema]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe adding a comment that this is just for testing is a good idea.

-- (FUTUREWORK: The "recursively" part is a bit of a waste and could be dropped, but we would
-- have to spend more effort in making sure it is always called manually in nested parsers.)
jsonLower :: Value -> Value
jsonLower (Object o) = Object . HM.fromList . fmap lowerPair . HM.toList $ o
where
lowerPair (key, val) = (toLower key, jsonLower val)
lowerPair (key, val) = (CI.foldCase key, jsonLower val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably not ideal to use case-folded strings as JSON keys (Unicode recommends to use case-folding only for comparison). Why is this JSON normalisation still needed? I would guess that once all the case comparisons on the haskell side are done correctly, JSON could be generated just using the "original" strings. Am I thinking about this wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original idea of jsonLower was to run it in json parsers initially, so that the actual parser could rely on everything being lower-case. This was an easy way of working around the fact that json and therefore aeson is strictly case-sensitive.

So morally, this is just lower-casing Values that are about to be deconstructed. But I will double-check and add this to the haddocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double-checked, and I was right:

git grep -Hn jsonLower
libs/hscim/src/Web/Scim/Capabilities/MetaSchema.hs:84:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Capabilities/MetaSchema.hs:95:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Capabilities/MetaSchema.hs:114:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Class/Group.hs:63:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Class/Group.hs:76:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/AuthenticationScheme.hs:70:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/Common.hs:97:jsonLower :: Value -> Value
libs/hscim/src/Web/Scim/Schema/Common.hs:98:jsonLower (Object o) = Object . HM.fromList . fmap lowerPair . HM.toList $ o
libs/hscim/src/Web/Scim/Schema/Common.hs:100:    lowerPair (key, val) = (CI.foldCase key, jsonLower val)
libs/hscim/src/Web/Scim/Schema/Common.hs:101:jsonLower (Array x) = Array (jsonLower <$> x)
libs/hscim/src/Web/Scim/Schema/Common.hs:102:jsonLower same@(String _) = same -- (only object attributes, not all texts in the value side of objects!)
libs/hscim/src/Web/Scim/Schema/Common.hs:103:jsonLower same@(Number _) = same
libs/hscim/src/Web/Scim/Schema/Common.hs:104:jsonLower same@(Bool _) = same
libs/hscim/src/Web/Scim/Schema/Common.hs:105:jsonLower same@Null = same
libs/hscim/src/Web/Scim/Schema/ListResponse.hs:62:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/Meta.hs:70:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/ResourceType.hs:59:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Address.hs:39:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Certificate.hs:32:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Email.hs:47:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/IM.hs:32:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Name.hs:39:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Phone.hs:32:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Photo.hs:32:  parseJSON = genericParseJSON parseOptions . jsonLower
libs/wire-api/test/unit/Test/Wire/API/User/RichInfo.hs:171:    jsonroundtrip = unsafeParse . Scim.jsonLower . Aeson.toJSON

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... aaand the haddocks already pretty much say this much. @pcapriotti if you can think of anything to add, please let me know (i can also do it in a separate PR).

normalizeRichInfoAssocListInt = nubOrdOn nubber . filter ((/= mempty) . richFieldValue)
where
-- see also: https://github.com/basvandijk/case-insensitive/issues/31
nubber = Text.toLower . Text.toCaseFold . CI.foldedCase . richFieldType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Text.toCaseFold should be exactly the same as CI.foldedCase, since CI wraps a Text here, if I'm not mistaken. And we shouldn't call both toLower and toCaseFold.

@fisx
Copy link
Contributor Author

fisx commented Sep 27, 2021

although I suspect some normalisation steps are unnecessary.

very possible! i wanted to mention that my priority was soundness, not efficiency, but forgot.

This was "needed" because hscim inconsistently used 'Text.toLower',
'CI.foldedCase' etc. throughout the code base, and since they are
behaving slightly differently, we had to make sure here to catch them
all.  Since we have normalized that, we can simplify.
@fisx
Copy link
Contributor Author

fisx commented Sep 27, 2021

hm...

    RichInfo Examples
      Empty rich info:                                                                              OK
      Old RichInfoMapAndList:                                                                       OK
      case insensitive 'richinfo':                                                                  OK
      RichInfoMapAndList as only assoc list:                                                        OK
      RichInfoMapAndList Map:                                                                       OK
      Without Old RichInfoMapAndList:                                                               OK
      wrong version:                                                                                OK
      drop empty fields:                                                                            OK
    'toRichInfoAssocList', 'fromRichInfoAssocList'
      works (counter-example of earlier bug):                                                       OK
      works (property):                                                                             FAIL (0.17s)
        *** Failed! Falsified (after 37 tests and 10 shrinks):
        RichInfoAssocList {unRichInfoAssocList = [RichField {richFieldType = "\43889", richFieldValue = "`"}]}
        RichInfoAssocList {unRichInfoAssocList = [RichField {richFieldType = "\43889", richFieldValue = "`"},RichField {richFieldType = "\5025", richFieldValue = "`"}]} /= RichInfoAssocList {unRichInfoAssocList = [RichField {richFieldType = "\43889", richFieldValue = "`"}]}
        Use --quickcheck-replay=719615 to reproduce.

looks like this doesn't work out of the box?

@fisx fisx merged commit 0eea220 into develop Sep 27, 2021
@fisx fisx deleted the various-fixes-3 branch September 27, 2021 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants