Adopt the new user_id scheme for the Spanner backend #170

pjenvey · 2019-07-12T22:09:44Z

Mysql uses the "legacy" user_id (an autoincremented id from the token server). Spanner requires a non serial user_id primary key: (discussion).

Our HawkPayload hasn't stubbed these values out yet (fxa_uid/fxa_kid and we'll likely utilize hash_fxa_uid for metrics). They need to be added and the tests should be adjusted to include such values (if they don't already).

Finally Spanner should switch from legacy_id over to the new values.

pjenvey · 2019-08-14T22:30:31Z

Additionally the db_tests should somehow setup non-legacy HawkIdentifiers when testing the spanner backend

jrconlin · 2019-08-14T22:37:25Z

Is this value ever exposed? (Wondering if we could use an augmented time value or something since we may not have a central location to draw from.)

pjenvey · 2019-08-14T22:57:27Z

Assuming you mean the legacy_id: various client implementations assume an integer user_id. We need to continue providing one for them and they'd continue using it for e.g. constructing the {uid} parameter in our sync storage urls.

But underneath the Spanner impl. would use the new uuid-style id from the Hawk header for its user_id.

(We currently have a centralized location to draw the legacy ids from -- the tokenserver -- but we want the tokenserver to eventually go away)

rfk · 2019-09-20T02:07:30Z

But underneath the Spanner impl. would use the new uuid-style id from the Hawk header for its user_id.

So, there's one thing to be careful of which I'm not sure is clear from the discussion in mozilla-services/tokenserver#124, so I'm going to re-iterate it here just in case.

The tokenserver currently changes the numeric uid for a user whenever their FxA encryption key changes. Because the current storage nodes index their storage by this numeric userid, the effect is that the user's storage gets "wiped" automagically when their encryption key changes - they start writing into a different bucket on the storage node that doesn't have any of the old data in it encrypted under the old key.

There is also a background job on the tokenserver which periodically asks the storage nodes to delete data stored under numberic userids that have become obsolete.

If we switch to indexing data storage by the (stable and immutable) fxa_uid instead, we'll need to make sure that we have some other way to:

Separate data encrypted with different encryption keys into distinct storage buckets
Purge old data encrypted with old encryption keys as a background cleanup job

I don't recall if we settled on a precise scheme for how to achieve those two things in a durable-storage world. The tokenserver sends an fxa_kid field that uniquely identifies the user's current encryption key and should be helpful here.

pjenvey · 2019-09-20T18:07:12Z

We've simply combined the 2 values to act as storage's userid:

https://github.com/mozilla-services/server-syncstorage/blob/fee296f/syncstorage/storage/spanner.py#L82

We should consider including fxa_kid as a separate column (but part of the primary key) instead, the Python version's concatenating of the values into userid was mostly for the sake of expediency

rfk · 2019-09-21T01:20:54Z

Great, thanks you!

We should consider including fxa_kid as a separate column

👍; since the kid contains a leading timestamp, you should be able to sort on this column in order to detect and delete data encrypted with older keys. Issue mozilla-services/tokenserver#124 is in service of ensuring that that timestamp doesn't change if it doesn't have to, I'll cycle back to looking at the outstanding work there this week.

rfk · 2019-09-24T05:47:44Z

FWIW I left a couple of notes in mozilla-services/tokenserver#124 about how the kid field currently works and how I think it could be improved, to try to make the context there clearer.

* May want to use alternate "sync_kid" db. DSN: spanner://projects/sync-spanner-dev-225401/instances/spanner-test/databases/sync_kid * switched over index to use "fxa_id" (kid), and pass HawkParameter when possible Issue #170

pjenvey added bug Something isn't working p1 Stuff we gotta do before we ship! labels Jul 12, 2019

pjenvey mentioned this issue Aug 14, 2019

Fix the unittests with spanner pool. #193

Merged

pjenvey self-assigned this Sep 26, 2019

tublitzed assigned jrconlin and unassigned pjenvey Sep 26, 2019

tublitzed added 2 Estimate - s - This is a small change with clearly defined parameters. 5 Estimate - l - Moderately complex, will require some effort but clearly defined. and removed 2 Estimate - s - This is a small change with clearly defined parameters. labels Sep 27, 2019

jrconlin mentioned this issue Oct 1, 2019

Fix/170 #237

Merged

jrconlin closed this as completed in adb463f Oct 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adopt the new user_id scheme for the Spanner backend #170

Adopt the new user_id scheme for the Spanner backend #170

pjenvey commented Jul 12, 2019

pjenvey commented Aug 14, 2019

jrconlin commented Aug 14, 2019

pjenvey commented Aug 14, 2019 •

edited

Loading

rfk commented Sep 20, 2019

pjenvey commented Sep 20, 2019

rfk commented Sep 21, 2019

rfk commented Sep 24, 2019

Adopt the new user_id scheme for the Spanner backend #170

Adopt the new user_id scheme for the Spanner backend #170

Comments

pjenvey commented Jul 12, 2019

pjenvey commented Aug 14, 2019

jrconlin commented Aug 14, 2019

pjenvey commented Aug 14, 2019 • edited Loading

rfk commented Sep 20, 2019

pjenvey commented Sep 20, 2019

rfk commented Sep 21, 2019

rfk commented Sep 24, 2019

pjenvey commented Aug 14, 2019 •

edited

Loading