feat: Add percentage routing and migration status to user table #163

jrconlin · 2019-12-19T00:33:09Z

This adds a few entries into the .ini:

[tokenserver]
spanner_entry  = # Name of the spanner node name e.g.  https://spanner.example.com
spanner_node_id = # default spanner node id
migrate_new_user_percentage = # percentage of users to send to spanner

note the "percentage" is a terrible hack that just sends the first n
of every 100 users to spanner.

Closes #159

This adds a few entries into the .ini: ``` [tokenserver] spanner_entry = # Name of the spanner node name e.g. https://spanner.example.com spanner_node_id = # default spanner node id migrate_new_user_percentage = # percentage of users to send to spanner ``` *note* the "percentage" is a terrible hack that just sends the first _n_ of every 100 users to spanner. Closes #159

rfk

Thanks for picking this up @jrconlin!

My r- here is specifically due to the comment below about modifying historical assignments records, which I don't think will be acceptable.

More broadly, I'm not sure I understand what you want to achieve here with the new "migration_state" column and the way you're managing the data model (which may be because the current data-model is not well explained). It would be great to get a short prose overview of the mechanics of the approach here.

It looks like you're basing the decision of whether to move the user to spanner on the autoincrement uid column. This column is not stable for a user over time - if they reset their FxA password, then the mechanics in update_user will replace their current node-assignment record with a new one with a new uid. In the process, I believe this patch would try to keep them on the spanner node, but would reset their effective migration_state column to the empty string.

In my head, a simpler version of this could look like:

In allocate_user:
- Use random.randint(0, 100) to decide whether to assign to spanner of not; I don't think we need this to be based on anything other than pure probability do we?
- Make the decision of whether to assign to spanner at the point where it currently calls get_best_node, so that the logic more clearly reads like "assign this user to spanner if they are lucky, otherwise to the best available node".
- Pass the selected node into _CREATE_USER_RECORD and let it do its thing, rather than using a separate _MIGRATE_USER call.
In update_user:
- Don't do anything special; this method already has logic for keeping a user assigned to the same node (but with a different uid) across password resets, so the choice to send them to spanner will be sticky by default.

But it's entirely possible that this will mis-behave in some cases that I have not considered.

tokenserver/assignment/sqlnode/sql.py

tokenserver/assignment/sqlnode/migrations/versions/8440ac37978a_migration.py

tokenserver/assignment/sqlnode/sql.py

tokenserver/assignment/__init__.py

tokenserver/tests/test_memorynode.ini

tokenserver/tests/test_memorynode.py

jrconlin · 2019-12-19T04:28:51Z

Thanks Ryan,

Part of this is probably due to my not quite understanding all the interaction points, so I absolutely expect some confusion.

The "migration_state" field serves several roles in my mind. One, it indicates a record that has been sent over to the Spanner system. Currently, this would only be new users, and the code only calls this in the case where a user is first assigned, but as older users are eventually migrated over, there would need to be some state to indicate that this user is no longer stored in MySQL (I am guessing this is the historical state you're concerned about).

This is partly due to the fact that the migration work I was doing shows that transferring over users may result in less projected interruption time than locking a node, transferring the users, then routing them to the new service. Since the user identification information would remain the same in both mysql and spanner, and this data would be reflected in the data set here, it would be straightforward to honor any data deletion requests.

In any case, that transition state needs to live somewhere, which means either a table or an additional column. Since the column doesn't require an index, does not impact existing data, and is tied to the same index that is used for the users table, adding the column made the most sense to me at the time. Very happy to move that state info somewhere else if it makes more sense.

I chose to select users to send based on their ID partly to ease testing and to give some level of predictability about which users would be routed to the spanner system. I also agree that it's probably needlessly complicated since the uid for mysql is the row sequence number, which is not available until after the insertion, thus resulting in the immediate MIGRATE_USER call. That same call could be used later as part of the future user migration script. Still, it's probably causing more problems than it's fixing and I'll simply opt for a random generator.

rfk · 2019-12-19T04:43:38Z

there would need to be some state to indicate that this user is no longer stored in MySQL

Sure, but the nodeid field indicates this already, but virtue of pointing to the spanner node :-)

I chose to select users to send based on their ID partly to ease testing and to give some level of
predictability about which users would be routed to the spanner system

Another option could be to use some deterministic function of the email parameter, e.g. int(md5(email).hexdigest()[:8], 16) or similar.

This is partly due to the fact that the migration work I was doing shows that transferring over users
may result in less projected interruption time than locking a node, transferring the users, then
routing them to the new service.
Since the user identification information would remain the same in both mysql and spanner,

Ah, interesting - so is it still a possibility that we'll try to move user data over to spanner on the backend rather than requiring existing users to re-upload their data on the client?

I see the attraction of trying to keep uid stable across such a migration, but I think it's likely to cause unexpected side-effects elsewhere. I'd recommend creating a new record in the users table at the time of migration.

and this data would be reflected in the data set here, it would be straightforward to honor
any data deletion requests.

To honor data deletion requests, we'd have to remember which MySQL node they were assigned to prior to the migration, but the current version of _MIGRATE_USER overwrites that with the nodeid of the spanner node.

I'm not opposed to adding a migration_state column, just be be clear. I think it would work OK to create a new record for the user at the time of migration (marking their previous record as replaced) and set migration_state only on their current record. Then update_user could propagate that new column forward across password resets etc.

That said, I reckon we could safely defer adding it until we need to use it on existing users.

* removed `migration_state` column * changed picker to work off of sha1 of the email (for repro) * normalized backend settings as much as possible. * added "universal" test

rfk

This is shaping up nicely, thanks for splitting out the migration_state column for followup work.

My "request changes" here is because of my questions around the handling of the node id for the spanner backend, and the way it currently lives in config in a way that's entirely separate from the MySQL node ids. More details in comments below, but I wonder what you think about putting the spanner node into the db as a peer of the MySQL nodes, to reduce the possibility of configuration error.

tokenserver/assignment/memorynode.py

tokenserver/assignment/__init__.py

tokenserver/assignment/sqlnode/sql.py

tokenserver/tests/test_service.py

tokenserver/assignment/sqlnode/sql.py

tokenserver/tests/test_service.py

tokenserver/assignment/sqlnode/sql.py

tokenserver/assignment/__init__.py

tokenserver/assignment/sqlnode/sql.py

tokenserver/tests/test_service.py

rfk · 2019-12-30T02:05:51Z

Is it important that we prevent existing users from being organically node-assigned to the spanner node?

tublitzed · 2019-12-30T18:15:47Z

Is it important that we prevent existing users from being organically node-assigned to the spanner node?

@rfk - We've been planning the rollout in two stages; first new users only (less risky, no data to lose), and then migrating existing users. So, ideally yes; we'd like to be able to route only new users first. I don't think it would be catastrophic if a small number of existing users were routed over, but since that's the riskiest bit here it would be good to be able to ensure we've migrated their data first to reduce the risk of them connecting on some new device that doesn't have their sync data and losing access.

rfk

Thanks for pushing through with this @jrconlin! I'd be comfortable shipping this as-is, but please also consider the potential config simplifications from #164

tokenserver/assignment/memorynode.py

tokenserver/assignment/sqlnode/sql.py

tokenserver/tests/test_service.py

* add #164 changes * move spanner migration to SQL test suite * only check migration for the migration check to prevent future issues

jrconlin

dismissing @rfk 's review since I merged in #164 and want to make sure that I didn't screw more things up.

rfk

Looks good! I left some minor nits, take or leave them as you will.

tokenserver/assignment/memorynode.py

tokenserver/assignment/sqlnode/sql.py

tokenserver/tests/assignment/test_sqlnode.py

tokenserver/tests/test_service.py

tokenserver/tests/test_sql.ini

tokenserver/tests/test_memorynode.ini

tokenserver/assignment/sqlnode/sql.py

jrconlin requested a review from pjenvey December 19, 2019 00:33

f flake8

9a0147d

jrconlin requested a review from rfk December 19, 2019 01:00

rfk suggested changes Dec 19, 2019

View reviewed changes

f r's

789bcd9

* removed `migration_state` column * changed picker to work off of sha1 of the email (for repro) * normalized backend settings as much as possible. * added "universal" test

jrconlin requested a review from rfk December 19, 2019 22:14

rfk suggested changes Dec 20, 2019

View reviewed changes

tublitzed mentioned this pull request Dec 20, 2019

Add migration state flag to data store mozilla-services/server-syncstorage#125

Closed

f normalize settings

93c795f

pjenvey requested a review from fzzzy December 23, 2019 18:48

rfk reviewed Dec 30, 2019

View reviewed changes

rfk mentioned this pull request Dec 30, 2019

Some suggestions for simplifying config handling in #163 #164

Closed

f r's

6cc4547

jrconlin force-pushed the feat/159 branch from f61aab3 to 6cc4547 Compare December 30, 2019 16:59

jrconlin requested a review from rfk December 30, 2019 18:06

rfk approved these changes Dec 30, 2019

View reviewed changes

tokenserver/assignment/memorynode.py Outdated Show resolved Hide resolved

tokenserver/assignment/sqlnode/sql.py Outdated Show resolved Hide resolved

tokenserver/tests/test_service.py Outdated Show resolved Hide resolved

f r's

ab3fa7e

* add #164 changes * move spanner migration to SQL test suite * only check migration for the migration check to prevent future issues

jrconlin requested a review from rfk December 31, 2019 00:16

jrconlin commented Dec 31, 2019

View reviewed changes

rfk approved these changes Dec 31, 2019

View reviewed changes

f r's

828ff95

jrconlin merged commit 07ef6b6 into master Dec 31, 2019

jrconlin deleted the feat/159 branch January 15, 2020 17:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add percentage routing and migration status to user table #163

feat: Add percentage routing and migration status to user table #163

jrconlin commented Dec 19, 2019

rfk left a comment

jrconlin commented Dec 19, 2019

rfk commented Dec 19, 2019

rfk left a comment

rfk commented Dec 30, 2019

tublitzed commented Dec 30, 2019

rfk left a comment

jrconlin left a comment

rfk left a comment

feat: Add percentage routing and migration status to user table #163

feat: Add percentage routing and migration status to user table #163

Conversation

jrconlin commented Dec 19, 2019

rfk left a comment

Choose a reason for hiding this comment

jrconlin commented Dec 19, 2019

rfk commented Dec 19, 2019

rfk left a comment

Choose a reason for hiding this comment

rfk commented Dec 30, 2019

tublitzed commented Dec 30, 2019

rfk left a comment

Choose a reason for hiding this comment

jrconlin left a comment

Choose a reason for hiding this comment

rfk left a comment

Choose a reason for hiding this comment