feat: Add percentage routing and migration status to user table #163
This adds a few entries into the .ini:

```
[tokenserver]
spanner_entry = # Name of the spanner node, e.g. https://spanner.example.com
spanner_node_id = # default spanner node id
migrate_new_user_percentage = # percentage of users to send to spanner
```

*Note:* the "percentage" is a terrible hack that just sends the first _n_ of every 100 users to spanner.

Closes #159
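A minimal sketch of how these settings could be read and how the "first _n_ of every 100 users" rule behaves. It assumes the rule is `uid % 100 < n` on the autoincrement `uid` (the review below says the decision is based on that column); the helper names `load_spanner_settings` and `should_route_to_spanner` and the sample values are hypothetical, not tokenserver's actual code. The sample has no inline `# ...` comments because `configparser` does not strip them by default.

```python
import configparser

# Hypothetical .ini contents using the three keys added by this PR;
# the values are illustrative, not tokenserver defaults.
SAMPLE_INI = """
[tokenserver]
spanner_entry = https://spanner.example.com
spanner_node_id = 800
migrate_new_user_percentage = 10
"""


def load_spanner_settings(text):
    """Parse the spanner-related settings from the [tokenserver] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["tokenserver"]
    return {
        "spanner_entry": section.get("spanner_entry"),
        "spanner_node_id": section.getint("spanner_node_id"),
        # Missing key means "send nobody to spanner".
        "migrate_new_user_percentage": section.getint(
            "migrate_new_user_percentage", fallback=0
        ),
    }


def should_route_to_spanner(uid, percentage):
    """Assumed rule: the first `percentage` uids of every block of 100."""
    return uid % 100 < percentage


settings = load_spanner_settings(SAMPLE_INI)
pct = settings["migrate_new_user_percentage"]
routed = [uid for uid in range(1, 201) if should_route_to_spanner(uid, pct)]
print(len(routed))  # 20 (uids 1-9, 100-109 and 200)
```

Because the rule depends only on `uid`, it is predictable and easy to test, but any user whose `uid` changes may change bucket.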
Thanks for picking this up @jrconlin!
My r- here is specifically due to the comment below about modifying historical assignments records, which I don't think will be acceptable.
More broadly, I'm not sure I understand what you want to achieve here with the new "migration_state" column and the way you're managing the data model (which may be because the current data-model is not well explained). It would be great to get a short prose overview of the mechanics of the approach here.
It looks like you're basing the decision of whether to move the user to spanner on the autoincrement `uid` column. This column is not stable for a user over time: if they reset their FxA password, then the mechanics in `update_user` will replace their current node-assignment record with a new one with a new `uid`. In the process, I believe this patch would try to keep them on the spanner node, but would reset their effective `migration_state` column to the empty string.
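A small illustration of the instability, assuming the routing rule is `uid % 100 < percentage`: a password reset writes a new row with a new autoincrement `uid`, which can fall into a different bucket. Hashing a value that survives resets gives the same bucket every time; using the account email as that value is an assumption made only for this sketch, and the `uid` values are made up.

```python
import hashlib


def bucket_from_uid(uid):
    """Bucket derived from the autoincrement uid; changes when a new row is written."""
    return uid % 100


def bucket_from_stable_key(key):
    """Bucket derived from a per-user value assumed stable across resets (e.g. email)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def routes_to_spanner(bucket, percentage):
    return bucket < percentage


# Simulate a password reset: a new record replaces the old one with a fresh uid.
email = "user@example.com"
uid_before, uid_after = 5, 1_000_017  # hypothetical autoincrement values

print(routes_to_spanner(bucket_from_uid(uid_before), 10))  # True
print(routes_to_spanner(bucket_from_uid(uid_after), 10))   # False: the decision flipped
# Hashing the stable key yields the same bucket before and after the reset.
print(bucket_from_stable_key(email) == bucket_from_stable_key(email))  # True
```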
In my head, a simpler version of this could look like:

- In `allocate_user`:
  - Use `random.randint(0, 100)` to decide whether to assign to spanner or not; I don't think we need this to be based on anything other than pure probability, do we?
  - Make the decision of whether to assign to spanner at the point where it currently calls `get_best_node`, so that the logic more clearly reads like "assign this user to spanner if they are lucky, otherwise to the best available node".
  - Pass the selected node into `_CREATE_USER_RECORD` and let it do its thing, rather than using a separate `_MIGRATE_USER` call.
- In `update_user`:
  - Don't do anything special; this method already has logic for keeping a user assigned to the same node (but with a different `uid`) across password resets, so the choice to send them to spanner will be sticky by default.

But it's entirely possible that this will misbehave in some cases that I have not considered.
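A self-contained sketch of the suggested `allocate_user` flow. Here `get_best_node` and `create_user_record` are simplified, hypothetical stand-ins for the real best-node query and `_CREATE_USER_RECORD`, and the node URLs and loads are invented. It draws with `random.randrange(100)` rather than `random.randint(0, 100)`: `randint` includes both endpoints, so `randint(0, 100)` has 101 outcomes and comparing it against the percentage would route slightly fewer users than intended.

```python
import random

SPANNER_NODE = "https://spanner.example.com"  # hypothetical spanner node entry


def get_best_node(nodes):
    """Stand-in for the real best-node query: pick the least-loaded MySQL node."""
    return min(nodes, key=lambda n: n["current_load"])["node"]


def create_user_record(email, node):
    """Stand-in for _CREATE_USER_RECORD: a single insert with the chosen node."""
    return {"email": email, "node": node}


def allocate_user(email, nodes, percentage, rng=random):
    """Assign a brand-new user to spanner if they are lucky, else the best node.

    randrange(100) yields 0..99, so exactly `percentage` of the 100 equally
    likely outcomes select spanner.
    """
    if rng.randrange(100) < percentage:
        node = SPANNER_NODE
    else:
        node = get_best_node(nodes)
    return create_user_record(email, node)


nodes = [
    {"node": "https://db1.example.com", "current_load": 40},
    {"node": "https://db2.example.com", "current_load": 10},
]
rng = random.Random(1234)  # seeded so the run is repeatable
users = [allocate_user(f"u{i}@example.com", nodes, 10, rng) for i in range(10_000)]
share = sum(u["node"] == SPANNER_NODE for u in users) / len(users)
print(f"{share:.1%} of new users went to spanner")  # roughly 10%
```

Since the chosen node is written into the user's record once, stickiness across password resets would come from `update_user` reusing that node, as the comment above describes; the sketch does not model that part.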
tokenserver/assignment/sqlnode/migrations/versions/8440ac37978a_migration.py
Thanks Ryan. Part of this is probably due to my not quite understanding all the interaction points, so I absolutely expect some confusion.

The "migration_state" field serves several roles in my mind. First, it indicates a record that has been sent over to the Spanner system. Currently this would only be new users, and the code only does this when a user is first assigned, but as older users are eventually migrated over, there would need to be some state to indicate that a user is no longer stored in MySQL (I am guessing this is the historical state you're concerned about). This is partly because the migration work I was doing shows that transferring users over may result in less projected interruption time than locking a node, transferring the users, then routing them to the new service.

Since the user identification information would remain the same in both MySQL and Spanner, and this data would be reflected in the data set here, it would be straightforward to honor any data deletion requests.

In any case, that transition state needs to live somewhere, which means either a table or an additional column. Since the column doesn't require an index, does not impact existing data, and is tied to the same index that is used for the

I chose to select users to send based on their ID partly to ease testing and to give some level of predictability about which users would be routed to the spanner system. I also agree that it's probably needlessly complicated since the
Sure, but the
Another option could be to use some deterministic function of the
Ah, interesting - so is it still a possibility that we'll try to move user data over to spanner on the backend rather than requiring existing users to re-upload their data on the client? I see the attraction of trying to keep
To honor data deletion requests, we'd have to remember which MySQL node they were assigned to prior to the migration, but the current version of

I'm not opposed to adding a

That said, I reckon we could safely defer adding it until we need to use it on existing users.
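A sketch of the `migration_state` column idea from this thread, using SQLite as a stand-in for MySQL. The cut-down `users` table and the `"MIGRATED"` value are hypothetical; it only shows that adding a nullable, unindexed column leaves existing rows reading back `NULL`. MySQL's `ALTER TABLE` may differ in cost and locking, which this does not model.

```python
import sqlite3

# SQLite stands in for MySQL; the table is a simplified, hypothetical version
# of the tokenserver users table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (uid INTEGER PRIMARY KEY AUTOINCREMENT,"
    " email TEXT NOT NULL, nodeid INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (email, nodeid) VALUES (?, ?)",
    [("a@example.com", 1), ("b@example.com", 2)],
)

# Add a nullable column with no index; existing rows simply get NULL.
conn.execute("ALTER TABLE users ADD COLUMN migration_state TEXT")

# Hypothetical state value marking a user whose data now lives in spanner.
conn.execute(
    "UPDATE users SET migration_state = ? WHERE email = ?",
    ("MIGRATED", "b@example.com"),
)

rows = conn.execute(
    "SELECT email, nodeid, migration_state FROM users ORDER BY uid"
).fetchall()
print(rows)
# [('a@example.com', 1, None), ('b@example.com', 2, 'MIGRATED')]
```

Keeping the old `nodeid` alongside the state is one way to remember which MySQL node a migrated user came from, which the deletion-request point above depends on.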
This is shaping up nicely, thanks for splitting out the `migration_state` column for follow-up work.
My "request changes" here is because of my questions around the handling of the node id for the spanner backend, and the way it currently lives in config in a way that's entirely separate from the MySQL node ids. More details in comments below, but I wonder what you think about putting the spanner node into the db as a peer of the MySQL nodes, to reduce the possibility of configuration error.
Is it important that we prevent existing users from being organically node-assigned to the spanner node?

@rfk - We've been planning the rollout in two stages: first new users only (less risky, no data to lose), and then migrating existing users. So, ideally yes; we'd like to be able to route only new users first. I don't think it would be catastrophic if a small number of existing users were routed over, but since that's the riskiest bit here, it would be good to be able to ensure we've migrated their data first, to reduce the risk of them connecting on some new device that doesn't have their sync data and losing access.
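A sketch of the reviewer's suggestion to keep the spanner node in the database as a peer of the MySQL nodes while keeping it out of organic assignment. It assumes a simplified `nodes` table in which `available = 0` excludes a node from best-node selection; those column names and that behavior are assumptions, not the confirmed tokenserver schema. Looking the spanner node up by its `spanner_entry` name means a missing or misspelled entry fails loudly rather than relying on a separately configured id.

```python
import sqlite3

# Hypothetical, simplified nodes table (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id INTEGER PRIMARY KEY, node TEXT UNIQUE,"
    " available INTEGER, current_load INTEGER, capacity INTEGER)"
)
conn.executemany(
    "INSERT INTO nodes (id, node, available, current_load, capacity)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        (1, "https://db1.example.com", 10, 40, 100),
        (2, "https://db2.example.com", 10, 10, 100),
        # The spanner node is a peer row, but available = 0 keeps the organic
        # best-node query from ever choosing it for existing users.
        (3, "https://spanner.example.com", 0, 0, 0),
    ],
)


def get_best_node_id(conn):
    """Least-loaded node (by load/capacity) that still has slots available."""
    row = conn.execute(
        "SELECT id FROM nodes WHERE available > 0"
        " ORDER BY current_load * 1.0 / capacity LIMIT 1"
    ).fetchone()
    return row[0]


def get_spanner_node_id(conn, spanner_entry):
    """Resolve the spanner node by name instead of trusting a separate config id."""
    row = conn.execute(
        "SELECT id FROM nodes WHERE node = ?", (spanner_entry,)
    ).fetchone()
    if row is None:
        raise LookupError(f"spanner node {spanner_entry!r} is not in the nodes table")
    return row[0]


print(get_best_node_id(conn))  # 2
print(get_spanner_node_id(conn, "https://spanner.example.com"))  # 3
```

Only the explicit new-user path would call `get_spanner_node_id`, so the first rollout stage stays limited to new users.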
Looks good! I left some minor nits, take or leave them as you will.