-
Notifications
You must be signed in to change notification settings - Fork 30
feat: add flag to stop table rotation #1179
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1179 +/- ##
=======================================
Coverage 100% 100%
=======================================
Files 60 60
Lines 10079 10637 +558
=======================================
+ Hits 10079 10637 +558
Continue to review full report at Codecov.
|
This has a conflict with master branch. |
configs/autopush_shared.ini.sample
Outdated
@@ -132,3 +132,6 @@ endpoint_port = 8082 | |||
; e.g {"firefox":{"cert":"certs/main.cert","key":"certs/main.key","topic":"com.mozilla.org.Firefox","max_retry":2},"beta":{"cert":"certs/beta.cert","key":"certs/beta.key","topic":"com.mozilla.org.FirefoxBeta"}} | |||
#apns_creds = | |||
|
|||
; With TTL implemented, message table rotation is no longer required. | |||
; This flag determines if table rotation should be allowed to continue: | |||
#allow_table_rotation = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main_argparse isn't handling this yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
autopush/db.py
Outdated
"""Fetches the name of the last message table""" | ||
response = filter( | ||
lambda name: name.lower().startswith(prefix), | ||
self._resource.meta.client.list_tables().get("TableNames")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to handle list_tables' pagination (maintenance.py in the top level dir has some code for this that might be helpful)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's my understanding that list_tables will return a maximum of 100 entries per call. We rotate tables each month. Do we feel that we will be doing table rotation for 8 years?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it lists all tables for "the current account and endpoint" which is all the tables. there's > 100 on dev and stage. I've actually ran into this on dev at one point and fixed all our cases of it, before we got unlucky on stage/prod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. awesome. Yeah, that makes sense.
autopush/db.py
Outdated
@@ -1078,6 +1117,21 @@ def create_initial_message_tables(self): | |||
an entry for tomorrow, if tomorrow is a new month. | |||
|
|||
""" | |||
if not self.allow_table_rotation: | |||
tablename = self.resource.get_latest_message_tablename( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't tablename here always going to be self._message_conf.tablename? why is the get_latest_message_tablename call needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, self._message_conf.tablename
is the config option to specify the root name for the message table. Here, it's used as a mask to determine the message table from the returned list of tables.
the following table creation methods will probably not be called outside of test routines, since the table will have already been created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my fault, I misunderstood that we'd be going w/ the final dated table after toggling allow_table_rotation as False
autopush/webpush_server.py
Outdated
@@ -466,7 +471,7 @@ def process(self, command): | |||
def _check_storage(self, command): | |||
timestamp = None | |||
messages = [] | |||
message = Message(command.message_month, | |||
message = Message(self.get_month(command), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
w/ allow_table_rotation = False all of these Message constructor calls will call ddb list_tables via get_latest_message_tablename. Can't we just do that once (say in DatabaseManager)?
HelloResponse gets the current message month and passes it back and forth between the rust/python side as command.message_month. Could this month setup be isolated to there?
And also those get_latest_message_tablename calls are made w/ the default table_base_string of "message_". I'm leery of all the defaults used: e.g. stage/prod table names are along the lines of "stage.autopush.message" which wouldn't match "message_".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call on caching the latest message tablename, That can probably go into the DynamoDBResource.
as for the prefix, we already specify "--message_tablename", this is stowed as self._message_conf.tablename
and is used as the tablename_base_string=
in DatabaseManager.setup_tables()
. I'm not sure of a better parameter to use to specify it, but I'm happy to switch to something better.
autopush/webpush_server.py
Outdated
@@ -519,7 +524,9 @@ class MigrateUserCommand(ProcessorCommand): | |||
def process(self, command): | |||
# type: (MigrateUser) -> MigrateUserResponse | |||
# Get the current channels for this month | |||
message = Message(command.message_month, | |||
if not self.conf.allow_table_rotation: | |||
return MigrateUserResponse(message_month=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a run of test_rs_integration w/ allow_table_rotation = False
message_month = None looks invalid for the rust side, message_month is always a String there (not Option)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rust side still calls to python for data operations, so it should never see a TableName of None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ported over test_webpush_monthly_rotation to test_rs_integration, which will trigger this error:
https://pastebin.mozilla.org/9083770
{"Severity":5,"Timestamp":1524526849884206435,"Logger":"AutopushRust-0.1.0","EnvVersion":"2.0","Type":"autopush_rs:log","Hostname":"ubuntu","Fields":{"message":"Error for shutdown of 127.0.0.1: invalid type: null, expected a string at line 1 column 74"}}
Of course after all that whats occurs to me: do we even want to be in this code in the first place when allow_table_rotation is False? If not, then we should be disabling that flags["rotate_message_table"] = True in the HelloCommand above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on whether table rotation is halted entirely, or if they move to a table with no month postfix. If they're moved one last time to a non-postfix table, then there'll be a period during deploy before its completed where any reconnects, their record will be invalidated (dropped per _verify_user_record in websocket.py) and they'll be reset to the latest message table that node is aware of.
I think we will need an intermediary version release/deploy cycle so that we can have a set of 'old' nodes for the final migration that are tolerant of message table names that aren't rotating based.
autopush/db.py
Outdated
@@ -995,19 +1029,22 @@ class DatabaseManager(object): | |||
|
|||
router = attrib(default=None) # type: Optional[Router] | |||
message_tables = attrib(default=Factory(list)) # type: List[str] | |||
current_msg_month = attrib(init=False) # type: Optional[str] | |||
current_msg_month = attrib(default=None, init=False) # type: Optional[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The rest of these were re-aligned above to match, but not here?
The presumption is that they're kept at the last table that we rotate to. Once the flag is flipped, no further rotation happens. When we decide we no longer need the flag, all the rotation stuff can be stripped out, and we can even config the message table name. The flag tries to maintain as much of the rotation code as possible so that the switch can be maintained or quickly reverted in case of unexpected issues. |
decb493
to
b6f13cf
Compare
autopush/db.py
Outdated
paginator = client.get_paginator("list_tables") | ||
tables = [] | ||
for table in paginator.paginate().search( | ||
"TableNames[?contains(@,'{}')==`true`]|sort(@)[-1]".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contains? Is this going to require it as the prefix or will it find the prefix anywhere in the table name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now our table prpefix's look like this "{dev/stage/prod}.autopush.message", so it should work ok for us as Contains, but it'd prolly be safer if it required the table to start with the prefix.
autopush/db.py
Outdated
max_ttl=MAX_EXPIRY): | ||
def __init__(self, tablename=None, metrics=None, boto_resource=None, | ||
max_ttl=MAX_EXPIRY, | ||
table_base_string="message_"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the table_base_string different from the prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's not a huge difference between the two, aside from prefix
is used mostly in table rotation, where table_base_string
is not.
752a058
to
7306fde
Compare
autopush/websocket.py
Outdated
if has_connected_this_month(record): | ||
del record["last_connect"] | ||
else: | ||
record["last_connect"] = generate_last_connect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we stop updating the last_connect just because we turn off table rotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be reverted along w/ the similar webpush_server.py change and its accompanying tweak to trigger the table migration (my bad)
autopush/db.py
Outdated
@@ -995,19 +1036,23 @@ class DatabaseManager(object): | |||
|
|||
router = attrib(default=None) # type: Optional[Router] | |||
message_tables = attrib(default=Factory(list)) # type: List[str] | |||
current_msg_month = attrib(init=False) # type: Optional[str] | |||
current_msg_month = attrib(default=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this becomes None w/ allow_table_rotation = False. Instead it should be set to the latest pegged tablename (from get_latest_message_tablename). This should really be where we cache that value.
This shouldn't be None because it propagates everywhere (through all the rust Command/Responses, which right now assumes it's a String vs Option. Also the "current_month" in the router table)
autopush/webpush_server.py
Outdated
@@ -408,15 +413,16 @@ def lookup_user(self, hello): | |||
flags["check_storage"] = True | |||
|
|||
# Determine if message table rotation is needed | |||
flags["message_month"] = record["current_month"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we still need to continue migrating to the new pegged table, so revert this. my bad!
autopush/webpush_server.py
Outdated
@@ -348,6 +348,11 @@ def metrics(self): | |||
def process(self, command): | |||
raise NotImplementedError() | |||
|
|||
def get_month(self, command): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll just be passing through the db.current_msg_month so I think this should go away completely? update_rotating_tables will never change it
autopush/db.py
Outdated
boto_resource=self.resource | ||
) | ||
self.message_tables.append(tablename) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we still need the previous 2 months in message_tables, otherwise we'll be dropping older users:
https://github.com/mozilla-services/autopush/blob/a8a7d5/autopush/websocket.py#L773
c26d4bb
to
c46aced
Compare
client = self._resource.meta.client | ||
paginator = client.get_paginator("list_tables") | ||
tables = [] | ||
for table in paginator.paginate().search( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
echoing ben's previous comment, let's run this through filter(lambda name: name.startswith(prefix), tables) afterwards just to be extra sure. even tempted to re.match the full pattern..
autopush/db.py
Outdated
@@ -479,7 +524,8 @@ def __getattr__(self, name): | |||
class Message(object): | |||
"""Create a Message table abstraction on top of a DynamoDB Table object""" | |||
def __init__(self, tablename, metrics=None, boto_resource=None, | |||
max_ttl=MAX_EXPIRY): | |||
max_ttl=MAX_EXPIRY, | |||
table_base_string="message_"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kill
autopush/db.py
Outdated
@@ -1054,7 +1112,8 @@ def setup_tables(self): | |||
self.create_initial_message_tables() | |||
self._message = Message(self.current_msg_month, | |||
self.metrics, | |||
boto_resource=self.resource) | |||
boto_resource=self.resource, | |||
table_base_string=self._message_conf.tablename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kill
autopush/db.py
Outdated
previous=3 | ||
) | ||
# Create the most recent table if it's not there. | ||
tablename = tablenames[-1:][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick tablenames[-1]
autopush/db.py
Outdated
def get_latest_message_tablename(self, prefix="message"): | ||
# type: (Optional[str]) -> str # noqa | ||
"""Fetches the name of the last message table""" | ||
if not self._latest_message_table: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only call these during initialization now, so let's not cache them here
autopush/main_argparse.py
Outdated
@@ -105,6 +105,10 @@ def add_shared_args(parser): | |||
help="AWS DynamoDB endpoint override", | |||
type=str, default=None, | |||
env_var="AWS_LOCAL_DYNAMODB") | |||
parser.add_argument('--allow_table_rotation', | |||
help="Allow monthly message table rotation", | |||
action="store_true", default=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default to False here?
Also this isn't being passed along to AutopushConfig. Annoyingly enough, this still needs a line in AutopushConfig::from_argparse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crap, yeah, forgot that.
29e840b
to
598f3b6
Compare
changed the name (and a bit of logic) to the argument. the option (it also makes the code a bit more readable rather than |
autopush/config.py
Outdated
@@ -297,6 +299,9 @@ def from_argparse(cls, ns, **kwargs): | |||
if not ns.no_aws: | |||
ami_id = get_amid() or "Unknown" | |||
|
|||
import pdb;pdb.set_trace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also helps if I merge and push changes instead of wandering off to get a sandwich. ._.
autopush/db.py
Outdated
@@ -458,6 +477,7 @@ def __init__(self, **kwargs): | |||
if "region_name" in conf: | |||
del(conf["region_name"]) | |||
self.conf = conf | |||
self._latest_message_table = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kill
autopush/tests/test_db.py
Outdated
fake_conf.message_table.tablename = "message_bogus" | ||
fake_conf.message_table.read_throughput = 5 | ||
fake_conf.message_table.write_throughput = 5 | ||
safe = autopush.tests.boto_resource._latest_message_table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kill
autopush/tests/test_db.py
Outdated
fake_conf, | ||
resource=autopush.tests.boto_resource) | ||
try: | ||
autopush.tests.boto_resource._latest_message_table = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kill
Closes #1172