feat: Add multiple cert handlers for APNs #660
Conversation
Current coverage is 100% (diff: 100%)

@@            master    #660    diff @@
=======================================
  Files           45      45
  Lines         9171    9222     +51
  Methods          0       0
  Messages         0       0
  Branches         0       0
=======================================
+ Hits          9171    9222     +51
  Misses           0       0
  Partials         0       0
Thanks, @jrconlin! This looks really good, but I think `_route` has some concurrency issues.
self.log.debug("Starting APNS router...")
self.ap_settings = ap_settings

def register(self, uaid, router_data, *args, **kwargs):
    """Validate that an APNs instance token is in the ``router_data``"""
def register(self, uaid, router_data, router_token="firefox",
What do you think about calling this `app_id` to match the docs, and throwing instead of having a default?
Changed `router_token` to `app_id` for registration, added `app_id` as a default arg for `register`.
"""
router_token = router_data["token"]
self._channel = router_data["channel"]
ISTM there's only one instance of `APNSRouter` per app, so this will clobber `self._channel` for concurrent requests (if one request comes in, then another one comes in while we're still processing the first one).
Moved `rel_channel` to the `messages` hash.
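For illustration, a minimal sketch of the pattern this reply describes, with per-request values kept in local variables and in the shared `messages` dict rather than on the router instance (the class below is a simplified stand-in for autopush's `APNSRouter`, not the actual patch):

```python
import time


class APNSRouter(object):
    """Simplified stand-in; not the actual autopush class."""

    def __init__(self):
        # One shared dict of in-flight messages; no per-request
        # attributes like self._channel for concurrent calls to clobber.
        self.messages = {}

    def _route(self, notification, router_data):
        # Per-request values stay in locals...
        router_token = router_data["token"]
        rel_channel = router_data["rel_channel"]
        msg_id = int(time.time() * 10 ** 9)
        # ...and are stored alongside the message entry they belong to.
        self.messages[msg_id] = {
            "token": router_token,
            "rel_channel": rel_channel,
        }
        return msg_id
```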
# TODO: Add listener for error handling.
self.apns.gateway_server.register_response_listener(self._error)
self.messages[now] = {"token": router_token, "payload": payload}
apns_client.gateway_server.register_response_listener(self._error)
Same here, `self.messages` and `register_response_listener` don't seem safe for concurrent use if `APNSRouter` is a singleton. `time.time()` is in seconds, so we might also clobber the previous entry if we handle two requests close together.
I believe this one is intentional to track sent messages.
Does the gateway_server need to have `self._error` continually registered? Can't the response listeners for each channel be registered at setup?
Kit pointed out some issues that need changes, and I noticed on this read-through existing issues in how outbound messages are tracked and updated.
self._connect()
self.apns = dict()
self._config = router_conf
for channel in self._config.keys():
What are APNS channels?
The iOS team has certs assigned per Firefox release channel, so these would map to "firefox", "beta", "gecko", and whatever other channel definitions they want to use (e.g. "nightly"?).
Could that be expressed more clearly with a better variable name, or with more documentation around this code?
Changing to `rel_channel` as an abbreviation for "release channel".
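For reference, the release-channel-to-cert mapping under discussion would parse into something like the dict below (the cert/key paths mirror the `apns_creds` sample quoted later in this thread; the "nightly" entry is the hypothetical extra channel mentioned above):

```python
# Hypothetical parsed router_conf: one cert/key pair per release channel.
router_conf = {
    "firefox": {"cert": "certs/main.cert", "key": "certs/main.key"},
    "beta": {"cert": "certs/beta.cert", "key": "certs/beta.key"},
    "nightly": {"cert": "certs/nightly.cert", "key": "certs/nightly.key"},
}
```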
:type uaid: str
:param router_data: Dict containing router specific configuration info
:type router_data: dict
:param router_token: The channel identifier for cert info lookup
More tokens? This one is weird because you immediately say that 'router_token' is actually a 'channel_identifier'.
Yep, resolving that. There will still be one "router_token" because that's what APNs calls it.
alert=router_data.get("title", config.get('default_title',
                      'Mozilla Push')),
content_available=1,
custom=custom)
now = int(time.time())
This gives only second granularity; if multiple requests occur in the same second (pretty easy), their `now` values will all be the same.
Since I'm collapsing the key indexes with the expiry, I made this nanosecond based.
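A small sketch of that change, assuming plain `time.time()` is the only clock available (as in the Python 2-era code under review):

```python
import time

# Seconds-based key: two sends in the same second collide.
now_s = int(time.time())

# Nanosecond-based key: near-simultaneous sends are far less likely to
# collide, and the key still sorts by send time for expiry checks.
now_ns = int(time.time() * 10 ** 9)
```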
    self._base_tags)

self.apns.gateway_server.send_notification(token, payload, now)
apns_client.gateway_server.send_notification(router_token, payload,
    now)

# cleanup sent messages
if self.messages:
    for time_sent in self.messages.keys():
It's possible that multiple threads could be iterating on this at once, which means that one of the `del` calls will throw a `KeyError`. It would probably be better to toss the message cleanup function back into the event loop so it can clean up without worrying about locking the object.
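A rough sketch of that suggestion, assuming the Twisted reactor is available as elsewhere in autopush; the method bodies are illustrative, not the actual patch:

```python
import time

from twisted.internet import reactor


class APNSRouter(object):
    # Excerpted methods only; the rest of the class is assumed.

    def _cleanup(self, expiry=300):
        """Drop tracked messages older than ``expiry`` seconds.

        Runs on the reactor thread, so no other thread mutates
        ``self.messages`` while we iterate over a snapshot of it.
        """
        cutoff = time.time() - expiry
        for msg_id, message in list(self.messages.items()):
            if message["time_sent"] < cutoff:
                self.messages.pop(msg_id, None)

    def _route(self, notification, router_data):
        # ... send the notification from the worker thread, then hand
        # cleanup back to the event loop rather than doing it here,
        # where another thread could be iterating concurrently.
        reactor.callFromThread(self._cleanup)
```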
self.messages[now] = {"token": token, "payload": payload} | ||
# TODO: Add listener for error handling. | ||
self.apns.gateway_server.register_response_listener(self._error) | ||
self.messages[now] = {"token": router_token, "payload": payload} |
Is this a router_token, a channel_identifier, or a token? It's not great that its name keeps changing as it moves around to be more and more obscure in meaning.
`router_token` is now unique and locally defined as the `router_data["token"]` from above. I'm fine making it more explicit, but `router_data_token = router_data["token"]` feels verbose.
; ... }
; e.g {"firefox": {"cert": "certs/main.cert", "key": "certs/main.key"}, "beta": {"cert": "certs/beta.cert", "key": "certs/beta.key"}}
#apns_creds =
The connection node needs this? Or is there some other reason it's now in shared.ini?
Consistency, mostly. The GCM/FCM definitions are located in shared. In addition, the router args are currently pulled in by main via `add_shared_args()`.
Force-pushed from 055e311 to 2aa7a59.
router_data = uaid_data["router_data"]
# Kick the entire notification routing off to a thread
return deferToThread(self._route, notification, router_data)
d = deferToThread(self._route, notification, router_data)
d.addCallback(self._cleanup)
Rather than cleaning up after every single message, maybe have the router creation spawn a recurring task that runs every 5 seconds to do this?
content_available=1,
custom=custom)
now = time.time()
self.messages[now] = {"rel_channel": router_data["rel_channel"],
Is `rel_channel` the same as `app_id`, or is it the `instance_id` from `{"token": {instance_id}}`? And is `router_token` also the `instance_id`?
    self._base_tags)

apns_client.gateway_server.send_notification(router_token, payload,
    now)
I think you meant to pass the ID here, too, instead of `now`?
if message["time_sent"] < now - expry: | ||
del self.messages[msg_id] | ||
return response | ||
|
||
def _error(self, err): | ||
"""Error handler""" | ||
if err['status'] == 0: |
Nit: `self.messages.pop(err['identifier'], None)` might be better; `del` will throw a `KeyError` if the entry was already removed from the dict.
That shouldn't happen now that cleanup is run off the event loop.
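For reference, a tiny illustration of the difference being discussed (dict contents are made up):

```python
messages = {1: "sent"}

del messages[1]           # removes the entry
# del messages[1]         # doing it again would raise KeyError

messages.pop(1, None)     # returns None instead of raising
```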
self.apns[rel_channel] = self._connect(self._config[rel_channel])
self.apns[rel_channel].gateway_server.register_response_listener(
    self._error)
self._connect(router_conf)
Is this still needed, now that the config is a dict?
I'd like to keep the connection in a function, even if it adds some slight overhead. There's a bit of messy history about that with the apns lib.
I meant you're calling `self._connect()` for each entry in `router_conf`, and then you're calling `self._connect(router_conf)`. ISTM that won't work? If it does, I'd like to understand why. 😄
ah, quite right. context is a good thing for me to have when I reply. I should try it more.
ah-HA! turns out this was a merge artifact, since it just happened again after a conflict from master. ._.
release channel name.

"""
# Do I still need to call this in _error?
I guess not. 😄
so, yeah, remember my cryptic comment about calling self._connect() before? ...
# "unclean" is used by testing to prevent unclean reactor errors. | ||
# It also prevents the local cleanup loop from running without | ||
# requiring heavy lifting. | ||
if "unclean" not in router_conf: |
Ugh. Unclean reactor errors make me sad. 😭
Same here, which is why I added this to just not call the reactor when we're running tests.
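A minimal sketch of that guard, with the constructor signature simplified and the cleanup body stubbed out (this is illustrative, not the actual patch):

```python
from twisted.internet import reactor


class APNSRouter(object):
    def __init__(self, ap_settings, router_conf):
        self.messages = {}
        self._config = router_conf
        # Tests put "unclean" in router_conf so that nothing is ever
        # scheduled on the reactor (avoiding unclean-reactor errors) and
        # the local cleanup loop stays off without any heavy lifting.
        if "unclean" not in router_conf:
            reactor.callLater(10, self._cleanup)

    def _cleanup(self):
        pass  # expire old self.messages entries, then reschedule
```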
Great! r+ with nits fixed.
self._config = router_conf
if self._config.get('max_messages'):
    self._max_messages = self._config['max_messages']
    del(self._config['max_messages'])
del the function vs. del the keyword?
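For clarity, `del` is a statement rather than a function, so the parentheses are just grouping and both spellings behave identically (the dict below is made up):

```python
config = {'max_messages': 50, 'firefox': {}}

del config['max_messages']      # idiomatic statement form
# del(config['max_messages'])   # same statement; parens are just grouping
```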
self.apns = dict()
self.messages = dict()
self._config = router_conf
if self._config.get('max_messages'):
self._max_messages = self._config.pop('max_messages', 100)
    del(self._config['max_messages'])
else:
    self._max_messages = 100
for rel_channel in self._config.keys():
Default behavior on iteration of a dict is to iterate through its keys. Usually keys() is only needed if you might mutate the dict during iteration such that you need a list first.
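Putting the two suggestions from these threads together, a constructor excerpt might look like this (illustrative only; `_connect` is stubbed):

```python
class APNSRouter(object):
    """Illustrative constructor excerpt combining both suggestions."""

    def __init__(self, router_conf):
        self.apns = {}
        self._config = router_conf
        # pop() removes the key and supplies the default in one step.
        self._max_messages = self._config.pop('max_messages', 100)
        # Iterating a dict yields its keys directly; keys() is only
        # needed when the dict might be mutated during the loop.
        for rel_channel in self._config:
            self.apns[rel_channel] = self._connect(self._config[rel_channel])

    def _connect(self, creds):
        # Stand-in for the real per-channel APNs connection setup.
        return creds
```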
self.apns[rel_channel] = self._connect(self._config[rel_channel])
self.apns[rel_channel].gateway_server.register_response_listener(
    self._error)
self._connect(router_conf)
So, first `router_conf` is assigned to `self._config`, then we iterate through keys in `self._config` (which is now `router_conf` minus the max_messages key) calling `self._connect`. But then we also call `self._connect` with the whole thing again? What's going on here?
git blend merge failure.
# It also prevents the local cleanup loop from running without
# requiring heavy lifting.
if "unclean" not in router_conf:
    reactor.callLater(10, self._cleanup)
There's a `task.LoopingCall` used in main.py for functions that should be called regularly.
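A minimal sketch of that pattern, mirroring the `task.LoopingCall` usage visible in the `endpoint_main` hunk further down (the cleanup callable and the 5-second interval are placeholders):

```python
from twisted.internet import task


def cleanup_messages():
    # Stand-in for the APNs router's message-expiry work.
    pass


# LoopingCall reschedules itself, so the router does not need its own
# reactor.callLater chain; 5 seconds is just an example interval.
cleanup_loop = task.LoopingCall(cleanup_messages)
cleanup_loop.start(5)
```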
:param app_id: The release channel identifier for cert info lookup
:type app_id: str

returns a modified router_data dict to be stored
`:returns:`?
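For illustration, the Sphinx-style field being asked for might look like this; the signature and wording are approximated from the hunks quoted above:

```python
def register(self, uaid, router_data, app_id=None, *args, **kwargs):
    """Validate that an APNs instance token is in the ``router_data``.

    :param app_id: The release channel identifier for cert info lookup
    :type app_id: str
    :returns: A modified ``router_data`` dict to be stored
    :rtype: dict

    """
    return router_data
```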
Force-pushed from f5d79b6 to 167bde7.
self.ap_settings.metrics.increment(
    "updates.client.bridge.apns.succeed",
    "updates.client.bridge.apns.%s.attempted" %
FYI send_notification appears to never fail w/ enhanced=True (assuming the input is valid for a payload), which if I'm right makes the new separate metrics unnecessary
router_data = uaid_data["router_data"]
# Kick the entire notification routing off to a thread
return deferToThread(self._route, notification, router_data)
d = deferToThread(self._route, notification, router_data)
(since I know you're touching this again) could you please restore: return deferToThread..
r+ pending coverage
@@ -566,6 +569,9 @@ def endpoint_main(sysargs=None, use_files=True):
    # Start the table rotation checker/updater
    l = task.LoopingCall(settings.update_rotating_tables)
    l.start(60)
    if settings.routers.get('apns'):
We don't need to get it; a key check should be fine: `if 'apns' in ...`.
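The shape of that check, as a self-contained example (the `settings_routers` dict here stands in for `settings.routers`):

```python
settings_routers = {"apns": {"firefox": {"cert": "...", "key": "..."}}}

# Membership test reads more directly than .get() when the value
# itself is not needed here:
if "apns" in settings_routers:
    pass  # start the APNs cleanup LoopingCall
```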
""" | ||
router_token = router_data["token"] | ||
rel_channel = router_data["rel_channel"] | ||
config = self._config[rel_channel] |
Sure hope we never, ever, change valid release channel names.
If we do, thumbscrews. Fortunately ops has the ability to generate more certs. They'll just have to change the app_id they're sending in to use the new cert pair.
This patch updates APNs handlers to accept platform based cert configurations. See `configs/autopush_shared.ini.sample`. In addition, this patch clarifies some argument references for routers (e.g. the less than useful `result` is now the slightly more descriptive `uaid_data`).

Custom item names have been normalized to match gcm/fcm. Document errors cleaned up a bit as well.

BREAKING CHANGE: the APNS configuration options have been altered; see `configs/autopush_shared.ini.sample` for new APNS configuration settings.

Closes #655
@bbangert @pjenvey r?
@kitcambridge (optional r?)