Migrate "How We Solved Push Notifications at if(we)" article #5

Open
TysonAndre opened this issue Dec 27, 2019 · 2 comments
@TysonAndre
Contributor

I have an old draft (version 2) from @mishan - the blog that the original post was published on no longer exists. The draft, incompletely converted from ODT to markdown with pandoc, is below.


<span id="anchor"></span>**How We Solved Push Notifications at if(we)**

By [*Misha Nasledov (@mishan)*](https://github.com/mishan)

<span id="anchor-1"></span>The Early Days of Push

In the early days of mobile app development, back when the first
versions of the Tagged mobile application were being developed, a very
scrappy mobile push notification system was put together. The original
push code was written in PHP without using any sort of library. It
supported GCM (Google Cloud Messaging), C2DM (prior to GCM’s existence),
and APNs (Apple Push Notification Service). We had a very lame
subscriber database -- only the most recently used device would receive
push notifications. We did not handle every exception from the service
properly, such as an unsubscribe after uninstalling the app, or follow
certain best practices. In particular, APNs is a bit more involved to
use as it requires calling a feedback service to get the result of the
push.\*1

\*1 = Latest APNs HTTP/2 spec obviates this.

<span id="anchor-2"></span>The Search For A Better Way

We looked at various solutions as we wanted to revamp our push
notification system in order to get more out of it. We decided the best
place to start was to actually improve the push notification engine and
the interface to it. Not being particularly married to our in-house GCM
and APNs push code, we looked at various, alternative, off-the-shelf
solutions in lieu of trying to improve the old system.

We wanted a system that would let us better abstract away different push
service provider APIs. The ability to push to more than one device per
user was also something we desired. The PHP push code gave us enough
trouble with the lack of persistent sockets -- there was already a lot
of opening and closing of connections with APNs, and sending more
notifications per person meant even more connection churn. The new
system needed to use sockets efficiently, and handle errors more
gracefully.

We didn’t particularly want to go with some vendor for sending push
notifications. When dealing with users’ personal data, having one less
party involved keeps our users’ information safer. A third party service
would not have offered us as tightly integrated control and flexibility.
We also had plenty of spare servers to run a service.

<span id="anchor-3"></span>Uniqush

While researching potential solutions, we discovered an open source
project known as [*Uniqush*](https://github.com/uniqush/uniqush-push).
It was open source, it had some users, and the source looked relatively
simple enough that we could give it a shot and work with it. The only
dependency was a persistent Redis server, which we had already set up
for another, unrelated project before considering Uniqush.
It’s noteworthy that the project was structured so that one could write
a different database module so that an RDBMS such as MySQL or PostgreSQL
could be used, but currently only Redis is supported.

In a nutshell, Uniqush keeps information about “push service providers”
(PSPs). PSPs are the push notification endpoints (e.g., our Tagged
mobile application GCM endpoint). Service names identify sets of one or
more PSPs, each PSP having a unique push service type. We make these one
to one so we can work on pushes for apps independently. Uniqush also
keeps information about subscribers, which are associated with sets of
one or more devices registered with a service endpoint, in Redis. In
order for this to be useful, one has to set up their Redis server for
persistence. So long as all of one’s data can be kept in RAM,
persistence is pretty easy. We have millions of mobile users, many of
whom have more than one device, and our subscriber database is
(relatively speaking) pretty small -- about 15GB. This also solved our
shortcomings with our subscriber database without us having to write a
new way to store subscribers.
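
To make the relationships above concrete, here is a minimal in-memory sketch of the data model as described: a service name groups PSPs keyed by push service type, and each subscriber maps to one or more devices. This is not Uniqush's code or its Redis schema; the class, method, and key names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """One service name, its PSPs, and its subscribers (illustrative only)."""
    name: str
    # push service type (e.g. "gcm", "apns") -> PSP settings
    psps: dict = field(default_factory=dict)
    # subscriber name -> {push service type -> set of device identifiers}
    subscribers: dict = field(default_factory=dict)

    def add_psp(self, service_type: str, settings: dict) -> None:
        # Each PSP within a service has a unique push service type.
        if service_type in self.psps:
            raise ValueError(f"{self.name} already has a {service_type} PSP")
        self.psps[service_type] = settings

    def subscribe(self, subscriber: str, service_type: str, device_id: str) -> None:
        # A device can only be registered against a PSP that exists.
        if service_type not in self.psps:
            raise KeyError(f"no {service_type} PSP registered for {self.name}")
        devices = self.subscribers.setdefault(subscriber, {})
        devices.setdefault(service_type, set()).add(device_id)

    def devices_for(self, subscriber: str) -> list:
        """Return sorted (service_type, device_id) pairs for one subscriber."""
        by_type = self.subscribers.get(subscriber, {})
        return sorted((t, d) for t, ds in by_type.items() for d in ds)


# Hypothetical names: one service per app, with a single GCM PSP.
tagged_android = Service("tagged-android")
tagged_android.add_psp("gcm", {"api_key": "placeholder"})
tagged_android.subscribe("user:42", "gcm", "device-a")
tagged_android.subscribe("user:42", "gcm", "device-b")
```

Keeping one PSP per service, as the article does, means `psps` would hold a single entry; the sketch allows more only because the description above does.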

Uniqush supports the services we use (GCM and APNs) as well as ADM
(Amazon Device Messaging). The one shortcoming the project had was that
it did not support passing JSON payloads directly through, but instead
constructed the payloads from passed-in parameters. This was an issue as
we pass custom push notification payloads to our clients that contain
data about alert counters and, for Android, a profile picture URL.
Changing the way the client processes the notifications would break
older versions of clients. We ended up changing the code that constructs
the payloads and created a way to pass raw JSON payloads (intended for a
specific device type) directly to Uniqush.
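
As a rough illustration of what "raw JSON payloads intended for a specific device type" might look like, the sketch below builds one payload per platform. The `aps`, `alert`, and `badge` keys are standard APNs payload keys and `data` is the standard GCM key for custom key/value pairs; every other key name, and the helper functions themselves, are hypothetical and not taken from the Tagged clients or from Uniqush.

```python
import json


def build_apns_payload(message: str, alert_count: int) -> str:
    """Raw APNs JSON carrying the alert text and the alert counter."""
    return json.dumps({"aps": {"alert": message, "badge": alert_count}})


def build_gcm_payload(message: str, alert_count: int, photo_url: str) -> str:
    """Raw GCM JSON; the keys inside `data` are made-up, app-specific names."""
    return json.dumps({
        "data": {
            "message": message,
            "alert_count": alert_count,
            "profile_photo_url": photo_url,
        }
    })


apns_json = build_apns_payload("Someone viewed your profile", 3)
gcm_json = build_gcm_payload(
    "Someone viewed your profile", 3, "https://example.com/photo.jpg"
)
```

Because each payload is already complete for its platform, the push service only has to relay it, which is what keeps older clients working unchanged.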

<span id="anchor-4"></span>Giving Uniqush a Shot

About a year ago we first put Uniqush on a couple of VMs on production
and changed our PHP push code to try sending through Uniqush when an
experiment was enabled. If the service call to Uniqush failed, it would
fall through to the old implementation, just in case. We first tried
using Uniqush to send our GCM push notifications and it ended up working
mostly without trouble, sending about 250 push notifications per second.
There were a couple of small bugs that became evident once Uniqush was
running at production load, but they were easily fixed.
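
The experiment-with-fallback rollout described above can be sketched as follows. The original code was PHP; this Python version only illustrates the control flow, and the function and parameter names are assumptions.

```python
from typing import Callable


def send_push(notification: dict,
              experiment_enabled: bool,
              send_via_uniqush: Callable[[dict], bool],
              send_via_legacy: Callable[[dict], bool]) -> str:
    """Return which path handled the notification: "uniqush" or "legacy"."""
    if experiment_enabled:
        try:
            if send_via_uniqush(notification):
                return "uniqush"
        except Exception:
            # Any failure of the new path falls through to the old one.
            pass
    send_via_legacy(notification)
    return "legacy"
```

With this shape, turning the experiment off (or a Uniqush outage) silently routes everything through the old implementation, so the new service can be tried at production load without risking delivery.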

APNs proved to be a bit trickier. The protocol is more complex: it
requires asynchronous writes and reads on TCP sockets and tracking of
32-bit identifiers, and Apple will close the socket immediately instead
of giving an error code when a push fails.
Uniqush’s APNs module turned out to not have a very reliable
implementation and unfortunately fell over at production load. However,
due to the pros of Uniqush, success with GCM, and overall simplicity of
the code, we kept investing in the project. We rewrote the APNs module
to use a worker pool implementation that didn’t have the race conditions
of the existing implementation.
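
To show where the 32-bit identifier comes in, here is a small sketch of framing a notification in the legacy APNs "enhanced" binary format and remembering what was written so that later notifications can be resent after a failure. This is not the rewritten Uniqush module (which is written in Go and uses a worker pool); `SentBuffer` and its methods are hypothetical names, and the resend policy is an assumption.

```python
import struct
from collections import OrderedDict


def pack_enhanced_frame(identifier: int, expiry: int,
                        token: bytes, payload: bytes) -> bytes:
    """Pack one frame in the legacy APNs "enhanced" binary format."""
    # command=1, 32-bit identifier, 32-bit expiry, then the length-prefixed
    # device token and payload, all in network (big-endian) byte order.
    return (struct.pack("!BIIH", 1, identifier, expiry, len(token)) + token
            + struct.pack("!H", len(payload)) + payload)


class SentBuffer:
    """Remember recently written payloads, keyed by their 32-bit identifier."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self._next_id = 0
        self._sent = OrderedDict()

    def record(self, payload: bytes) -> int:
        identifier = self._next_id
        self._next_id = (self._next_id + 1) % 2**32  # wrap at 32 bits
        self._sent[identifier] = payload
        if len(self._sent) > self.capacity:
            self._sent.popitem(last=False)  # drop the oldest entry
        return identifier

    def sent_after(self, failed_id: int) -> list:
        """Payloads written after `failed_id`, which may need resending."""
        if failed_id not in self._sent:
            return []
        ids = list(self._sent)
        return [self._sent[i] for i in ids[ids.index(failed_id) + 1:]]
```

The identifier is what allows a failure to be matched to a specific notification; anything written to the socket after that notification is a candidate for resending on a fresh connection.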

<span id="anchor-5"></span>Scaling

Currently we use Uniqush to send all of the mobile application GCM and
APNs push notifications for Tagged and hi5 at if(we). That’s about
400-500 notifications per second. Because it’s a standalone service that
has no internal knowledge of the Tagged application or any other
business logic, we can easily use it for other apps we develop.

To reach this scale, we have four 4-core 4GB hosts running the
uniqush-push instances. We currently run three Uniqush processes per VM,
though, in reality, the tier is a bit over-provisioned to handle growth
and any surge of activity. The Uniqush instances actually end up taking
a lot more queries than just the 400-500 notifications per second. We
query the Uniqush subscriber database before sending a push
notification so that we can make more intelligent decisions about
whether to push to a subscriber. The mobile clients, in aggregate, send
about another 500 subscriptions per second. Overall, the tier is
handling something around 1500 queries per second.
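
The rough arithmetic behind the 1500 figure can be reproduced as below. The one-lookup-per-push ratio is an assumption made here to arrive at the total; the other numbers come from the paragraph above.

```python
pushes_per_sec = 500              # upper end of the 400-500 quoted above
lookups_per_sec = pushes_per_sec  # assumption: one subscriber lookup per push
subscriptions_per_sec = 500       # sent by the mobile clients in aggregate

total_qps = pushes_per_sec + lookups_per_sec + subscriptions_per_sec

hosts = 4
processes_per_host = 3
processes = hosts * processes_per_host
qps_per_process = total_qps / processes

print(total_qps, processes, qps_per_process)  # prints: 1500 12 125.0
```

At roughly 125 queries per second per process, the tier has plenty of headroom, which fits the remark that it is over-provisioned.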

All of these queries end up hitting Redis to obtain, modify, and/or add
subscriber information. Before embarking on this project, we had already
built a large, general purpose persistent Redis “cluster.” It is not
actually a Redis Cluster but, rather, it is a cluster of Redis shards
with consistent hashing. Uniqush uses [*our
fork*](https://github.com/ifwe/twemproxy) of Twitter’s
[*twemproxy*](https://github.com/twitter/twemproxy) in order to be able
to utilize the cluster. Our fork contains a yet-to-be-merged
[*patch*](https://github.com/twitter/twemproxy/pull/324) by
@[*andyqzb*](https://github.com/andyqzb) to add Redis Sentinel support
so that failovers can be handled properly. We have two 32-core 256GB
hosts to run the Redis master and slave shards.
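
For readers unfamiliar with consistent hashing across shards, here is a generic minimal sketch of the idea. It is not twemproxy's implementation or configuration; the `HashRing` class, the choice of SHA-256, and the number of points per shard are all assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Map keys to shards so that adding a shard moves only some keys."""

    def __init__(self, shards, points_per_shard: int = 100):
        # Place several points per shard on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(points_per_shard)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["redis-1", "redis-2", "redis-3"])
shard = ring.shard_for("subscriber:42")
```

The useful property is that growing the ring from three shards to four only reassigns keys that now belong to the new shard; every other key stays where it was.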

<span id="anchor-6"></span>What’s Next?

We’ve contributed fixes and improvements we’ve made to the Uniqush
project back upstream and continue to make improvements and
contributions to the project. The ability to store other data with
individual subscriber devices such as client versions and subscription
dates has been developed but hasn’t been pushed back upstream yet as we
haven’t even really started using these attributes ourselves. It will
allow for much more intelligent application logic -- for instance, we
could send some kind of new push notification only to the devices of
subscribers with the latest application version on their device. Our
fork, which may have experimental features under development that have
not been pushed upstream yet, is located at
[*http://github.com/ifwe/uniqush-push*](http://github.com/ifwe/uniqush-push).
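
As an illustration of the kind of application logic that per-device attributes would enable, the sketch below filters a subscriber's devices by client version. The `Device` type, its field names, and the helper are hypothetical; they do not reflect how the fork actually stores or exposes these attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    device_id: str
    app_version: tuple  # e.g. (5, 2, 0); hypothetical attribute


def devices_at_least(devices: list, min_version: tuple) -> list:
    """Keep only the devices whose client version is at least `min_version`."""
    return [d for d in devices if d.app_version >= min_version]


devices = [
    Device("device-a", (5, 2, 0)),
    Device("device-b", (4, 9, 1)),
]
eligible = devices_at_least(devices, (5, 0, 0))
```

A new notification type could then be sent only to `eligible`, leaving older clients that would not understand it untouched.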

Uniqush has been a resounding success at if(we). A few months ago we
finally ripped out the old push notification code from our PHP (web)
codebase. Uniqush was sending all of our APNs and GCM push notifications
at full production load without issue. It made everything much simpler.
The concern of implementing and maintaining the APNs and GCM
implementations is gone. All our PHP code has to do now is deal with
constructing push notifications (more specifically, the content of the
notifications and any application-specific log) and relaying them to
Uniqush as well as telling Uniqush to subscribe and unsubscribe devices
of users. Uniqush takes care of maintaining the subscriber database,
handling errors / exceptions, and actually sending the push
notifications to Apple and Google’s servers. This ability to operate at
a more abstract level has made it easy for us to then focus on things
like creating an A/B experiment framework for push notification content
and scheduling, smarter push notification scheduling, and more
intelligent device routing for push notifications.

<span id="anchor-7"></span>Acknowledgments

Thank you Nan Deng (@[*monnand*](https://github.com/monnand)) for
creating Uniqush! It ended up working quite well at if(we). And a big
shout-out to our colleague Tyson Andre
(@[*TysonAndre*](https://github.com/TysonAndre)) for making and driving
many improvements to Uniqush.
@mishan
Member

mishan commented Dec 27, 2019

I have the final draft here https://misha.nasledov.com/uniqush.html

From a quick glance, it looks the same

@TysonAndre
Contributor Author

TysonAndre commented Dec 27, 2019

The only change I could suggest is APNS -> APNs, per Apple's own documentation. I could probably link to that. The (single) image link on https://misha.nasledov.com/uniqush.html is broken for me, though, since the blog no longer exists:

<p class="block-img"><img src="https://d3gqbr1mr54afg.cloudfront.net/ifwe/0d1608dff6d5caf7dcd7bb4b44c45fc171a3d030_screen-shot-2016-04-27-at-5.49.52-pm.png" alt="" width="669" height="501" /></p>

 
