Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Billing, Take 1 #374

Merged
merged 37 commits into from
Nov 6, 2020
Merged

Billing, Take 1 #374

merged 37 commits into from
Nov 6, 2020

Conversation

sanderpick
Copy link
Member

@sanderpick sanderpick commented Oct 5, 2020

Closes #276
Addresses #373 in part (the limits are not all in place)

This PR required a number of big structural changes, which is why the diff is so huge. I've commented on the bits that are worth looking at.

Understanding this requires some knowledge of how Stripe works with recurring metered subscriptions.

  • A subscription is made of prices, which define what a product costs at different tiers and at what intervals.
  • A price can only refer to one product, but a subscription can have multiple prices.
  • There are various setting for how prices are charged.

The Hub has four products:

  • Stored data: 5 Gib free, this is a static value, i.e, it's carried from day to day.
  • Network egress: 500 Mib per day, time-based, i.e., it's reset every day
  • Instance reads: 10,000 per day, time-based, i.e., it's reset every day
  • Instance writes: 5,000 per day, time-based, i.e., it's reset every day

Stripe only cares about units of a product. I derived the current unit values by considering that we should only notify Stripe about usage at the 1 cent resolution. For example, the smallest unit of storage we have is 5Gib / 100, which costs 1 cent. The first 100 of which are free, amounting to 5Gib free. The same logic is used for the other products.

Note: Currently, this functionality is only available at the developer level. After #315, I'll extend usage billing to their users.

The life-cycle of a customer and the various states they can arrive in are important to understand.

  • A new customer is created when an account is created (dev/org)
  • That customer is auto-subscribed to the Hub usage subscription
  • The account goes about business as usual, acruing usage against their free quota
  • Once they hit the ceiling on any product price, the CLI will error and let the user know they can run hub billing portal to launch the customer portal where a payment method can be added.
  • If they do that, Stripe hits the billing gateways webhook handler with an event that we use to update our customer object
  • This unlocks the account, and they can use however much Hub as they want to pay for
  • They can also cancel the subscription in the customer portal, which we also handle via a webhook, which locks the account completely
  • The can re-create the subscription by running hub billing setup
  • In this case, stored data is not reset, but egress, reads, and writes are reset
  • A clever user may cancel and re-subscribe over and over to get more egress, reads, writes... something we can work on stopping at some point, but those are reset daily so doesn't feel super important right now
  • At any point, a user can run hub billing status to see relevant usage and status info, like...
$ hub billing status
> Account status: pay-as-you-go
> Account balance: $0.00
> Subscription status: active

                          USAGE  FREE QUOTA         START      END
  Stored data (bytes)     0      5368709100 (100%)  03-Nov-20  04-Nov-20
  Network egress (bytes)  406    429496322 (100%)   03-Nov-20  04-Nov-20
  ThreadDB reads          0      10000 (100%)       03-Nov-20  04-Nov-20
  ThreadDB writes         0      5000 (100%)        03-Nov-20  04-Nov-20
  • If an account is deleted, the customer object is retained since it may still have a balance with open invoices, etc.

Here's what the customer portal looks like. I added the logo and our color scheme.

Screen Shot 2020-11-04 at 3 22 33 PM

Screen Shot 2020-11-04 at 3 22 57 PM

@sanderpick sanderpick self-assigned this Oct 5, 2020
@sanderpick sanderpick force-pushed the sander/billing branch 2 times, most recently from 8e10b89 to 03bbfea Compare October 14, 2020 01:26
Signed-off-by: Sander Pick <[email protected]>
Signed-off-by: Sander Pick <[email protected]>
Signed-off-by: Sander Pick <[email protected]>
Signed-off-by: Sander Pick <[email protected]>
Signed-off-by: Sander Pick <[email protected]>
@sanderpick sanderpick marked this pull request as ready for review November 1, 2020 21:49
Copy link
Member Author

@sanderpick sanderpick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments to the bits that are worth looking at.

.github/workflows/publish-js-libs.yml Show resolved Hide resolved
.github/workflows/publish-js-libs.yml Show resolved Hide resolved
Makefile Show resolved Hide resolved
Makefile Show resolved Hide resolved
api/apitest/apitest.go Show resolved Hide resolved
core/usage_interceptor.go Show resolved Hide resolved
core/usage_interceptor.go Show resolved Hide resolved
core/usage_interceptor.go Show resolved Hide resolved
core/stats_handler.go Show resolved Hide resolved
core/stats_handler.go Show resolved Hide resolved
@andrewxhill
Copy link
Member

Nice! great work dude. just a couple questions

Once they hit the ceiling on any product price, the CLI will error and let the user know they can run hub billing portal to launch the customer portal where a payment method can be added.

  • Can we flag this for later enhancement to enable a less abrupt outage. I've seen plans start accruing a bill and giving the user 1-week to enable billing and pay or then error the APIs. This would allow a free app to gracefully upgrade if they didn't expect to become popular over night and were using some API that could get blocked. So hit ceiling price -> enter grace period -> pay or get blocked.

In this case, stored data is not reset, but egress, reads, and writes are reset

  • What's your thinking for stored data no longer paid/maintained longer-term?

@sanderpick
Copy link
Member Author

Once they hit the ceiling on any product price, the CLI will error and let the user know they can run hub billing portal to launch the customer portal where a payment method can be added.

Can we flag this for later enhancement to enable a less abrupt outage. I've seen plans start accruing a bill and giving the user 1-week to enable billing and pay or then error the APIs. This would allow a free app to gracefully upgrade if they didn't expect to become popular over night and were using some API that could get blocked. So hit ceiling price -> enter grace period -> pay or get blocked.

Nice, that sounds good. Should be relatively quick to add that feature. I'll make it an issue.

In this case, stored data is not reset, but egress, reads, and writes are reset

What's your thinking for stored data no longer paid/maintained longer-term?

This just means we still account for stored data since that would be the easiest one to game. Since it's not reset, a user would not be able to add 5Gib, delete subscription, re-subscribe, add another 5 Gib.

Copy link
Member

@carsonfarmer carsonfarmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow @sanderpick, this is a biggie. I've tried to give at least a cursory glance over everything. Obviously I haven't dug deep into any of the possible edge cases etc, but your higher level comments were super helpful in assessing the overall goals. Two of my primary "concerns" (what to do about existing data, and user-scoped allocations - to come later), are pretty well addressed here (or there are concrete plans to address them.

To follow up on one of Andrew's comments though: What about the following scenario:

Developer creates account, accrues data storage up to free amount and then doesn't upgrade/create account. They leave. Now what? We don't want to hold onto that data forever right? Do we imagine a time limit on keeping it around? Maybe a notification that it will be cleaned out? In theory, they might re-create their account (which you've covered), and would expect their data to still be there for a time.

api/billingd/service/prices.go Show resolved Hide resolved
api/billingd/service/service.go Show resolved Hide resolved
api/billingd/service/service.go Show resolved Hide resolved
"buckets_pb_service.js",
"buckets_pb.d.ts",
"buckets_pb_service.d.ts"
"api/bucketsd/pb/bucketsd_pb.js",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this will require changes to the downstream consumers of these APIs. Easy update. One "fix" might be to include an index.js file in the main package folder that makes the exports a bit more ergonomic. I might create a ticket for this, as it is certainly not urgent, and will only affect "us".

cmd/buck/cli/cli.go Outdated Show resolved Hide resolved
core/stats_handler.go Show resolved Hide resolved
core/stats_handler.go Show resolved Hide resolved
core/thread_interceptor.go Show resolved Hide resolved
core/usage_interceptor.go Show resolved Hide resolved
mongodb/accounts.go Show resolved Hide resolved
Copy link
Member

@asutula asutula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, not sure what to say other than this is a huge amount of really great work. The high level design and thinking is sound, and the low level details are too many and too complex to have any real feedback, so let's get this thing deployed and see what happens! Seriously awesome work.

api/billingd/service/service.go Show resolved Hide resolved
api/hubd/pb/hubd.proto Show resolved Hide resolved
core/auth_interceptor.go Show resolved Hide resolved
Copy link
Contributor

@jsign jsign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tons of work! 🎉
Left some comments that you might be interested.

api/billingd/gateway/gateway.go Show resolved Hide resolved
log.Debugf("unhandled event type: %s\n", event.Type)
}

c.Status(http.StatusOK)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanderpick , do you know what happen if we have some downtime and Stripe calls the webhook?
Will they retry?

If they do retries that sounds good, but we should also take care that we might receive retries (two or more times). If Stripe retries because they had a network problem, we might receive the same webhook call more than once.

Copy link
Member Author

@sanderpick sanderpick Nov 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, they do retry:

In live mode, Stripe attempts to deliver your webhooks for up to three days with an exponential back off. In the Events section of the Dashboard, you can view when the next retry will occur.

More info here. They also have this event monitoring tool that looks like it could be handy at some point.

Our event handling logic is idempotent, so duplicate events should not be a problem.

api/billingd/service/service.go Show resolved Hide resolved
if ok && owner.StorageAvailable != -1 {
owner.StorageUsed += delta
owner.StorageAvailable -= delta
owner.StorageDelta += delta
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to make sense of both +=. StorageDelta is the cumulative delta of the API, and StorageUsed the cumulative from "history"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, exactly. StorageUsed is their total used storage. The StorageDelta is the diff from the current request.

api/bucketsd/service.go Show resolved Hide resolved
cmd/buck/cli/push.go Outdated Show resolved Hide resolved
var isDB bool
var err error
switch method {
case "/threads.pb.API/NewDB":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrleated of this PR, idea: we rely on gRPC APIs as strings. If we somewhat rename a gRPC identifier, many of our things will fail silently. Not only here, but other files and also other of our projects.

I'm not aware if there's a dynamic way to get the name using the generated code of the client.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, pretty fragile. Curious if there's a best practice for this kind of method matching. Will look around.

@sanderpick
Copy link
Member Author

Developer creates account, accrues data storage up to free amount and then doesn't upgrade/create account. They leave. Now what? We don't want to hold onto that data forever right? Do we imagine a time limit on keeping it around? Maybe a notification that it will be cleaned out? In theory, they might re-create their account (which you've covered), and would expect their data to still be there for a time.

@carsonfarmer Good question. At some point we could track account activity and if it goes stale for a year or so, we send an email saying we'll delete the account.

@sanderpick
Copy link
Member Author

Wow, not sure what to say other than this is a huge amount of really great work. The high level design and thinking is sound, and the low level details are too many and too complex to have any real feedback, so let's get this thing deployed and see what happens! Seriously awesome work.

Thanks! Took a good bit of back and forth... it will nice to try it out and see what needs tweaking.

Signed-off-by: Sander Pick <[email protected]>
@sanderpick sanderpick merged commit 8be3712 into master Nov 6, 2020
@sanderpick sanderpick deleted the sander/billing branch November 6, 2020 04:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Re-consider user quotas
5 participants