-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Optimize canSendEmailOtp #1301
Conversation
22d859e
to
c9cedb7
Compare
WHERE | ||
(expiry is NULL OR expiry >= NOW()) | ||
AND | ||
CASE WHEN email ~ '^.+@' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: compared to the previous code, this is a simpler regexp. It detects the "full email", while the previous more complicated regexp was detecting the domains. Obviously the match strategies have been swapped accordingly.
const expected = false | ||
// NOTE: This ends with gov.sg not .gov.sg (lacks a dot after the @) | ||
const emailWithoutDot = "[email protected]" | ||
MockSequelize.query.mockResolvedValueOnce([]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should still be adding the mock for [{ email: ".gov.sg" }]
- our test suite clears the db state between tests, so this test is checking for [email protected]
against an empty whitelist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait what? the tests can populate the DB? I would love indeed to run the query for real rather than arbitrarily mocking a db response!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops sorry i meant the jest mocks are reset! So we should be mocking the return of query here as [{ email: ".gov.sg" }]
That said we do have population of DB for some of our tests in integration
- we don't do this for the tests in each service which we consider unit tests
The /integration/Users.spec.ts
test suite does cover some whitelist tests though (under the /email/otp
suite), so if we want to test this against the db we could add this test there as well!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, I'll take a look at the integration tests (I could probably use some help for that 😅)
But so specifically in this case you are commenting on here, the mock does what we expect. For reference, the new query no longer return the full content of the table, it returns either:
- no match if the input matches none of the whitelist entries
OR - the first whitelist entry the input matches
These 2 distinct responses represent a boolean-ish yes/no state of whether the input is whitelisted.
And in this case, the input [email protected]
is not expecting to match .gov.sg
as a whitelist entry, so that cannot be in the mocked response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you're right, I got the intent of the test wrong sorry! But yeah i think we can then introduce an integration test with this scenario (db contains .gov.sg
, user tried to login via [email protected]
which should not match)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the comments to make it clearer why a raw query is used: db15356
@@ -1,5 +1,5 @@ | |||
import { ResultAsync, errAsync, okAsync } from "neverthrow" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we share the before and after trace in the pr description to see the difference and also maybe if the latency improvements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we share the before and after trace in the pr description
The after traces are already there, I'll add a before trace as well.
if the latency improvements
Aye, so on this a few things:
- The current query currently runs under 2ms most of the time, so the optimization isn't all that interesting from a latency perspective to be honest, and we are not expecting a big performance boost or anything
- A dev pattern of downloading the full table content is bad generally, so the PR serves as illustration of how more complex query that are not necessarily supported by the ORM can be done. The ORM is just a tool, it should not be a reason to run inefficient queries.
- I already added in the IsomerCMS dashboard a block to monitor the performance of the old query versus the new query for comparison here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we share the before and after trace in the pr description
Done! 😄 I added a sample trace for the before in the PR description and I updated the performance section too.
So again, in the grand scheme of things, this is a very cheap query so this PR is not bringing visible impact to users. It's more to illustrate how to query the DB with good patterns :)
Also Just for reference, here is a screenshot on the variability of max run time of the current query in prod. We'll be comparing the performance of the new query against that (link to widget here)
expiry: { | ||
[Op.or]: [{ [Op.is]: null }, { [Op.gt]: new Date() }], | ||
|
||
// raw query because ORMs suck! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious we used raw query here because the ORM way was more complicated or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, so this query has 2 query patterns which are "unusual" for typical ORM use-cases:
- a
CASE WHEN
in theWHERE
clause - a where condition of the form
input like cell
(with aconcat()
call thrown in), where most of the times, queries are the other way aroundcell like input
I wasn't sure if sequelize could even do 2), but even if it could, I reckon the sequelize syntax would make the query very hard to read (likely mixing object and literal syntax in one statement ?)
In addition to that, this query does not return entities as instance of models or fetches dependencies, we just want 1 whtelist record, if the input matches any, so the ORM brings no additional value here (minus maybe one level of indirection for the table name and field name).
There's also a bit of personal preference here too I suppose. Personally I think
(expiry is NULL OR expiry > NOW())
is much easier to read than:
[Op.or]: [{ [Op.is]: null }, { [Op.gt]: new Date() }],
and that's just a small part of the more complex statement in this case.
So I was keen to show that it's OK to have efficient readable raw queries in app code. ORM is just a tool, and should not get in the way of good query patterns.
But so with all that said, If you can write the query in sequelize syntax, and it reads well, I don't mind pushing that in at all. It's your code base after all! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries Tim, agreed on the points you mentioned. I had some thoughts about mixing raw queries and ORMs, but understandability of code is more important than just have "standardised" code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the comment to better explain why the raw query was used: db15356
Admittedly the "ORMs suck" coment was tongue in cheek and clearly not very informative 😅 . So in fact I copied some of the rationale in the comments from the reply I gave you 😄
Code is released, I'll be adding some info here to monitor results. I take screenshots because these spans will soon disappear from DD: For now, p99 and max of the old request for the past ~2 weeks: old querymax (link) p99 (link) new query, we need to wait for a week or to aggregate more datamax (link) p99 (link) TODO: take screenshot for the graphs for the new query in 2 weeks. Notes: |
Problem
UserService.canSendEmailOtp()
uses a whitelist to determine if emails are allowed to receive OTPs.The whitelist contains
At the moment the code basically download the full content of the whitelist table, and does a filter and check in code locally.
It would be much better to leverage the query capability of the DB, to ask if a whitelist entry exists and get a small yes/no type result.
Solution
Replace app-level code filtering by a single DB query.
The DB query will return: either:
The PR will log the whitelist match, so we can verify if anything is wrong.
Behaviour and Results
Old query (for reference)
Sample trace
Sample trace for success
Sample trace for failure
Performance
This particularly is not very heavy, returning most of the time in less than 2 milliseconds (link to performance tracking graph here). The purpose of the PR is not to get a great performance gain but instead to 1) correct a bad pattern of fetching a large resultset to do filtering in code when that could be done in the query itself; 2) show that we can use raw queries to implement efficient queries.
But so concretely:
When no whitelist entry matches, the new query does a full table scan and returns nothing.
The previous query also did a full table scan to find active whitelist rows, but also incurred the transport cost to return the data, whether or not there was a whitelist match.
The new query is slightly heavier on the DB because it makes it do more work, but at the size of the whitelist table, it's negligible. The network transit time of the resulset should still be a net benefit in total time spent.
Overall, we expect the raw query time to be equivalent, and this can hopefully be visualized after release in this tracker in the isomer dashboard.
Important note:
With the whitelisting filtering logic done in the query rather than code, the unit tests are not representative of the behaviour any more. Great care should be spent on reviewing/testing the query, and ideally we should have synthetic tests that validate the behaviour as e2e tests (some happy path tests and some expected failure mode).
Tests
Note the traces for the query will be at this link.
isomer-postgres
with the new whitelist search query, check the log for the match (e.g.Email valid for OTP by whitelist
)[email protected]
-> should appear to succeed but not send an emailisomer-postgres
with the new whitelist search query, check the log for the error (e.g.Error occurred when attempting to login user <email>
)