Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Improve Sqlite namespace loader implementation #6488

Merged
merged 8 commits into from
Apr 3, 2023
Merged

Conversation

ergo14
Copy link
Contributor

@ergo14 ergo14 commented Mar 28, 2023

General

Before this PR:
Namespaces are loaded with a select distinct query
After this PR:
Namespaces are loaded sequentially, each query find the next smallest namespace. This forces sqlite to use the more efficient search plan (as opposed to scan). This is fine because namespace cardinality should be low.
==COMMIT_MSG==
==COMMIT_MSG==

Priority:
P2
Concerns / possible downsides (what feedback would you like?):
Pending testing
Is it okay to replace the implementation as opposed to adding a new endpoint?
Is documentation needed?:
No

Compatibility

Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?:
No
Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?:
No
The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.):
Yes
Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?:
Relies on namespace cardinality to be low
Does this PR need a schema migration?
No

Testing and Correctness

What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?:
N/A
What was existing testing like? What have you done to improve it?:
Swapped one implementation for another, existing tests should suffice
If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.:
N/A
If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?:
N/A

Execution

How would I tell this PR works in production? (Metrics, logs, etc.):
When the request is made on a busy nodes with lots of rows, TimeLock does not get sad
Has the safety of all log arguments been decided correctly?:
N/A
Will this change significantly affect our spending on metrics or logs?:
No
How would I tell that this PR does not work in production? (monitors, etc.):
When the request is made on a busy nodes with lots of rows, TimeLock gets sad
If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?:
Recall and rollback
If the above plan is more complex than “recall and rollback”, please tag the support PoC here (if it is the end of the week, tag both the current and next PoC):
@jeremyk-91

Scale

Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.:
No more than the existing
Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?:
O(200) database calls instead of one, but the endpoint is called rarely
Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?:
N/A

Development Process

Where should we start reviewing?:

If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?:

Please tag any other people who should be aware of this PR:
@jeremyk-91
@sverma30

@changelog-app
Copy link

changelog-app bot commented Mar 28, 2023

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Improve Sqlite namespace loader implementation

Check the box to generate changelog(s)

  • Generate changelog entry

Comment on lines 141 to 142
@SqlQuery("SELECT MIN(namespace) FROM paxosLog")
Optional<String> getSmallestNamespace();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two queries, resisted the temptation of removing this and using "" for the first query

I think this is fine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually prefer using just one query as that's consistent with other patterns we have in Atlas - the thing you describe is pretty standard!

Comment on lines 55 to 61
private Optional<String> getSmallestNamespace() {
return jdbi.withExtension(SqlitePaxosStateLog.Queries.class, SqlitePaxosStateLog.Queries::getSmallestNamespace);
}

private Optional<String> getNextSmallestNamespace(String lastReadNamespace) {
return jdbi.withExtension(
SqlitePaxosStateLog.Queries.class, dao -> dao.getNextSmallestNamespace(lastReadNamespace));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could reduce this to one call but did not want to use an Optional as a method parameter

Comment on lines 43 to 52
Set<Client> clients = new HashSet<>();

Optional<String> currentNamespace = getSmallestNamespace();
while (currentNamespace.isPresent()) {
String namespaceString = currentNamespace.get();
clients.add(Client.of(namespaceString));
currentNamespace = getNextSmallestNamespace(namespaceString);
}

return clients;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could have used a custom inner class that implements Supplier while keeping some state

Then, the code here would probably be simpler stg like Stream.generate(...).takeWhile(..).collect(..), stg to that effect

But I think this is more human-readable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if Stream.generate(...) is simpler than what you have! Agree with your approach here

Comment on lines 144 to 145
@SqlQuery("SELECT MIN(namespace) FROM paxosLog WHERE namespace > :namespace")
Optional<String> getNextSmallestNamespace(@Bind("namespace") String lastReadNamespace);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be contentious, but we should test that the query plan has the word search and not scan

(or even match it directly)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: getNext_Lexicographically_SmallestNamespace(...)

Also, making assertions on the query plan is probably a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So getting the query plan in code is a little involved. Just adding a comment instead of a test if that's good enough?

Copy link
Contributor

@jeremyk-91 jeremyk-91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks good. I actually prefer just having one query, and the query plan based tests would be a good idea

Comment on lines 144 to 145
@SqlQuery("SELECT MIN(namespace) FROM paxosLog WHERE namespace > :namespace")
Optional<String> getNextSmallestNamespace(@Bind("namespace") String lastReadNamespace);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: getNext_Lexicographically_SmallestNamespace(...)

Also, making assertions on the query plan is probably a good idea.

Comment on lines 141 to 142
@SqlQuery("SELECT MIN(namespace) FROM paxosLog")
Optional<String> getSmallestNamespace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually prefer using just one query as that's consistent with other patterns we have in Atlas - the thing you describe is pretty standard!

Comment on lines 43 to 52
Set<Client> clients = new HashSet<>();

Optional<String> currentNamespace = getSmallestNamespace();
while (currentNamespace.isPresent()) {
String namespaceString = currentNamespace.get();
clients.add(Client.of(namespaceString));
currentNamespace = getNextSmallestNamespace(namespaceString);
}

return clients;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if Stream.generate(...) is simpler than what you have! Agree with your approach here

Copy link
Contributor

@jeremyk-91 jeremyk-91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good. Thanks!

@bulldozer-bot bulldozer-bot bot merged commit 1ef52ee into develop Apr 3, 2023
@bulldozer-bot bulldozer-bot bot deleted the aagan branch April 3, 2023 14:16
@svc-autorelease
Copy link
Collaborator

Released 0.828.0

ergo14 added a commit that referenced this pull request Apr 5, 2023
ergo14 added a commit that referenced this pull request Apr 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants