DPLT-1049 Provision separate DB per user #144
Conversation
This reverts commit 858eea3.
```diff
@@ -7,12 +7,17 @@ provider:
   name: aws
   runtime: nodejs16.x
   region: eu-central-1
-  timeout: 15
+  timeout: 120
```
Setting a large timeout to ensure we don't time out; I'll reduce this once we have a clearer idea of how long provisioning takes.
```diff
 startFromBlock-runner:
   type: queue
   fifo: true
   maxRetries: 1
   # batchSize: 100
   worker:
     handler: handler.consumer
-    timeout: 15 # 1.5 minutes as lift multiplies this value by 6 (https://github.com/getlift/lift/blob/master/docs/queue.md#retry-delay)
+    timeout: 120 # 12 minutes as lift multiplies this value by 6 (https://github.com/getlift/lift/blob/master/docs/queue.md#retry-delay)
```
The visibility timeout of 12 minutes doesn't really apply anymore, since we aren't retrying.
Revert of #143, which therefore contains the changes from #132, #139, and #140. The original implementation (#132) had a few problems, which I'll address in this PR.
This PR updates the Lambda Runner provisioning step to create a separate Postgres database per user. This provides greater security, preventing users from manipulating other users' data via the SQL they provide.
Provisioning High-Level
This section outlines the high-level steps taken during provisioning, along with some notes on the changes made to them.
1. Create database/user in Postgres
It's possible to run SQL against Postgres via Hasura, but the provided statements are run within a transaction block. As `CREATE DATABASE` cannot be run within a transaction, we need a direct connection to Postgres to create new databases.

Cryptographically random passwords are generated for each PG user; these will be stored by Hasura in plain text within its metadata table. Exposing the password/connection string via an environment variable isn't really feasible, as this would require creating a secret, updating the Cloud Run revision, and restarting the instance. We can look at more secure options for this in the future.
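A minimal sketch of what this step could look like, assuming the `pg` client and Node's built-in `crypto` module; `createUserDb` and its parameters are illustrative names, not the actual implementation:

```javascript
const crypto = require('crypto');
const { Client } = require('pg');

// Illustrative helper: create a dedicated database and user for an account,
// with a cryptographically random password. Runs over a direct Postgres
// connection, since CREATE DATABASE cannot run inside a transaction block.
async function createUserDb(adminConnectionString, accountId) {
  const password = crypto.randomBytes(16).toString('hex');
  const client = new Client({ connectionString: adminConnectionString });
  await client.connect();
  try {
    // Identifiers can't be parameterised; accountId is assumed to be
    // validated/sanitised upstream.
    await client.query(`CREATE USER ${accountId} WITH PASSWORD '${password}'`);
    await client.query(`CREATE DATABASE ${accountId} OWNER ${accountId}`);
  } finally {
    await client.end();
  }
  return password; // later stored by Hasura in its metadata table
}
```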
2. Add database to Hasura
Database connections are added as 'data sources' to Hasura. To maintain the same query/mutation shape as we have currently, i.e. `{account_id}_{function_name}_{table_name}`, `root_field` and `type_name` prefixes will be configured on the data source. The requirement for these is discussed below; a sketch of the metadata calls for this and the next step follows step 3's description.

3. Untrack existing tables
As the new DB is a direct copy of the existing schema, 'tracking' the tables will result in conflicts within GraphQL. To prevent this, the existing tables must be 'untracked' in Hasura, removing them from the GraphQL schema.
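A hedged sketch of both metadata operations against Hasura's metadata API (`pg_add_source` with source customization, then `pg_untrack_table`). It assumes Node 18+'s global `fetch`; the endpoint, admin secret handling, source names, and prefixes are all placeholders rather than the actual implementation:

```javascript
// Illustrative helper for calling Hasura's metadata API.
async function callMetadataApi(body) {
  const res = await fetch(`${process.env.HASURA_ENDPOINT}/v1/metadata`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-hasura-admin-secret': process.env.HASURA_ADMIN_SECRET,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Metadata call failed: ${await res.text()}`);
  return res.json();
}

async function addSourceAndUntrack(accountId, connectionString, tables) {
  // Add the per-account database as a data source, prefixing root fields
  // and type names with the account ID (see step 4 for why).
  await callMetadataApi({
    type: 'pg_add_source',
    args: {
      name: accountId,
      configuration: { connection_info: { database_url: connectionString } },
      customization: {
        root_fields: { prefix: `${accountId}_` },
        type_names: { prefix: `${accountId}_` },
      },
    },
  });

  // Untrack each table in the old shared source, removing it from the
  // GraphQL schema so the new source's tables don't conflict.
  for (const { schema, name } of tables) {
    await callMetadataApi({
      type: 'pg_untrack_table',
      args: { source: 'default', table: { schema, name } },
    });
  }
}
```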
4. Run user-provided SQL
A user could theoretically create two different functions with the same tables, resulting in conflicts within Postgres. To avoid this, a new schema is created per function. Since this is scoped within the account's DB, we can name it after just the function, rather than a concatenation of both account and function as we have currently (a sketch follows the naming discussion below).
Hasura surfaces tables in GraphQL via `{schema_name}_{table_name}`. With the shared DB we have currently, the `schema_name` is set to `{account_id}_{function_name}`. As mentioned above, this is now only set to `{function_name}`, which could potentially lead to conflicts if another account forks the same indexer. To prevent this, the data source is configured to prefix all root fields and types with `{account_id}`. For example, a `posts` table created by `dataplatform_near`'s `social_feed` function would still be surfaced as `dataplatform_near_social_feed_posts`.
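A minimal sketch of this step; the helper name and the use of `search_path` are illustrative assumptions, not the actual implementation:

```javascript
// Illustrative: run the user's CREATE TABLE statements inside a schema
// named after the function, within the account's own database.
async function runUserSql(accountDbClient, functionName, userSql) {
  await accountDbClient.query(`CREATE SCHEMA IF NOT EXISTS ${functionName}`);
  // Scope the user's statements to the new schema.
  await accountDbClient.query(`SET search_path TO ${functionName}`);
  await accountDbClient.query(userSql);
}
```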
5. Track tables, foreign key relationships, and add permissions
These steps are mostly unchanged, but have been slightly refactored to take into account execution against different databases/sources.
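For completeness, a hedged sketch of what tracking and permissions could look like against the new source, reusing the hypothetical `callMetadataApi` helper from above; the role name and permission shape are assumptions:

```javascript
// Illustrative: track a table in the new per-account source and grant the
// account's Hasura role select access to it.
async function trackAndPermit(accountId, schema, name) {
  await callMetadataApi({
    type: 'pg_track_table',
    args: { source: accountId, table: { schema, name } },
  });
  await callMetadataApi({
    type: 'pg_create_select_permission',
    args: {
      source: accountId,
      table: { schema, name },
      role: accountId,
      permission: { columns: '*', filter: {} },
    },
  });
}
```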
Migration
All accounts/functions will be migrated to a new DB after Coordinator restart, for the following reason: on restart, the provisioned status is reset to `false` for all indexers, which therefore triggers the updated provisioning step.

The migration which happens within the Runner will not migrate the existing data; the user will need to re-index to re-populate their database.
A script has been created to migrate users' data if required. This will be done on an ad-hoc basis and requires some manual intervention. To migrate data, the following will need to be completed:
1. Run `./scripts/migrate-schema-to-db.js`, e.g. `node ./scripts/migrate-schema-to-db.js dataplatform_near social_feed` - this moves the schema to its own DB, along with its data, and configures it within Hasura

These steps are essentially the same as the automatic provisioning within `Provisioner`, with the addition of data migration. Automatic provisioning will be skipped within the Runner as the DB/schema already exist.

Other Considerations
By default, all users are able to connect to the `template` databases, which are used as 'templates' for new databases. To prevent users from modifying these templates, we should remove `PUBLIC` access from them:
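The snippet that originally followed here was cut off; a plausible completion using the standard Postgres privilege statements, sketched via the `pg` client (not necessarily the exact statements from the PR):

```javascript
const { Client } = require('pg');

// Run once as a superuser: revoke the default PUBLIC connect privilege on
// the template databases so users can't modify them. template0 disallows
// connections by default, so template1 is the main concern.
async function lockDownTemplates(adminConnectionString) {
  const client = new Client({ connectionString: adminConnectionString });
  await client.connect();
  try {
    await client.query('REVOKE CONNECT ON DATABASE template1 FROM PUBLIC');
    await client.query('REVOKE CONNECT ON DATABASE template0 FROM PUBLIC');
  } finally {
    await client.end();
  }
}
```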