
Define and architect SQL schema and queries for querying relations #300

Closed

aeneasr opened this issue Nov 10, 2020 · 19 comments
Labels: blocking Blocks milestones or other issues or pulls. corp/m5 Up for M5 at Ory Corp. feat New feature or request.

@aeneasr
Member

aeneasr commented Nov 10, 2020

Is your feature request related to a problem? Please describe.

We need to define all SQL queries required for fetching / querying relations.

Describe the solution you'd like

Discuss SQL queries in this issue.

@aeneasr aeneasr added feat New feature or request. blocking Blocks milestones or other issues or pulls. corp/m4 Up for M4 at Ory Corp. labels Nov 10, 2020
@aeneasr aeneasr added this to the Next Gen Keto milestone Nov 10, 2020
@zepatrik
Member

zepatrik commented Nov 10, 2020

From the paper:

Relation Tuple Storage

We store relation tuples of each namespace in a separate database, where each row is identified by primary key (shard ID, object ID, relation, user, commit timestamp). Multiple tuple versions are stored on different rows, so that we can evaluate checks and reads at any timestamp within the garbage collection window. The ordering of primary keys allows us to look up all relation tuples for a given object ID or (object ID, relation) pair.

Our clients configure sharding of a namespace according to its data pattern. Usually the shard ID is determined solely by the object ID. In some cases, for example, when a namespace stores groups with very large numbers of members, the shard ID is computed from both object ID and user.

@zepatrik
Member

Approach using tables per namespace

Dynamic table migration:

CREATE TABLE namespace_name_relation_tuples (
    shard_id    varchar(64),
    object_id   varchar(64),
    relation    varchar(64),
    subject     varchar(128), /* can be object_id#relation or user_id */
    commit_time timestamp,
    PRIMARY KEY (shard_id, object_id, relation, subject, commit_time)
);

CREATE INDEX object_id_idx ON namespace_name_relation_tuples (object_id);

CREATE INDEX user_set_idx ON namespace_name_relation_tuples (object_id, relation);

Namespace config table:

CREATE TABLE keto_namespaces (
    name varchar(64) PRIMARY KEY
    /* rest of the config */
);

This means we have to do requests similar to:

c.RawQuery(
	fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		sqlSafeTableFromNamespace(r.Namespace),
		strings.Join(rows, ", "),
		placeholders),
	values...).Exec()
c.RawQuery(
	fmt.Sprintf("SELECT * FROM %s LIMIT %d OFFSET %d",
		sqlSafeTableFromNamespace(query.Namespace),
		pagination.PerPage,
		pagination.Page*pagination.PerPage,
	),
	args...).Exec()

@aeneasr
Member Author

aeneasr commented Nov 11, 2020

Avoiding SQL injection would be possible by restricting namespace names to a pattern such as [a-z_]{6,} (the upper length limit depends on the database's table name limitations).
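
A minimal sketch of what such a whitelist check could look like, assuming a hypothetical validateNamespaceName helper that is called before a namespace name is ever interpolated into a table name (the upper bound of 48 characters is an illustrative assumption):

package persistence

import (
    "fmt"
    "regexp"
)

// Lowercase letters and underscores only, at least 6 characters as suggested
// above; the cap of 48 keeps "<namespace>_relation_tuples" below common
// table name length limits (illustrative value).
var namespacePattern = regexp.MustCompile(`^[a-z_]{6,48}$`)

// validateNamespaceName rejects anything that is not safe to interpolate
// into a table name, ruling out SQL injection via namespace names.
func validateNamespaceName(name string) error {
    if !namespacePattern.MatchString(name) {
        return fmt.Errorf("namespace name %q must match [a-z_]{6,48}", name)
    }
    return nil
}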

Generally I think we could configure namespaces (and thus clients?) using the config file for now. This would remove a lot of pressure to implement migrations and similar things, and would still allow us to implement this later on without breaking changes (we could read config from both disk and DB for a while).

We could write a custom pop migrator that is able to run migration files multiple times if a specific pattern matches (e.g. relationtuple in the file name).

A big question mark from my side is whether we really need the composite primary key. Another question is whether we need the shard_id and, if we do, what it actually does.

@aeneasr
Member Author

aeneasr commented Nov 11, 2020

Ah, I believe the shard_id is a UUID, which addresses a common concern for distributed key/value databases - here taken from the CRDB docs:

Define primary key constraints over multiple columns (i.e., use composite primary keys).

When defining composite primary keys, make sure the data in the first column of the primary key prefix is well-distributed across the nodes in the cluster. To improve the performance of ordered queries, you can add monotonically increasing primary key columns after the first column of the primary key prefix. For an example, see Use multi-column primary keys on the SQL Performance Best Practices page.

@zepatrik
Member

The paper says in the paragraph I posted above:

Our clients configure sharding of a namespace according to its data pattern. Usually the shard ID is determined solely by the object ID. In some cases, for example, when a namespace stores groups with very large numbers of members, the shard ID is computed from both object ID and user.

@aeneasr
Member Author

aeneasr commented Nov 11, 2020

Regarding primary key performance, this is an interesting read, illustrating the problem explained in #300 (comment)

@robinbraemer
Contributor

Right, we need the shard id as the first element in the primary key to spread data and reduce hotspots. This shard id should not (!) be a large UUID, but should be configured by the tuple's namespace config depending on how large the number of users in a UserSet is expected to be (as stated in the paper). :)

Check out this post for more details: https://www.cockroachlabs.com/blog/how-to-choose-db-index-keys/ - we will opt for the "Hybrid Approach" described there.

If you need more write throughput than timestamp IDs offer but more clustering than random UUIDs, you can use sharded keys to spread the load out across the cluster and reduce hotspots.
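
To make that concrete, here is a rough sketch of how the shard ID could be derived per namespace (the helper name, the FNV hash and the shard count are illustrative assumptions; the paper only requires that the shard ID be computed from the object ID, or from object ID and user for very large user sets):

package persistence

import (
    "fmt"
    "hash/fnv"
)

// shardID hashes the given parts into one of shardCount buckets so that the
// first primary key column is well distributed across nodes without being a
// fully random UUID.
func shardID(shardCount uint32, parts ...string) string {
    h := fnv.New32a()
    for _, p := range parts {
        _, _ = h.Write([]byte(p))
    }
    return fmt.Sprintf("%d", h.Sum32()%shardCount)
}

// A "normal" namespace would use shardID(16, objectID), while a namespace
// with very large user sets would use shardID(16, objectID, subject).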

@robinbraemer
Contributor

robinbraemer commented Nov 11, 2020

About storing the namespace config: the ACL system becomes a tier 1 public and internal service, with other services depending on it and building on top of it to check authorization of their API requests, and those services likely face users. Very low latency must therefore be a high priority, and the Zookie concept should be looked at very early, as it plays a key role in how data gets stored and retrieved.

Relation tuples, as well as namespace configs, get written with a commit timestamp. All namespace configs should always be loaded into a cache on all ACL servers, and updates should be exchanged between the servers.

The paper (section 4.1) classifies read calls to the API into two categories: Safe and Recent, where Recent is a read of up-to-date data only, which very likely involves one or more cross-region/zone hops and possibly a submarine network hop, depending on the multi-region CockroachDB deployment.

Clients should always try to go for requests of class Safe for low latency.

In order to maintain low latency, one would replicate the data to nodes near the user and use follower reads.

This means API clients specify a Zookie in the request that indicates the maximum staleness the client is willing to accept.

Internally, the ACL server uses the stored relation tuples and namespace config committed at least as recently as the Zookie timestamp, and for the expand API it passes this Zookie down to other ACL servers. This is combined with a groupcache to reduce DB lookups, increase throughput, and coordinate central cache loading and distribution to prevent hotspots when one tuple is very popular.

Please check out https://www.cockroachlabs.com/docs/stable/follower-reads.html

Understanding the Zookie concept and follower reads, I do want to point out that CockroachDB now has a follower_read_timestamp() function to be used with AS OF SYSTEM TIME, which computes the minimum timestamp required for follower reads.

Having this function, we could consider replacing the Zookie client request parameter with an optional follower_read bool that takes care of low latency with a best-effort consistent response.
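
As a rough sketch of how such a follower read could look with the per-namespace tables and gobuffalo/pop usage shown above (the function, the model type and the opt-in flag are assumptions, not an existing API):

package persistence

import (
    "fmt"

    "github.com/gobuffalo/pop/v5"
)

// relationTuple is a hypothetical model matching the per-namespace tables above.
type relationTuple struct {
    ShardID  string `db:"shard_id"`
    ObjectID string `db:"object_id"`
    Relation string `db:"relation"`
    Subject  string `db:"subject"`
}

// readTuplesFollowerRead serves the read from a nearby follower replica at a
// slightly stale but consistent timestamp. It would only be used when the
// backend is CockroachDB and the client opted in (e.g. via a follower_read
// flag). The table name must already have passed the namespace whitelist check.
func readTuplesFollowerRead(c *pop.Connection, table, objectID string) ([]relationTuple, error) {
    var tuples []relationTuple
    err := c.RawQuery(
        fmt.Sprintf(
            "SELECT shard_id, object_id, relation, subject FROM %s AS OF SYSTEM TIME follower_read_timestamp() WHERE object_id = ?",
            table,
        ),
        objectID,
    ).All(&tuples)
    return tuples, err
}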

@zepatrik
Member

I think it is also important to consider that Zanzibar was designed for "Google scale", while not many users of Keto will run at the same scale. Some ideas in the paper exist purely because of this "Google scale" goal, and we might be able to solve them more simply. This of course depends on the goal of the project.
We could even decide to go with "100.000 user" scale first and add some features like APIs to create/configure namespaces later on. In a startup-scale deployment you would configure namespaces every month at most. I think it would be fine to have them in files/websocket/... and hot reload them in every instance. This would reduce the consistency requirements, as we only have to deal with commit times for the relation tuples.
Another question is whether we want to focus on CRDB only and use their proprietary features. Many Keto users will not run it globally distributed anyway (but Ory will, so it has to be possible!).

@robinbraemer
Contributor

robinbraemer commented Nov 12, 2020

We could even decide to go with "100.000 user" scale first and add some features like APIs to create/configure namespaces later on.

Sounds good! We can gradually add features (e.g. a namespace config API later) to keep the ramp-up straightforward, while designing the system upfront in a way that allows us to scale further later (e.g. > 1M users).

Another question is whether we want to focus on CRDB only and use their proprietary features.

Yes, as you have said: I wouldn't want to restrict Keto users to CRDB, and certainly not to its enterprise features.

I think we should add a setting at Keto startup to toggle whether the DB backend is CRDB, and another setting for whether CRDB is running in enterprise mode, so we can optionally leverage CRDB-specific features and squeeze the most out of it for bigger deployments.

@zepatrik
Member

Right. I don't know if you have familiarized yourself with our Go project structure yet, but we have a configuration provider, registry and persister that are all defined via interfaces. It would easily be possible to have a CRDB-specific persister that gets loaded according to the config.

I fully agree that we should keep Google scale in mind when designing everything so it is prepared upfront.

@zepatrik
Member

OK, I played around with a more normalized form of the database schema that also avoids dynamic table names:

CREATE TABLE namespace
(
    name VARCHAR(64) PRIMARY KEY
);

CREATE TABLE relation
(
    name      VARCHAR(64),
    namespace VARCHAR(64) REFERENCES namespace (name),

    PRIMARY KEY (name, namespace)
);

CREATE TABLE subject
(
    raw               VARCHAR(128),
    namespace         VARCHAR(64),
    user_id           VARCHAR(64),
    userset_object_id VARCHAR(64),
    userset_relation  VARCHAR(64),

    FOREIGN KEY (userset_relation, namespace) REFERENCES relation(name, namespace),
    PRIMARY KEY (raw, namespace)
);

CREATE TABLE relation_tuple
(
    object_id   VARCHAR(64),
    relation    VARCHAR(64),
    subject     VARCHAR(128),
    namespace   VARCHAR(64) REFERENCES namespace (name),
    commit_time TIMESTAMP,

    FOREIGN KEY (relation, namespace) REFERENCES relation(name, namespace),
    FOREIGN KEY (subject, namespace) REFERENCES subject(raw, namespace),
    PRIMARY KEY (object_id, relation, subject, namespace, commit_time)
);

Seeing this, and also with the knowledge that all requests will only ever regard data in one namespace, I think that it is not the worst idea to have dynamic names.
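
For comparison, a read against this normalized layout would look roughly like the sketch below; every statement has to carry the namespace predicate because all namespaces share one relation_tuple table (function and field names are illustrative):

package persistence

import "github.com/gobuffalo/pop/v5"

// namespacedTuple mirrors the relation_tuple table above (illustrative).
type namespacedTuple struct {
    ObjectID string `db:"object_id"`
    Relation string `db:"relation"`
    Subject  string `db:"subject"`
}

// getTuples pages through the tuples of one object in one namespace; note the
// mandatory namespace filter that the table-per-namespace approach avoids.
func getTuples(c *pop.Connection, namespace, objectID string, perPage, page int) ([]namespacedTuple, error) {
    var tuples []namespacedTuple
    err := c.RawQuery(
        `SELECT object_id, relation, subject
           FROM relation_tuple
          WHERE namespace = ? AND object_id = ?
          ORDER BY commit_time DESC
          LIMIT ? OFFSET ?`,
        namespace, objectID, perPage, page*perPage,
    ).All(&tuples)
    return tuples, err
}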

@robinbraemer
Contributor

robinbraemer commented Nov 12, 2020

#300 (comment)
Generally I think we could configure namespaces (and thus clients?) using the config file for now. This would remove a lot of pressure to implement migrations and similar things, and would still allow us to implement this later on without breaking changes (we could read config from both disk and DB for a while).

Conflicts, inconsistencies and precedence of file vs db based namespace configs

ref: #303

I have to raise some concerns if we opt for file-based namespace configs only at the beginning:

What happens when there are inconsistent namespace configs across Keto instances in the cluster?
This will definitely happen if we do not have a central store for the namespace configs!

Rolling updates (manual by the user or automated by k8s) may take a long time, which causes Keto instances to have inconsistent namespace configs.
- This may be fatal when looking at the expand API, which spreads checks of deep tuples across multiple Keto instances.
- Every Keto instance in the cluster should have the same namespace configs.

Additionally, if we start with file-based namespace configs and later move over to db-based namespace configs:

  • How will Keto know which namespace config should be used when the same namespace is found both in a file and in the db?
  • Will Keto override the db using the file? Either way, the inconsistency problem remains.

I'm not quite sure that going for file-based namespace configs would actually remove pressure or make things simpler for the user and the system in such a distributed setup overall.

I think directly going for the db-stored namespace config is better, while delivering a convenient CLI tool to manage namespace configs and requiring a schema version on every config.

Versioned schema for namespace config

I suggest that, right from the beginning, every namespace config must specify a schema version (e.g. version: v1alpha1) so that we can make compatible changes in the future.

Storing namespace config

When storing the namespace config in the database, we can use the flexible JSONB datatype:

CREATE TABLE namespace
(
    name VARCHAR(64),
    config JSONB,
    commit_time TIMESTAMP,
    PRIMARY KEY(name, commit_time)
);
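
A sketch of how the newest committed config could then be read back, given that (name, commit_time) is the primary key (struct and function names are illustrative):

package persistence

import "github.com/gobuffalo/pop/v5"

// namespaceRow mirrors the namespace table above; Config holds the raw JSONB.
type namespaceRow struct {
    Name   string `db:"name"`
    Config string `db:"config"`
}

// latestNamespaceConfig returns the most recently committed config version
// for a namespace.
func latestNamespaceConfig(c *pop.Connection, name string) (*namespaceRow, error) {
    var row namespaceRow
    err := c.RawQuery(
        `SELECT name, config
           FROM namespace
          WHERE name = ?
          ORDER BY commit_time DESC
          LIMIT 1`,
        name,
    ).First(&row)
    return &row, err
}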

Admin API & Client tool

I would like to introduce an admin API where we have RPCs for creating/deleting/writing/reading namespaces and more in the future. This will separate normal ACL clients from critical admin tasks.

I would also like to add a Keto command line tool (as a keto acl ... subcommand) to manage namespace configs, with which admins can apply their namespace config (in kubectl apply -f ns.yaml merge/patch style).

@aeneasr
Member Author

aeneasr commented Nov 13, 2020

I have to raise some concerns if we opt for file-based namespace configs only at the beginning:

Distributed databases are also eventually consistent, which means that the problems apply there as well. @zepatrik told me that the config is also timestamped, which would allow us to handle these situations. One thing you mentioned, however, we cannot prevent:

What happens when there are inconsistent namespace configs across Keto instances in the cluster?
This will definitely happen if we do not have a central store for the namespace configs!

In general, we have one worker/task/instance/job responsible for changing the SQL schema (e.g. adding new tables), so that is the source of truth - let's call it the "schema node". If a "read/write" node (regular ORY Keto operation) is misconfigured, it would simply die as a node, or return errors or outdated timestamps for AC requests/questions. We can do this by using checksums and timestamps.

GitOps is becoming more and more popular and I think it will be the future of globally scaled systems. Config via databases is just too difficult to test and to review. GitOps makes this really easy, as e.g. ArgoCD is responsible for distributing the "source of truth" configuration.

Versioned schema for namespace config

Absolutely! We are still trying to wrap our heads around the best way to run migrations for these tables. There are some ideas but nothing truly convincing yet.

Admin API & Client tool

Totally! There should also be a difference between "write relations" and "check if something is allowed" as the former is something very privileged while the latter is something many should be able to do.

@robinbraemer
Contributor

robinbraemer commented Nov 13, 2020

In general, we have one worker/task/instance/job responsible for changing the SQL schema (e.g. adding new tables) so that is the source of truth - let's call it the "schema node".

I guess we then have something like leader election to elect a schema node in the cluster (similar to k8s controller election)?
Then all nodes will only ever send and take namespace config updates to/from the leader?

GitOps is becoming more and more popular and I think it will be the future of globally scaled systems. Config via databases is just too difficult to test and to review. GitOps makes this really easy

Yes, this was also something I was concerned about with the "namespace config stored in db" approach. Better support for namespace changes through GitOps, with namespace configs as files, is one major pro of the files approach over the db!

@aeneasr
Member Author

aeneasr commented Nov 17, 2020

I guess we then have something like leader election to elect a schema node in the cluster (similar to k8s controller election)?
Then all nodes will only ever send and take namespace config updates to/from the leader?

No, you just specify an additional k8s job with different CLI flags. No need for consensus!

@robinbraemer
Contributor

@zepatrik do you have an SQL schema yet? :)

@zepatrik
Member

There is a draft on the sql-persister branch, but as namespaces in particular were a big blocker, I finished those first and will update the sql branch today.

@zepatrik zepatrik added the corp/m5 Up for M5 at Ory Corp. label Jan 13, 2021
@zepatrik zepatrik removed the corp/m4 Up for M4 at Ory Corp. label Jan 13, 2021
@zepatrik
Member

Closing as this is done. Improvements (if required) will be added with #301
