Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add mapping table for encoding subject into UUIDs #809

Merged
merged 8 commits into from
Feb 17, 2022

Conversation

hperl
Copy link
Collaborator

@hperl hperl commented Jan 9, 2022

This PR contains first work towards supporting automatic subject/object encodings into UUIDs, as discussed in #792.

Related issue(s)

#792

Checklist

  • I have read the contributing guidelines.
  • I have referenced an issue containing the design document if my change
    introduces a new feature.
  • I am following the
    contributing code guidelines.
  • I have read the security policy.
  • I confirm that this pull request does not address a security
    vulnerability. If this pull request addresses a security. vulnerability, I
    confirm that I got green light (please contact
    [email protected]) from the maintainers to push
    the changes.
  • I have added tests that prove my fix is effective or that my feature
    works.
  • I have added or changed the documentation.

Future work

In order to complete the work for #792, the following items still need to be adressed (in this or future PRs):

@hperl hperl requested a review from zepatrik as a code owner January 9, 2022 10:49
@hperl hperl changed the title Add mapping table for encoding subject into UUIDs feat: Add mapping table for encoding subject into UUIDs Jan 9, 2022
@hperl
Copy link
Collaborator Author

hperl commented Jan 9, 2022

@zepatrik the Cockroach migration test failed with std_err: Could not apply migrations: ERROR: relation "keto_uuid_mappings" already exists (SQLSTATE 42P07), although the table is only created once in the migration scripts. Did I misunderstand how you do migrations?

https://app.circleci.com/pipelines/github/ory/keto/1737/workflows/da82ec86-7471-4ff4-ab64-77d32d1a91d3/jobs/11054?invite=true#step-110-785

Copy link
Member

@zepatrik zepatrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a good starting point but I think some requirements were not clearly defined in the issue. Now that I see the first implementation iteration, I clarified some parts 😉

internal/driver/config/config.schema.json Outdated Show resolved Hide resolved
internal/persistence/sql/uuid_mapping_test.go Show resolved Hide resolved
internal/persistence/sql/uuid_mapping.go Outdated Show resolved Hide resolved
@@ -0,0 +1,10 @@
CREATE TABLE keto_uuid_mappings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use https://github.com/gobuffalo/fizz as a DDL tool to create migrations. See other example create table definitions: https://github.com/ory/kratos/blob/c455fbf2f4792cb3fd2f846a5e699e837d8c98a4/persistence/sql/migrations/templates/20191100000001_identities.up.fizz
You can then run make migrations-render-replace to render the fizz template to SQL files.

Further, we will need to migrate the existing relation tuple table as well. In case the ID in there already is a UUID (maybe using a regex to find those?) nothing has to be done. In case it is not, we will have to add an entry to this mapping table. That migration process needs good tests to ensure everything works. Migration tests are here: https://github.com/ory/keto/blob/fabf1a06d90139f6678fc7bc1703ee92a5fc9d69/internal/persistence/sql/migrations/migratest/migration_test.go

internal/namespace/definitions.go Outdated Show resolved Hide resolved
internal/persistence/sql/uuid_mapping.go Outdated Show resolved Hide resolved
@hperl hperl force-pushed the auto-subject-object-encoding branch from 8966431 to 17c8489 Compare January 10, 2022 21:07
@hperl
Copy link
Collaborator Author

hperl commented Jan 10, 2022

Thanks for the review. I addressed most of the points (I think), but I am still not sure where the best place is to actually do the migration as well as add the business logic that inserts a UUID in the mapping table if a non-UUID subject ID should be inserted.

@hperl hperl force-pushed the auto-subject-object-encoding branch 2 times, most recently from c3ad030 to f519a5c Compare January 12, 2022 20:05
@aeneasr
Copy link
Member

aeneasr commented Jan 14, 2022

@zepatrik friendly ping :)

Copy link
Member

@zepatrik zepatrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hperl thanks for addressing the feedback 👍 and sorry for the late reply 🙈

I have moved the mapping manager test to the right location, I think it is easier to show by commit than to explain 😉
This is now pretty much what I expected as the first draft.
For the migration, we would ideally implement that in SQL. If it gets too complicated there (especially for cross-DB compatibility), we can also do it in go. There is already one migration process, the SingleTableMigrator that runs in go. You can give it a try in SQL if you want 😉

In all the handlers/... we would then change the subject and object fields to be one of subject_uuid or subject_string. In the latter case, we would apply the mapping. But I would do that in a separate follow up PR if you don't want to invest all that work.

Oh and btw, for that we will also need a string to UUID function, and therefore also an (unique) index on the string column. It would be great if you can add that in this PR.

@hperl
Copy link
Collaborator Author

hperl commented Jan 17, 2022

@zepatrik, thanks for moving stuff to the correct place. It feels a little strange to have the test function defined outside of a _test file, though.

As for the SQL migration, that would still go into 20220110200400_create-uuid-mapping-table.up.sql, correct?

(btw, I quickly added the index in the migration)

hperl and others added 2 commits January 17, 2022 20:36
A new table was added that stores a mapping between an arbitraty string
and a UUIDv5 of that string, namespaced by a namespace UUID that the
user can set in the namespace configuration.
@hperl hperl force-pushed the auto-subject-object-encoding branch from e915f87 to bd9b2e0 Compare January 17, 2022 19:37
@zepatrik
Copy link
Member

It feels a little strange to have the test function defined outside of a _test file, though.

We use this pattern for the persister tests only. Main reason being that we can avoid circular dependencies with that. Also, we don't usually spin up all databases for all tests, but only have this one persister test that uses ALL supported databases.

And yes, you can try to add the migration in that file 😉

@hperl
Copy link
Collaborator Author

hperl commented Jan 19, 2022

Having thought about writing the migration in SQL, I think Go is the better choice. The migration would need to parse a UUID that conforms to the Go implementation (https://pkg.go.dev/github.com/gofrs/uuid?utm_source=gopls#UUID.UnmarshalText) as well as generate UUIDs (and at least MySQL generates v1 UUIDs only, which are not random, see https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functions.html#function_uuid).

With both generation and parsing being hard to do consistently across SQL dialects, I'd go with Go. @zepatrik, WDYT?

@hperl hperl force-pushed the auto-subject-object-encoding branch 5 times, most recently from 6d3254d to 78d153e Compare January 20, 2022 19:26
@rdurell
Copy link

rdurell commented Jan 20, 2022

If mapping subject to UUID, why not increase the supported subject length as well? We're looking at using somewhat complex subject strings which would benefit by being larger then 64 characters.

@hperl
Copy link
Collaborator Author

hperl commented Jan 20, 2022

@rdurell sure, I can increase it to 1024. Do you have more information on typical subject ID sizes?

@hperl hperl force-pushed the auto-subject-object-encoding branch from c66f85b to 2af3d91 Compare January 20, 2022 20:30
@hperl hperl force-pushed the auto-subject-object-encoding branch from 2af3d91 to fb7af04 Compare January 21, 2022 08:12
@zepatrik
Copy link
Member

With both generation and parsing being hard to do consistently across SQL dialects, I'd go with Go.

Sounds reasonable 👍

why not increase the supported subject length as well?

Definitely, maybe we can even make the strings unbound? I know that at least postgres and cockroach don't really care. If you happen to store big strings, you will face the performance implications of that. But we should not force you to do that.

Copy link
Member

@zepatrik zepatrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, just some remarks on the migrator.

internal/persistence/sql/migrations/uuid_mapping.go Outdated Show resolved Hide resolved
internal/persistence/sql/migrations/uuid_mapping.go Outdated Show resolved Hide resolved
SubjectSetRelation: dbsql.NullString{String: "rel", Valid: true},
CommitTime: time.Now(),
},
expectMapping: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expectMapping: false,
expectMapping: true,

@hperl
Copy link
Collaborator Author

hperl commented Jan 23, 2022

Definitely, maybe we can even make the strings unbound? I know that at least postgres and cockroach don't really care. If you happen to store big strings, you will face the performance implications of that. But we should not force you to do that.

We can either use a VARCHAR(x) with a large x, or even TEXT, but in both cases MySQL will limit us, because indices are limited in size (and we put an index on both the UUID and the string representation). So even for TEXT the index would then need to take a prefix length, see https://dev.mysql.com/doc/refman/8.0/en/create-index.html.

For storing arbitrary large strings, I would not want to put an index on them or even lookup by the string representation then, but either:

  • add another string column that holds a hash of the string representation, put an index on that and hash the strings before lookup
  • Generate the UUIDs with UUIDv5 directly from the string representation and a constant or empty namespace (<-- preferred)

Maybe we can quickly go over the last open points of the design in-band, I think we could quickly resolve them.

@hperl hperl force-pushed the auto-subject-object-encoding branch from afb5c7d to 7438c28 Compare January 23, 2022 12:15
@hperl hperl requested a review from zepatrik January 23, 2022 12:32
@hperl hperl force-pushed the auto-subject-object-encoding branch 2 times, most recently from 4dcdfda to 7438c28 Compare February 3, 2022 16:34
@hperl hperl requested a review from zepatrik February 8, 2022 09:18
Copy link
Member

@zepatrik zepatrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really sorry for the long review time. This looks very good now.
IMO we could merge it as is and open new PRs to finish the feature? This would block us however on creating releases...
Alternatively, we can create a feature branch that we then open more PRs against and in the end rebase master on that.

assert.Equal(t, tc.rt, newRt)
}
})
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to add a test case here that ensures the pagination works. There is a wrapper for the relation tuple manager where you can overwrite the default pagination:

type ManagerWrapper struct {
Reg ManagerProvider
PageOpts []x.PaginationOptionSetter
RequestedPages []string
}
var (
_ Manager = (*ManagerWrapper)(nil)
_ ManagerProvider = (*ManagerWrapper)(nil)
)
func NewManagerWrapper(_ *testing.T, reg ManagerProvider, options ...x.PaginationOptionSetter) *ManagerWrapper {
return &ManagerWrapper{
Reg: reg,
PageOpts: options,
}
}

You can initialize and use it in the test similar to
reg := driver.NewSqliteTestRegistry(t, false)
require.NoError(t, reg.Config().Set(config.KeyNamespaces, namespaces))
mr := relationtuple.NewManagerWrapper(t, reg, pageOpts...)
return &deps{
ManagerWrapper: mr,
configProvider: reg,
loggerProvider: reg,
}

t.Run("case=paginates", func(t *testing.T) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I added a test for pagination. I did not use the ManagerWrapper, because I was not retrieving the tuples via the manager anyways, but I used that code as inspiration nonetheless 😉.

string_representation TEXT NOT NULL CHECK (string_representation <> ''),

PRIMARY KEY (id)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that we of course have to change the column types in the relation tuple table to UUID as well. I am afraid that we have to create a new table and insert into that instead of the old one. Or do you have other ideas?
The migrator we use can theoretically be extended to add go migrations btw, I think that would definitely make sense. Then we also don't need the command 🤔
What do you think about that @aeneasr?

https://github.com/ory/x/blob/fdc336251a395d29c7a242b611dc3fe3740620f8/popx/migration_box.go

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a new table might not be necessary. Assuming that object and subject_id already hold UUIDs, we can, in Postgres, do:

ALTER TABLE keto_relation_tuples ALTER object TYPE UUID using object::uuid;

(and similar for subject_id). MySQL does not have a UUID type, so we will stick with VARCHAR(64). The same statement works in Cockroachdb when using SET enable_experimental_alter_column_type_general = true.

@hperl
Copy link
Collaborator Author

hperl commented Feb 16, 2022

I am really sorry for the long review time. This looks very good now. IMO we could merge it as is and open new PRs to finish the feature? This would block us however on creating releases... Alternatively, we can create a feature branch that we then open more PRs against and in the end rebase master on that.

I'm not a big fan of long-living feature branches, but I don't know the release process well enough. If we could prevent the migration from running, nothing user-visible changed. Is that an option?

@zepatrik
Copy link
Member

I'm not a big fan of long-living feature branches, but I don't know the release process well enough.

Agree, but we can only release master as is easily, so this would block us. You are right that currently it does not affect the previous code, but it will once we start integrating the mapping into handlers, ...
Just keeping this off of the main branch will make it a bit easier.

@zepatrik zepatrik changed the base branch from master to feat/uuid-mapping February 17, 2022 09:26
@zepatrik zepatrik merged commit 79cfd6e into ory:feat/uuid-mapping Feb 17, 2022
@hperl hperl deleted the auto-subject-object-encoding branch February 17, 2022 13:31
hperl added a commit that referenced this pull request May 9, 2022
hperl added a commit that referenced this pull request May 16, 2022
zepatrik added a commit that referenced this pull request Jul 27, 2022
With this change Keto now maps strings to UUIDv5 on the storage layer. This change allows unlimited strings to be used while maintaining good performance. Further, it reduces the likeliness of database hot-spots.
The migration that applies this mapping might take some time, so please confirm that your migration strategy works for you.

BREAKING CHANGE: `keto namespace migrate ...` commands were removed. To migrate from v0.6.0-alpha.1, please first migrate the legacy namespaces using v0.8.0-alpha.2
BREAKING CHANGE: The protobuf API was bumped to `v1alpha2`. Please upgrade your client dependency to that version. `v1alpha1` is still supported for now, but might be dropped soon.
BREAKING CHANGE: Some payload keys are now (not) required anymore. The generated SDKs will likely have breaking changes.

Co-authored-by: Patrik <[email protected]>
Co-authored-by: hperl <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants