Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multitenant datasources #11861

Open
gavinking opened this issue Sep 3, 2020 · 19 comments
Open

Multitenant datasources #11861

gavinking opened this issue Sep 3, 2020 · 19 comments
Labels

Comments

@gavinking
Copy link

gavinking commented Sep 3, 2020

Currently the Hibernate extension features some integration with Hibernate multitenancy. Multitenancy in Hibernate essentially boils down to letting the program set a tenant id in a certain context, and having the datasource used by Hibernate selected automatically on the basis of that id. Where "automatically" means you the user can write code to select it.

It seems to me that this concern would be better handled in Quarkus at the level of the Quarkus datasource itself. There's no particular advantage to doing this in Hibernate, other than before this we had control over Hibernate but not over the environment in which Hibernate operated. And there are disadvantages, primarily that non-Hibernate code can't take advantage of the feature.

So I propose that we provide a way to map a contextual tenant id to a schema name, catalog name, or database URI, as a feature of the Quarkus datasource, and remove this feature from the Hibernate extension.

Thoughts?

@sberyozkin
Copy link
Member

sberyozkin commented Sep 4, 2020

@gavinking Would it make sense to link OIDC Multi-tenancy with Hibrernate Multi-tenancy closer ?
There was a blog post awhile back.

OIDC extension runs first, and when its mult-tenancy is activated, a tenant id is available, can be saved in RoutingContext if needed.
Then Hibernate extension would pick it up and use it to choose its own tenant configuration.

What I'm not sure about is if it makes sense to use the same tenant id to 1) select an OIDC configuration (how to verify the token etc) and then 2) to use the same id to to select the configuration to read the data. Perhaps these are conceptually different activities.

@DorianMaliszewski, can you comment please ?

@DorianMaliszewski
Copy link

Hi,

Maybe this feature can be implemented by adding a filter after OIDC tenant resolver and in application.properties just quarkus.hibernate.use-oidc-tenant-resolver=true or something close.

It can be a good thing to implement an easy way to link hibernate to oidc tenant resolver. 👍

@gavinking
Copy link
Author

@sberyozkin yes, sure, it sounds like it would be great to integrate with that stuff as a source of tenant ids, though my main concern here in this particular issue is that I want it to be the datasource layer, not Hibernate itself, which is responsible for interpreting the tenant ids.

But yes, it sounds like a good idea.

@sberyozkin
Copy link
Member

Hi @DorianMaliszewski thanks, yes, I was thinking of getting this ID automatically, without even having to do something, OIDC would set the ID in the the request context and it will be picked up if other Hibernate or lower level resolver is not available.

@sberyozkin
Copy link
Member

Hi @gavinking thanks; sorry, I think I'd create a dedicated issue to discuss linking OIDC and Hibernate/datasource tenants, so as to not to overload this issue :-)

@gavinking
Copy link
Author

Hi @gavinking thanks; sorry, I think I'd create a dedicated issue to discuss linking OIDC and Hibernate/datasource tenants, so as to not to overload this issue :-)

Don't apologize, it's a great idea; please open the issue!

@michael-schnell
Copy link
Contributor

It seems to me that this concern would be better handled in Quarkus at the level of the Quarkus datasource itself. There's no particular advantage to doing this in Hibernate, other than before this we had control over Hibernate but not over the environment in which Hibernate operated. And there are disadvantages, primarily that non-Hibernate code can't take advantage of the feature.

At the time the Multitenancy feature was implemented, Hibernate ORM extension was the only place to integrate this in a short amount of time. Unfortunately Hibernate Multitenancy itself isn't really finalized since years. As stated in the manual "The JPA expert group is in the process of defining multitenancy support for an upcoming version of the specification". So this was a best effort approach to rely on what Hibernate offered, even if it is not much.

In general I agree, that the current solution is mainly a way to select the right datasource that is used for the current tenant.

@michael-schnell
Copy link
Contributor

Hi,

Maybe this feature can be implemented by adding a filter after OIDC tenant resolver and in application.properties just quarkus.hibernate.use-oidc-tenant-resolver=true or something close.

It can be a good thing to implement an easy way to link hibernate to oidc tenant resolver. 👍

In general the mechanism for selecting the tenant is pretty much the same in the Hibernate ORM Multitenancy and the ODIC extension. It is very easy to integrate both parts using CDI.

@gavinking
Copy link
Author

gavinking commented Sep 6, 2020

So I've done a bit more research into how this stuff has been done in Quarkus, and here's a more concrete proposal to kick about.

  1. Introduce a parent/child relationship between "base" datasources and "tenant" datasources.
  2. Tenant datasources inherit the configuration of the base datasource, except for any setting they explicitly override.
  3. The base datasource uses the current tenant identifier (if it is set) to forward connection requests to the corresponding tenant datasource.
  4. Remove the Hibernate-specific code.

Now, configuring a multitenant datasource in Quarkus would look like this:

# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.jdbc.url=jdbc:postgresql://localhost:5433/mycompany

Or:

# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.username=mycompany
quarkus.datasource.mycompany.password=secret

The advantages of this approach include:

  • Multitenancy is available as a feature to all Quarkus code, not just Hibernate.
  • Repetitive datasource configuration is minimized.
  • Things other than just the catalog and schema name can be configured per-tenant (e.g. the user id above).

Now, an open question in my mind is whether there is a real need to distinguish the schema and database strategies, since AFAIK most databases let you select a schema via JDBC connection properties. However, I must admit I'm not certain about this at all, and so perhaps there's some reason to have this as a whole different strategy.

@gavinking
Copy link
Author

@Sanne I'm still waiting to hear your take on all this.

@gavinking
Copy link
Author

gavinking commented Sep 6, 2020

Now, an open question in my mind is whether there is a real need to distinguish the schema and database strategies, since AFAIK most databases let you select a schema via JDBC connection properties.

Well, even if there is a good reason to use setSchema() instead of the JDBC connection properties, we can still abstract that as a property of the datasource via .schema (and for completeness .catalog) properties:

# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.schema=mycompany

@michael-schnell
Copy link
Contributor

Now, an open question in my mind is whether there is a real need to distinguish the schema and database strategies, since AFAIK most databases let you select a schema via JDBC connection properties.

Well, even if there is a good reason to use setSchema() instead of the JDBC connection properties, we can still abstract that as a property of the datasource via .schema (and for completeness .catalog) properties:

# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.schema=mycompany

If I remember it correctly, "setSchema(..)" didn't work either on MariaDB or on PostgreSQL. Some databases like MariaDB/MySQL do not support database schemas. In these cases you have to use the DATABASE approach.

@michael-schnell
Copy link
Contributor

So I've done a bit more research into how this stuff has been done in Quarkus, and here's a more concrete proposal to kick about.

  1. Introduce a parent/child relationship between "base" datasources and "tenant" datasources.
  2. Tenant datasources inherit the configuration of the base datasource, except for any setting they explicitly override.
  3. The base datasource uses the current tenant identifier (if it is set) to forward connection requests to the corresponding tenant datasource.
  4. Remove the Hibernate-specific code.

Now, configuring a multitenant datasource in Quarkus would look like this:

# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.jdbc.url=jdbc:postgresql://localhost:5433/mycompany

Or:

# Default tenant 'base'
quarkus.datasource.base.db-kind=postgresql
quarkus.datasource.base.username=quarkus_test
quarkus.datasource.base.password=quarkus_test
quarkus.datasource.base.jdbc.url=jdbc:postgresql://localhost:5432/quarkus_test

# Tenant 'mycompany'
quarkus.datasource.mycompany.inherit=base
quarkus.datasource.mycompany.username=mycompany
quarkus.datasource.mycompany.password=secret

The advantages of this approach include:

  • Multitenancy is available as a feature to all Quarkus code, not just Hibernate.
  • Repetitive datasource configuration is minimized.
  • Things other than just the catalog and schema name can be configured per-tenant (e.g. the user id above).

Now, an open question in my mind is whether there is a real need to distinguish the schema and database strategies, since AFAIK most databases let you select a schema via JDBC connection properties. However, I must admit I'm not certain about this at all, and so perhaps there's some reason to have this as a whole different strategy.

Sounds like a good approach, but seems mainly to optimize the configuration setup.

How do you tell Hibernate which is the currently selected tenant connection/datasource to use? Most likely this is connected to a request scope.

@gavinking
Copy link
Author

gavinking commented Sep 6, 2020

@michael-schnell

How do you tell Hibernate which is the currently selected tenant connection/datasource to use?

Well you don't need to. That's the whole idea. You just point Hibernate at the base datasource which takes care of redirecting the connection requests to the correct contextual tenant datasource, just as it would do for any other client.

Hibernate multitenancy doesn't need to be involved in this at all.

@gavinking
Copy link
Author

gavinking commented Sep 6, 2020

@michael-schnell

If I remember it correctly, "setSchema(..)" didn't work either on MariaDB or on PostgreSQL.

Eyeballing the code of the Postgres JDBC driver, I believe Postgres does support setSchema(), but no, of course, MySQL (and derivative MariaDB) don't have schemas, so they don't support it.

@venki157
Copy link

Anything concluded on this multitenency issue. When can we expect this fix?

@DorianMaliszewski
Copy link

Refer to this issue please : #11949

@michael-schnell
Copy link
Contributor

I would love to do it, but at the moment I have no time. Hopefully there is some spare time during the Christmas holidays. As suggested by Gavin the idea is to move the functionality out of Hibernate extension into the datasources extension. Definitely I will also need support from the datasources maintainers for this task.

@ltsallas
Copy link

ltsallas commented Mar 4, 2022

Other than the shared database, different schema multitenancy, there can be a shared database/shared schema approach with Postgres Row Level Security and a session parameter as discussed in the "Alternative approach" section in this article

I made a question on stackoverflow asking for a Quarkus solution for this case - I didn't find a straight forward way to do so. I am just sharing it here since the - outside hibernate- multitenant datasources solution you are discussing so far might take into account the shared db & schema multitenancy as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants