Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Frontend: forward tenant information downstream #6595

Merged
merged 29 commits into from
Oct 25, 2023

Conversation

douglascamata
Copy link
Contributor

@douglascamata douglascamata commented Aug 8, 2023

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

It's not relevant to end users (yet) because it has no impact, just like #6530.

Changes

This PR continues the work from #6530, which started to forward tenant information down from Querier to Store API endpoints.

Here the tenant information is propagated from the Query Frontend to the Queriers. These are the change done by this PR:

  • Query Frontend has now 3 extra configuration options regarding tenancy, which are matching Receive's and Querier's:
    • query-frontend.tenant-header: indicates name of the tenant header, defaults to THANOS-TENANT.
      When this option is provided at the same time as query-frontend.org-id-header: an error is thrown, the Query Frontend won't start, the user will then be warned to stop using the org id header configuration and migrate to the tenant header one.
    • query-frontend.default-tenant-id: indicates the default tenant ID, defaults to default-tenant. This is the tenant that will be used when the request doesn't specify one.
    • query-frontend.tenant-certificate-field: use the TLS client's certificate field to identify the tenant.
  • When a tenant is identified using the previously mentioned logic, it's forwarded to Queriers using the internal tenant header (THANOS-TENANT).
    • The tenancy.GetTenantFromHTTP helper has been updated to match this behavior.
    • The X-Scope-Orgid header from Cortex is kept for compatibility with request middlewares and tripperwares imported from Cortex' queryfrontend package. If and/or when Cortex allows the org id header name to be customized, the duplication can be removed in favor of properly configuring Cortex.

Verification

Added e2e tests and did manual tests.

Signed-off-by: Douglas Camata <[email protected]>
* Add tenant forwarding based on tenant cert.
* Fix bugs in the header management.

Signed-off-by: Douglas Camata <[email protected]>
Signed-off-by: Douglas Camata <[email protected]>
@douglascamata douglascamata marked this pull request as ready for review August 8, 2023 15:08
Signed-off-by: Douglas Camata <[email protected]>
@@ -217,6 +223,13 @@ func runQueryFrontend(
cfg *queryFrontendConfig,
comp component.Component,
) error {
cfg.ForwardHeaders = append(cfg.ForwardHeaders, tenancy.DefaultTenantHeader)
cfg.orgIdHeaders = append(cfg.orgIdHeaders, tenancy.DefaultTenantHeader)
if cfg.TenantHeader != "" && cfg.TenantHeader != tenancy.DefaultTenantHeader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be an if else? or is it a case that we always want to forward the default tenant header since it is appended above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use only if with a good initialization instead of if else. Makes it easier to read: "we always forward the default tenant header and if there's a custom one, we forward it too".

Buuuut, I think I can remove cfg.ForwardHeaders = append(cfg.ForwardHeaders, cfg.TenantHeader), as a custom tenant header doesn't need to be forwarded as-is downstream. It'll be converted to the internal one (the default), before being forwarded.

But I still have to add it to the list of orgIdHeaders.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, don't disagree, was just wondering if it was a case of always passing on the default but that answers it.

tenant = r.Header.Get(DefaultTenantHeader)
if tenant == "" {
tenant = defaultTenantID
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think its worth logging down below if we end up overwriting based on the cert? or maybe we could just handle this on startup with a log or an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be too spammy. We had similar logs when parsing the tenant information from the request in the Store GW and they were removed: 510c054

We should always and only forward the default (internal) one.

Signed-off-by: Douglas Camata <[email protected]>
@@ -146,6 +147,10 @@ func registerQueryFrontend(app *extkingpin.App) {

cmd.Flag("query-frontend.forward-header", "List of headers forwarded by the query-frontend to downstream queriers, default is empty").PlaceHolder("<http-header-name>").StringsVar(&cfg.ForwardHeaders)

cmd.Flag("query-frontend.tenant-header", "HTTP header to determine tenant").Default(tenancy.DefaultTenantHeader).Hidden().StringVar(&cfg.TenantHeader)
cmd.Flag("query-frontend.default-tenant", "Default tenant to use if tenant header is not present").Default(tenancy.DefaultTenant).Hidden().StringVar(&cfg.DefaultTenant)
cmd.Flag("query-frontend.tenant-certificate-field", "Use TLS client's certificate field to determine tenant for requests. Must be one of "+tenancy.CertificateFieldOrganization+", "+tenancy.CertificateFieldOrganizationalUnit+" or "+tenancy.CertificateFieldCommonName+". This setting will cause the query-frontend.tenant-header flag value to be ignored.").Default("").EnumVar(&cfg.TenantCertField, "", tenancy.CertificateFieldOrganization, tenancy.CertificateFieldOrganizationalUnit, tenancy.CertificateFieldCommonName)
Copy link
Contributor

@jacobbaungard jacobbaungard Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should probably be hidden as the other two, until this functionality is ready.

@@ -146,6 +147,10 @@ func registerQueryFrontend(app *extkingpin.App) {

cmd.Flag("query-frontend.forward-header", "List of headers forwarded by the query-frontend to downstream queriers, default is empty").PlaceHolder("<http-header-name>").StringsVar(&cfg.ForwardHeaders)

cmd.Flag("query-frontend.tenant-header", "HTTP header to determine tenant").Default(tenancy.DefaultTenantHeader).Hidden().StringVar(&cfg.TenantHeader)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the right place to discuss this, but the query frontend also has a query-frontend.org-id-header arg. Thanos side, I believe this is only used when logging slow queries, but probably has some more use for Cortex.

Do we want to keep both the org-id-header and the tenant-header long term ? Could perhaps be nice to unify that into one cmd line arg if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long term we should indeed converge on only one to avoid confusion for users and ourselves, and for consistency.

As part of the migration process, I can mark the query-frontend.org-id-header as deprecated for now, mention the new one, and store the configured org-id-header in the tenant header config. This way someone already using this will have a warning about the change and can easily adapt to the new CLI arg, allowing us to completely drop the org ID config in a future breaking change.

Internally and more technically speaking, I am copying the tenant header field's value to the Org ID one because of some middlewares we are importing/reusing from Cortex, but I think over time we can adapt them to avoid this and clean things up.

What do you think, @jacobbaungard?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good plan to me. What would happen if one define both, can we error out in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jacobbaungard we could error out or print out a warning & give priority to the tenant header config, similar to what happens if you configure the header and the cert field in Receive. I personally have no preference. Will update to error out when both are provided. 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think the orgID was introduced only as a method of identifying sources of slow queries and passing that information to Cortex. It's also a repeatable flag, with significance in how headers are ordered, afaiu reading original issue/PR.

Thus I don't think it is mutually exclusive with this tenant header, and both have different semantics as well, as you can have multiple orgID headers, but only a single tenant one. So let's not error or warn the user, and keep it for now.

cc: @yeya24 in case he has some other thoughts on this! 🙂

@@ -217,6 +224,19 @@ func runQueryFrontend(
cfg *queryFrontendConfig,
comp component.Component,
) error {
if len(cfg.orgIdHeaders) != 0 && cfg.TenantHeader != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, won't this always evaluate true when setting the orgIdHeader, as the TenantHeader has a non-empty default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hummmm, true. I'm starting to think it will be much simpler to output a warning and give priority to the tenant header configuration. 🤔

Let me see what I can do!

Signed-off-by: Douglas Camata <[email protected]>
Copy link
Contributor

@jacobbaungard jacobbaungard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, looks good to me!

@moadz
Copy link
Contributor

moadz commented Sep 27, 2023

cc: @yeya24 as a reviewer with some cortex knowledge :)

queriesCount *prometheus.CounterVec
queriesCount *prometheus.CounterVec
tenantHeader string
defaultTenant string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the two new fields used in the roundTripper struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, @yeya24. I think they are some leftovers of previous approaches I tried. I'm removing them.

@douglascamata
Copy link
Contributor Author

@yeya24 took care of the comment and rebased, please have another look when you have some time. Thanks! 🙇

saswatamcode
saswatamcode previously approved these changes Oct 5, 2023
Copy link
Member

@saswatamcode saswatamcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work! Generally LGTM, but cc: @yeya24 on the point of orgID and tenant headers.

@@ -146,6 +147,10 @@ func registerQueryFrontend(app *extkingpin.App) {

cmd.Flag("query-frontend.forward-header", "List of headers forwarded by the query-frontend to downstream queriers, default is empty").PlaceHolder("<http-header-name>").StringsVar(&cfg.ForwardHeaders)

cmd.Flag("query-frontend.tenant-header", "HTTP header to determine tenant").Default(tenancy.DefaultTenantHeader).Hidden().StringVar(&cfg.TenantHeader)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think the orgID was introduced only as a method of identifying sources of slow queries and passing that information to Cortex. It's also a repeatable flag, with significance in how headers are ordered, afaiu reading original issue/PR.

Thus I don't think it is mutually exclusive with this tenant header, and both have different semantics as well, as you can have multiple orgID headers, but only a single tenant one. So let's not error or warn the user, and keep it for now.

cc: @yeya24 in case he has some other thoughts on this! 🙂

@douglascamata
Copy link
Contributor Author

@saswatamcode from the Thanos perspective, the headers do the same thing. In theory, you can send multiple values for any HTTP header (internally in the stdlib they are map[string][]string), we only decide to ignore anything being the 1st value for now because we're not implementing anything cross-tenant related (yet).

HTTP headers ordering is not a concern we need to have, stdlib takes care of this.

I could have completely removed OrgID from the codebase and keep the slow query log working fine with the tenant ID. Only didn't do it because I wanted this PR to be small. Eventually, we can get rid of this concept of "OrgID" in Thanos, as in this context we talk about tenants and not "orgs".

Note that nothing here makes Thanos incompatible with Cortex and vice-versa. On this context. the only change is the rename of the cli arg from query-frontend.org-id-header to query-frontend.tenant-header. All behavior should be preserved when moving between them.

@douglascamata
Copy link
Contributor Author

Even added this in the feature's e2e test to ensure the OrgID header keeps being passed downstream: https://github.com/thanos-io/thanos/pull/6595/files#diff-e84e75cb190ebf54e47774afa03e68ed6cbc3f03e6216ae31e79c58509b0acbeR984-R985

Signed-off-by: Douglas Camata <[email protected]>
Copy link
Member

@saswatamcode saswatamcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I think this is good to go.

For the orgID stuff, I believe it's there as a separate header due to certain cortex feature, but we can make it more in-line with Thanos tenancy model or remove it completely as Thanos has its own slow query log. 🙂

@saswatamcode saswatamcode enabled auto-merge (squash) October 25, 2023 03:19
@saswatamcode saswatamcode merged commit df48504 into thanos-io:main Oct 25, 2023
16 checks passed
douglascamata added a commit to douglascamata/thanos that referenced this pull request Dec 13, 2023
* Make query frontend forward tenant info downstream

Signed-off-by: Douglas Camata <[email protected]>

* Add TODO for client certificate

Signed-off-by: Douglas Camata <[email protected]>

* Finish qfe tenant forward features

* Add tenant forwarding based on tenant cert.
* Fix bugs in the header management.

Signed-off-by: Douglas Camata <[email protected]>

* Test more scenarios

Signed-off-by: Douglas Camata <[email protected]>

* Goimports files

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

* Rerun goimports without any line breaks

Signed-off-by: Douglas Camata <[email protected]>

* Resort imports in groups (std, 3rd-party, local)

Signed-off-by: Douglas Camata <[email protected]>

* No need to forward a custom tenant header downstream

We should always and only forward the default (internal) one.

Signed-off-by: Douglas Camata <[email protected]>

* Make query frontend tenant cert config hidden

Signed-off-by: Douglas Camata <[email protected]>

* Error out when org ig and tenant header are provided at the same time

Signed-off-by: Douglas Camata <[email protected]>

* Fix typo

Signed-off-by: Douglas Camata <[email protected]>

* Update docs

Signed-off-by: Douglas Camata <[email protected]>

* Improve comments around forward and org id headers for tenancy

Signed-off-by: Douglas Camata <[email protected]>

* Add some test comments

Signed-off-by: Douglas Camata <[email protected]>

* Improve logic for detecting tenant and org id headers were provided

Signed-off-by: Douglas Camata <[email protected]>

* Fix order of check for org id headers

Signed-off-by: Douglas Camata <[email protected]>

* Add some TODO comments for the future

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

* Make default tenant flag in qfe match receive's

Signed-off-by: Douglas Camata <[email protected]>

* Remove unnecessary struct fields

Signed-off-by: Douglas Camata <[email protected]>

* Add instant query to tenant forward tests

Signed-off-by: Douglas Camata <[email protected]>

* Rerun build

Signed-off-by: Douglas Camata <[email protected]>

---------

Signed-off-by: Douglas Camata <[email protected]>
douglascamata added a commit to douglascamata/thanos that referenced this pull request Dec 13, 2023
* Make query frontend forward tenant info downstream

Signed-off-by: Douglas Camata <[email protected]>

* Add TODO for client certificate

Signed-off-by: Douglas Camata <[email protected]>

* Finish qfe tenant forward features

* Add tenant forwarding based on tenant cert.
* Fix bugs in the header management.

Signed-off-by: Douglas Camata <[email protected]>

* Test more scenarios

Signed-off-by: Douglas Camata <[email protected]>

* Goimports files

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

* Rerun goimports without any line breaks

Signed-off-by: Douglas Camata <[email protected]>

* Resort imports in groups (std, 3rd-party, local)

Signed-off-by: Douglas Camata <[email protected]>

* No need to forward a custom tenant header downstream

We should always and only forward the default (internal) one.

Signed-off-by: Douglas Camata <[email protected]>

* Make query frontend tenant cert config hidden

Signed-off-by: Douglas Camata <[email protected]>

* Error out when org ig and tenant header are provided at the same time

Signed-off-by: Douglas Camata <[email protected]>

* Fix typo

Signed-off-by: Douglas Camata <[email protected]>

* Update docs

Signed-off-by: Douglas Camata <[email protected]>

* Improve comments around forward and org id headers for tenancy

Signed-off-by: Douglas Camata <[email protected]>

* Add some test comments

Signed-off-by: Douglas Camata <[email protected]>

* Improve logic for detecting tenant and org id headers were provided

Signed-off-by: Douglas Camata <[email protected]>

* Fix order of check for org id headers

Signed-off-by: Douglas Camata <[email protected]>

* Add some TODO comments for the future

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

* Make default tenant flag in qfe match receive's

Signed-off-by: Douglas Camata <[email protected]>

* Remove unnecessary struct fields

Signed-off-by: Douglas Camata <[email protected]>

* Add instant query to tenant forward tests

Signed-off-by: Douglas Camata <[email protected]>

* Rerun build

Signed-off-by: Douglas Camata <[email protected]>

---------

Signed-off-by: Douglas Camata <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants