Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Chore] Update Prometheus #7416

Merged
merged 8 commits into from
Jun 10, 2024
Merged

[Chore] Update Prometheus #7416

merged 8 commits into from
Jun 10, 2024

Conversation

alanprot
Copy link
Contributor

@alanprot alanprot commented Jun 4, 2024

Changes

We pinned github.com/sercand/kuberesolver/v4 => github.com/sercand/kuberesolver/v5 v5.1.1 in order to be able to update google.golang.org/grpc needed by opentelemetry-go (and consequently removed the replace for google.golang.org/grpc)

Breaking Changes on Prometheus

Verification

Breaking Changes on the new go-grpc:

  • From here FailOnNonTempDialError, WithBlock, and WithReturnConnectionError are three DialOptions that are only supported by Dial because they only affect the behavior of Dial itself. WithBlock causes Dial to wait until the ClientConn reports its State as connectivity.Connected. The other two deal with returning connection errors before the timeout (WithTimeout or on the context when using DialContext).
    • Those options were added to work around an regression on GRPC: see this and this

@alanprot alanprot changed the title update thanos [Chore] Update Prometheus Jun 4, 2024
@alanprot alanprot force-pushed the update-prometheus branch 3 times, most recently from 3c55ace to 6c34cfa Compare June 4, 2024 23:31
@alanprot alanprot marked this pull request as ready for review June 4, 2024 23:53
@yeya24
Copy link
Contributor

yeya24 commented Jun 5, 2024

@alanprot OK I just checked and I think the E2E test failure is related. The test tries to spin up a Prometheus locally using Prometheus binary.

I logged the command in my local path.

prometheus --storage.tsdb.retention=2d --storage.tsdb.path=/var/folders/rd/vh0cfrk13jsf8pnnqgz9yrg40000gr/T/prometheus-test3289092244 --web.listen-address=localhost:54196 --web.route-prefix= --web.enable-admin-api --config.file=/var/folders/rd/vh0cfrk13jsf8pnnqgz9yrg40000gr/T/prometheus-test3289092244/prometheus.yml

And it failed with error below.

ts=2024-06-05T06:07:29.433Z caller=main.go:521 level=error msg="Error loading config (--config.file=/var/folders/rd/vh0cfrk13jsf8pnnqgz9yrg40000gr/T/prometheus-test3289092244/prometheus.yml)" file=/var/folders/rd/vh0cfrk13jsf8pnnqgz9yrg40000gr/T/prometheus-test3289092244/prometheus.yml err="parsing YAML file /var/folders/rd/vh0cfrk13jsf8pnnqgz9yrg40000gr/T/prometheus-test3289092244/prometheus.yml: yaml: unmarshal errors:\n  line 2: field rule_query_offset not found in type config.plain"

The config is:

global:
    rule_query_offset: 0s
    external_labels:
        az: "1"
        region: eu-west

rule_query_offset is a new config and it doesn't have omit_empty so it makes old Prometheus version fail. https://github.com/prometheus/prometheus/blob/main/config/config.go#L402.

I think what we can do is, rather than using global config to write the Prometheus config file here, we can write with plain yaml similar to this

@yeya24
Copy link
Contributor

yeya24 commented Jun 5, 2024

Let's update Prometheus version to prometheus/prometheus#14216 and see if it fixes it?

@alanprot
Copy link
Contributor Author

alanprot commented Jun 5, 2024

Let's update Prometheus version to prometheus/prometheus#14216 and see if it fixes it?

Will do.. but i think there is another braking change that we will need to PR on prometheus.. I will push the change with the new prometheus for now but the build will fail.

prometheus/common#538 (comment)

I pinned the common to work around this issue.

@alanprot alanprot force-pushed the update-prometheus branch from 018be5d to f4ca9bc Compare June 5, 2024 18:21
@alanprot
Copy link
Contributor Author

alanprot commented Jun 5, 2024

=== NAME  TestRangeQueryShardingWithRandomData
    testutil.go:91: query_frontend_test.go:667: ""
        
        	exp: model.Matrix(nil)
        
        	got: model.Matrix{}
        
        Diff

Seems related to: prometheus/prometheus#13993

=/
Need to understand why.

@alanprot alanprot force-pushed the update-prometheus branch 2 times, most recently from 472b5d7 to bf38079 Compare June 5, 2024 21:20
@alanprot
Copy link
Contributor Author

alanprot commented Jun 5, 2024

Ok..

We are not using jsoniter to serialize response as the upstream prometheus is... it explains why the null is being serialized as null as the fix for prometheus/prometheus#13993 was actually done prometheus/prometheus#13997 and it was a fix on the json serializer...

I'm changing thanos to use jsoniter as it has multiple benefits anyway.

@alanprot alanprot force-pushed the update-prometheus branch 3 times, most recently from bc1338a to 15f804e Compare June 5, 2024 22:05
@alanprot alanprot force-pushed the update-prometheus branch from e930f77 to 2ea9fbd Compare June 6, 2024 16:06
@alanprot alanprot requested a review from yeya24 June 6, 2024 21:23
yeya24
yeya24 previously approved these changes Jun 7, 2024
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alanprot
Copy link
Contributor Author

alanprot commented Jun 7, 2024

@GiedriusS @fpetkovski Can you guys take a peek on this one?

@fpetkovski
Copy link
Contributor

We are seeing problems with grpc 1.64.0: grpc/grpc-go#7314. The issue in Thanos happens when we try to close stale connections here: https://github.com/thanos-io/thanos/blob/main/pkg/query/endpointset.go#L442.

Maybe it's better to pin grpc to 1.63.2?

@alanprot
Copy link
Contributor Author

We are seeing problems with grpc 1.64.0: grpc/grpc-go#7314. The issue in Thanos happens when we try to close stale connections here: https://github.com/thanos-io/thanos/blob/main/pkg/query/endpointset.go#L442.

Maybe it's better to pin grpc to 1.63.2?

Ok.. lemme do this.

yeya24
yeya24 previously approved these changes Jun 10, 2024
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Merge on green

@alanprot alanprot force-pushed the update-prometheus branch from e674b2a to 02d1814 Compare June 10, 2024 20:47
@yeya24 yeya24 enabled auto-merge (squash) June 10, 2024 20:54
Signed-off-by: alanprot <[email protected]>
auto-merge was automatically disabled June 10, 2024 21:31

Head branch was pushed to by a user without write access

@alanprot alanprot force-pushed the update-prometheus branch from 02d1814 to d45eac6 Compare June 10, 2024 21:31
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yeya24 yeya24 merged commit 882d6a1 into thanos-io:main Jun 10, 2024
20 checks passed
hczhu-db pushed a commit to databricks/thanos that referenced this pull request Aug 22, 2024
* Uupdate Prometheus

Signed-off-by: alanprot <[email protected]>

* Updating prometheus to 4e664035e84e

Signed-off-by: alanprot <[email protected]>

* Temporarily pinning prometheus common

Signed-off-by: alanprot <[email protected]>

* fixing lint

Signed-off-by: alanprot <[email protected]>

* Using jsoniter to encode promql responses

Signed-off-by: alanprot <[email protected]>

* Removing e2e test case with unvalid hifen on a matcher -> prometheus now support this use case

Signed-off-by: alanprot <[email protected]>

* Updating prometheus to v0.52.2-0.20240606174736-edd558884b24

Signed-off-by: alanprot <[email protected]>

* pinning grpc to v1.63.2

Signed-off-by: alanprot <[email protected]>

---------

Signed-off-by: alanprot <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
hczhu-db pushed a commit to databricks/thanos that referenced this pull request Aug 22, 2024
* Uupdate Prometheus

Signed-off-by: alanprot <[email protected]>

* Updating prometheus to 4e664035e84e

Signed-off-by: alanprot <[email protected]>

* Temporarily pinning prometheus common

Signed-off-by: alanprot <[email protected]>

* fixing lint

Signed-off-by: alanprot <[email protected]>

* Using jsoniter to encode promql responses

Signed-off-by: alanprot <[email protected]>

* Removing e2e test case with unvalid hifen on a matcher -> prometheus now support this use case

Signed-off-by: alanprot <[email protected]>

* Updating prometheus to v0.52.2-0.20240606174736-edd558884b24

Signed-off-by: alanprot <[email protected]>

* pinning grpc to v1.63.2

Signed-off-by: alanprot <[email protected]>

---------

Signed-off-by: alanprot <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
jnyi pushed a commit to jnyi/thanos that referenced this pull request Oct 17, 2024
* Uupdate Prometheus

Signed-off-by: alanprot <[email protected]>

* Updating prometheus to 4e664035e84e

Signed-off-by: alanprot <[email protected]>

* Temporarily pinning prometheus common

Signed-off-by: alanprot <[email protected]>

* fixing lint

Signed-off-by: alanprot <[email protected]>

* Using jsoniter to encode promql responses

Signed-off-by: alanprot <[email protected]>

* Removing e2e test case with unvalid hifen on a matcher -> prometheus now support this use case

Signed-off-by: alanprot <[email protected]>

* Updating prometheus to v0.52.2-0.20240606174736-edd558884b24

Signed-off-by: alanprot <[email protected]>

* pinning grpc to v1.63.2

Signed-off-by: alanprot <[email protected]>

---------

Signed-off-by: alanprot <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants