
Custom Prombench Tests #321

Closed
geekodour opened this issue Feb 4, 2020 · 14 comments

Comments

@geekodour
Member

The current prombench setup is rigid. There has recently been some interest in running custom prombench tests, especially now that the prombench setup is running on low memory (prometheus/prometheus#6729).

I suggest having something like this:

/prombench v2.4.0 <github_repo_url> <branch_name>

This way, prombench users would be able to fork prombench and add their custom changes to a branch, and the prombench infrastructure would use that branch when running the test. Additionally, for cases that require high memory, we could keep a template branch in the prombench repo itself.
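As a rough sketch, the comment-handling bot could parse such an extended command along these lines (the function, the default repo/branch fallbacks, and the optional-argument syntax are all hypothetical illustrations, not actual prombench code):

```python
import re

# Hypothetical parser for the proposed extended command:
#   /prombench <release> [<github_repo_url> <branch_name>]
# When no custom repo/branch is given, it falls back to defaults.
COMMAND_RE = re.compile(
    r"^/prombench\s+(?P<release>\S+)"
    r"(?:\s+(?P<repo_url>https?://\S+)\s+(?P<branch>\S+))?\s*$"
)

def parse_prombench_command(comment: str) -> dict:
    """Return the release plus the repo/branch the test infra should use."""
    match = COMMAND_RE.match(comment.strip())
    if match is None:
        raise ValueError(f"not a prombench command: {comment!r}")
    return {
        "release": match.group("release"),
        # Fall back to the upstream repo when no fork is specified
        # (illustrative defaults, not real prombench configuration).
        "repo_url": match.group("repo_url")
            or "https://github.com/prometheus/prombench",
        "branch": match.group("branch") or "master",
    }
```

For example, `parse_prombench_command("/prombench v2.4.0 https://github.com/user/prombench high-mem")` would select the fork and branch, while a plain `/prombench v2.4.0` would keep the current behavior.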

Please let me know what you think.

cc:
@codesome with prometheus/prometheus#6679
@csmarchbanks #249
@cstyan prometheus/prometheus#6729
@krasi-georgiev

@krasi-georgiev
Contributor

krasi-georgiev commented Feb 4, 2020

It is just one extra step to open a PR from your branch and run it as usual, no?
So unless there are other benefits, I don't think it is worth spending any time on this right now.
In the past we discussed the option of someone running bench tests on their own GKE account, in which case it would be beneficial to allow custom repos, but currently no one has requested this, so it is best to spend the time on improving the dashboards and other more urgent fixes.

@cstyan
Member

cstyan commented Feb 4, 2020

I think personal forks and branches are a bit too much to ask. Can we provide a set of additional subcommands and config options to the prombench command? I'm sure Chris and I would be happy to help with a remote write test for prombench.

@krasi-georgiev
Contributor

Yep, more flexibility in the prombench configs would be an amazing addition. @cstyan, if you have the time to look into this, that would be great.

I guess a proposal doc for the implementation could be the way to discuss this?

@cstyan
Member

cstyan commented Feb 4, 2020

I can help with something immediate to get over the memory issues so we can properly benchmark 2.16.0-rc.0, anything longer term I wouldn't be able to get to for a few weeks at least.

@geekodour what would you suggest for a short term fix?

@geekodour
Member Author

@cstyan A benchmark is already running for prometheus/prometheus#6729

http://prombench.prometheus.io/grafana/d/7gmLoNDmz/prombench?orgId=1&var-RuleGroup=All&var-pr-number=6729&from=now-12h&to=now&refresh=30s

The last one OOMed at 8h; this one is still running, 10h in. So the short-term fix would be to rerun it (which is running now) and hope it does not crash again 😞 because we reduced the memory. Otherwise, we can revert the following to n1-highmem-16, but that does not solve the billing issue. So I thought having a way to specify a separate branch of prombench when running tests would be useful for some specific cases.

https://github.com/prometheus/prombench/blob/8681448850bdc38dcee93d43513cacda710a4d6e/prombench/manifests/prombench/nodepools.yaml#L10

@geekodour
Member Author

@cstyan if this fails, we can maybe start another test with n1-highmem-16, which will make sure that at least the test finishes.

@geekodour
Member Author

For the long-term resolution of the memory issue: if we do not intend to increase the memory of the instances, I think we can try reducing the number of series, to prevent any future out-of-memory crashes related to this.
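A back-of-the-envelope sketch of why reducing targets scales the load down (all figures here are illustrative assumptions, not actual prombench numbers):

```python
def total_series(num_webservers: int, series_per_target: int) -> int:
    """Rough ingested-series estimate: each fake webserver is one scrape
    target exposing a fixed number of series (illustrative model only)."""
    return num_webservers * series_per_target

# Halving the number of fake-webserver targets halves the ingested
# series, and memory usage scales roughly with active series.
before = total_series(num_webservers=100, series_per_target=1000)
after = total_series(num_webservers=50, series_per_target=1000)
assert after == before // 2
```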

@cstyan
Member

cstyan commented Feb 4, 2020

Did the prombench cancel on that PR not work?

Otherwise, we can revert the following to n1-highmem-16 but it does not solve the billing issue.

I think we should do this, just for this next 3d benchmark, so we can confirm 2.16.0 isn't performing worse in terms of memory usage than 2.15.x. Then going forward, we could reduce the number of series, go back to n1-highmem-8, and do another benchmark so we have something to compare the future 2.17.0 benchmark to.

What do you guys think about that?

@geekodour
Member Author

@cstyan yes, it did work. I started another one manually to inspect the failing behavior again, since Grafana was not showing the previous Loki logs. (#322)

We can manually start a test with n1-highmem-16; then we won't have to create another PR for that. @krasi-georgiev please let me know what to do.

@krasi-georgiev
Contributor

I would say decrease the number of fake webservers to decrease the number of ingested series.

@geekodour
Member Author

Running it on n1-highmem-8 crashed again:

link to loki logs
link to prombench dashboard

Why does the time series count get very high when restarting Prometheus, though?

@cstyan I am starting a new test for 2.16.0-rc.0 with fewer fake webservers, as @krasi-georgiev suggested. Will that be fine?

@krasi-georgiev
Contributor

It is clearly visible that the new Prom version uses more memory and does a lot more allocations, but yeah, let's reduce the number of targets and see how it goes.

@geekodour
Member Author

@cstyan @krasi-georgiev I started a test for v2.16.0-rc.0, as #323 got merged.

@geekodour
Member Author

Closing, since the memory issue was resolved and there is no requirement for such custom tests as of now.
