
Custom Prombench Tests #321

Closed
geekodour opened this issue Feb 4, 2020 · 14 comments

Comments

@geekodour
Member

The current prombench setup is rigid. There has recently been some interest in running custom prombench tests, especially now that the prombench setup is running on low memory (prometheus/prometheus#6729).

I suggest having something like this:

/prombench v2.4.0 <github_repo_url> <branch_name>

This way, prombench users would be able to fork prombench and add their custom changes to a branch, and the prombench infrastructure would use that branch when running the test. Additionally, for cases that require high memory, we could keep a template branch in the prombench repo itself.
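As a rough sketch, the comment-handling bot could parse such an extended command along these lines (the function, the default repo/branch fallbacks, and the optional-argument syntax are all hypothetical illustrations, not actual prombench code):

```python
import re

# Hypothetical parser for the proposed extended command:
#   /prombench <release> [<github_repo_url> <branch_name>]
# When no custom repo/branch is given, it falls back to defaults.
COMMAND_RE = re.compile(
    r"^/prombench\s+(?P<release>\S+)"
    r"(?:\s+(?P<repo_url>https?://\S+)\s+(?P<branch>\S+))?\s*$"
)

def parse_prombench_command(comment: str) -> dict:
    """Return the release plus the repo/branch the test infra should use."""
    match = COMMAND_RE.match(comment.strip())
    if match is None:
        raise ValueError(f"not a prombench command: {comment!r}")
    return {
        "release": match.group("release"),
        # Fall back to the upstream repo when no fork is specified
        # (illustrative defaults, not real prombench configuration).
        "repo_url": match.group("repo_url")
            or "https://github.com/prometheus/prombench",
        "branch": match.group("branch") or "master",
    }
```

For example, `parse_prombench_command("/prombench v2.4.0 https://github.com/user/prombench high-mem")` would select the fork and branch, while a plain `/prombench v2.4.0` would keep the current behavior.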

Please let me know what you think.

cc:
@codesome with prometheus/prometheus#6679
@csmarchbanks #249
@cstyan prometheus/prometheus#6729
@krasi-georgiev

@krasi-georgiev
Contributor

krasi-georgiev commented Feb 4, 2020

It is just one extra step to open a PR from your branch and run it as usual, no?
So unless there are other benefits, I don't think it is worth spending any time on this right now.
In the past we discussed the option of someone running bench tests on their own GKE account, in which case it would be beneficial to allow custom repos, but currently no one has requested this, so it is best to spend the time on improving the dashboards and other more urgent fixes.

@cstyan
Member

cstyan commented Feb 4, 2020

I think personal forks and branches are a bit too much to ask. Can we provide a set of additional subcommands and config options to the prombench command? I'm sure Chris and I would be happy to help with a remote write test for prombench.

@krasi-georgiev
Contributor

Yep, more flexibility in the prombench configs would be an amazing addition. @cstyan, if you have the time to look into this, that would be great.

I guess a proposal doc for the implementation could be the way to discuss this?

@cstyan
Member

cstyan commented Feb 4, 2020

I can help with something immediate to get over the memory issues so we can properly benchmark 2.16.0-rc.0, anything longer term I wouldn't be able to get to for a few weeks at least.

@geekodour what would you suggest for a short term fix?

@geekodour
Member Author

@cstyan A benchmark is already running for prometheus/prometheus#6729

http://prombench.prometheus.io/grafana/d/7gmLoNDmz/prombench?orgId=1&var-RuleGroup=All&var-pr-number=6729&from=now-12h&to=now&refresh=30s

The last one OOMed at 8h; this one is still running, 10h in. So the short-term fix would be to rerun it (which is running now) and hope it does not crash again 😞 because we reduced the memory. Otherwise, we can revert the following to n1-highmem-16, but that does not solve the billing issue. So I thought having a way to specify a separate branch of prombench when running tests would be useful for some specific cases.

https://github.com/prometheus/prombench/blob/8681448850bdc38dcee93d43513cacda710a4d6e/prombench/manifests/prombench/nodepools.yaml#L10

@geekodour
Member Author

@cstyan if this fails, we can maybe start another test with n1-highmem-16, which will make sure that at least the test finishes.

@geekodour
Member Author

For the long-term resolution of the memory issue: if we do not intend to increase the memory of the instances, I think we can try reducing the number of series, to prevent any future out-of-memory crashes related to this.
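A back-of-the-envelope sketch of why reducing targets scales the load down (all figures here are illustrative assumptions, not actual prombench numbers):

```python
def total_series(num_webservers: int, series_per_target: int) -> int:
    """Rough ingested-series estimate: each fake webserver is one scrape
    target exposing a fixed number of series (illustrative model only)."""
    return num_webservers * series_per_target

# Halving the number of fake-webserver targets halves the ingested
# series, and memory usage scales roughly with active series.
before = total_series(num_webservers=100, series_per_target=1000)
after = total_series(num_webservers=50, series_per_target=1000)
assert after == before // 2
```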

@cstyan
Member

cstyan commented Feb 4, 2020

Did the prombench cancel on that PR not work?

Otherwise, we can revert the following to n1-highmem-16 but it does not solve the billing issue.

I think we should do this, just for this next 3d benchmark, so we can confirm 2.16.0 isn't performing worse in terms of memory usage than 2.15.x. Then going forward, we could reduce the number of series, go back to n1-highmem-8, and do another benchmark so we have something to compare the future 2.17.0 benchmark to.

What do you guys think about that?

@geekodour
Member Author

@cstyan yes, it did work. I started another one manually to inspect the failing behavior again, since Grafana was not showing the previous Loki logs. (#322)

We can manually start a test with n1-highmem-16; then we won't have to create another PR for that. @krasi-georgiev please let me know what to do.

@krasi-georgiev
Contributor

I would say decrease the number of fake webservers to decrease the number of ingested series.

@geekodour
Member Author

Running it on n1-highmem-8 crashed again:

link to loki logs
link to prombench dashboard

Why does the time series count get very high when restarting Prometheus, though?

@cstyan I am starting a new test for 2.16.0-rc.0 with fewer fake webservers, as @krasi-georgiev suggested. Will that be fine?

@krasi-georgiev
Contributor

It is clearly visible that the new Prom version uses more memory and does a lot more allocations, but yeah, let's reduce the number of targets and see how it goes.

@geekodour
Member Author

@cstyan @krasi-georgiev I started a test for v2.16.0-rc.0, as #323 got merged.

@geekodour
Member Author

Closing, since the memory issue was resolved and there is no requirement for such custom tests as of now.
