
Automation of performance measurements #212

Open · 2 of 5 tasks
bzz opened this issue Nov 20, 2018 · 12 comments

@bzz (Contributor) commented Nov 20, 2018

This is an umbrella issue for the initial work on automating a performance analysis/regression suite for bblfshd, starting with a baseline benchmark.

Motivation (things reported to be slow):

TODOs:

  • small dataset of some LoC for each recommended driver (the same program from RosettaCode?) - see Dataset for automation of performance measurements #220
  • UAST parsing test suite (to run across gRPC: bblfshd/individual driver, STDIO: native parser)
  • UAST filtering test suite (rudimentary, 1 query)
  • OpenTracing instrumentation of:
    • client-go
    • bblfshd
    • drivers
  • performance regression suite running on Jenkins

Each of the items above is expected to be handled as a separate issue/PR (possibly by different authors).

As this is the initial round of work on performance, there is no expectation of completeness for the test cases - it is more important to have all the pieces in place and the infrastructure up and running.
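
A minimal sketch of what the parsing benchmark suite could look like, assuming a fixtures/<lang>/ layout with one RosettaCode-style sample per driver; the parseFile helper is a hypothetical placeholder for the actual gRPC/STDIO call, not an existing API:

```go
package perf_test

import (
	"context"
	"io/ioutil"
	"path/filepath"
	"testing"
)

// parseFile is a placeholder for the real call (gRPC to bblfshd, gRPC to an
// individual driver, or STDIO to the native parser). It is hypothetical and
// only marks where that call would go.
func parseFile(ctx context.Context, lang string, src []byte) error {
	return nil
}

// BenchmarkParse runs one sample program per language through the parser.
func BenchmarkParse(b *testing.B) {
	// Assumed dataset layout: fixtures/<lang>/sample.* with one
	// sample program per recommended driver.
	langs := []string{"go", "java", "python"}
	for _, lang := range langs {
		files, err := filepath.Glob(filepath.Join("fixtures", lang, "sample.*"))
		if err != nil || len(files) == 0 {
			b.Logf("no fixture for %s, skipping", lang)
			continue
		}
		src, err := ioutil.ReadFile(files[0])
		if err != nil {
			b.Fatal(err)
		}
		b.Run(lang, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				if err := parseFile(context.Background(), lang, src); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```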

@tsolakoua

I would like to focus on the small dataset of some LoC for each recommended driver for this Monday's OSD. I will create a separate issue for it, which can be assigned to me.

@bzz (Contributor, Author) commented Nov 27, 2018

For context: the UAST perf measurements on the gitbase side (src-d/gitbase#606) hit #209.

It would be nice to generate at least a similar load in our baseline and see how far it can be stretched from there.

@dennwc (Member) commented Dec 3, 2018

The new SDK (v2.12.0+) will generate a benchmark report (bench.txt) during bblfsh-sdk test -b.

I will now update all drivers that include benchmark fixtures (many thanks to @tsolakoua!).

It won't be enabled in CI for obvious reasons (shared instances), so we still need some infrastructure to run it.
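
Assuming bench.txt follows the standard go test -bench output format, here is a rough sketch of how such a report could be consumed later in the pipeline, using the golang.org/x/tools/benchmark/parse package (the file path is just an example):

```go
package main

import (
	"fmt"
	"log"
	"os"

	"golang.org/x/tools/benchmark/parse"
)

func main() {
	// bench.txt is the report produced by `bblfsh-sdk test -b`;
	// the path here is only an example.
	f, err := os.Open("bench.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	set, err := parse.ParseSet(f) // map[string][]*parse.Benchmark
	if err != nil {
		log.Fatal(err)
	}
	for name, runs := range set {
		for _, b := range runs {
			fmt.Printf("%s: %d iterations, %.2f ns/op\n", name, b.N, b.NsPerOp)
		}
	}
}
```

The same parsing step could later feed whichever metrics backend ends up being chosen.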

@tsolakoua

Next Monday is OSD and I could continue on this, since I have finished with the benchmark fixtures. However, I don't fully understand the next steps, so I might need some support to get started.

@bzz (Contributor, Author) commented Dec 4, 2018

> It won't be enabled in CI for obvious reasons (shared instances), so we still need some infrastructure to run it.

\cc @smola, as AFAIK he was working on a Jenkins setup.

@smola (Member) commented Dec 5, 2018

Watch https://github.com/src-d/backlog/issues/1307
We will have a Jenkins instance with a bare metal server dedicated to performance tests. It will be ready soon.

It will be driven by a Jenkinsfile (see the docs). I'll provide an example that works with our setup.

@smola (Member) commented Dec 7, 2018

We already have the Jenkins deployment; soon you'll have the borges pipeline as an example from which to develop your own.

@bzz (Contributor, Author) commented Dec 12, 2018

Linking in some instructions on using Jenkins for perf testing https://src-d.slack.com/archives/C0J8VQU0K/p1544633659068100

@dennwc (Member) commented Jul 1, 2019

@lwsanty will continue to work on this, as discussed on Slack.

Specifically, we have a set of Go benchmarks in each driver which can be run using go test -run=NONE -bench=. ./driver/.... These benchmarks don't need a compiled driver, only the Go source and the data in the ./fixtures directory. They only profile the driver's Native AST -> Semantic UAST transformation pipeline, not the driver itself. We also have a tool to benchmark a fully compiled driver (parsing + protocol overhead + transformation), but it may be harder to set up at first.

I think a good first step might be to set up our Jenkins instance to run these Go benchmarks for each driver, either every few days or on each commit to the driver's master branch. Later we can expand it by pulling/building a Docker image, benchmarking it with and without bblfshd, etc. But for now, having performance stats for UAST transforms is super useful on its own.
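
Not an existing tool, just a rough sketch of what the Jenkins job could execute per driver checkout: run the Go benchmarks and keep the raw output for later processing (the driver names and output paths are assumptions):

```go
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
	"path/filepath"
)

func main() {
	// Hypothetical local checkouts of the driver repositories.
	drivers := []string{"go-driver", "java-driver", "python-driver"}

	for _, d := range drivers {
		// Equivalent to: go test -run=NONE -bench=. -benchmem ./driver/...
		cmd := exec.Command("go", "test", "-run=NONE", "-bench=.", "-benchmem", "./driver/...")
		cmd.Dir = d

		out, err := cmd.CombinedOutput()
		if err != nil {
			log.Printf("%s: benchmarks failed: %v", d, err)
		}

		// Keep the raw `go test -bench` output for later parsing/upload.
		if err := os.MkdirAll("results", 0o755); err != nil {
			log.Fatal(err)
		}
		outFile := filepath.Join("results", d+".bench.txt")
		if err := os.WriteFile(outFile, out, 0o644); err != nil {
			log.Fatal(err)
		}
		fmt.Println("wrote", outFile)
	}
}
```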

@lwsanty (Member) commented Jul 2, 2019

Following up on the previous comment, I propose to approach this the same way it was done for borges (regression-borges).
Things that need to be done:

  • create a separate repo in bblfsh; we can name it performance-driver. It will contain a utility and a container for running the benchmarks, parsing the output, and propagating the results to metrics services (Prometheus/InfluxDB + Grafana) running in k8s.
    blockers
    • need access to create this repo, or to request its creation
    • need access to the srcd Docker registry
  • need to ask the Infra team to launch the metrics services in k8s
  • (optional) configure additional notification methods via Slack/email
    blockers
    • need admin access to Jenkins for me; I've already made the request
    • need a Slack token
    • need a Slack channel
    • need a service email

@smola @dennwc @bzz
It would be great to have feedback on this proposal.
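
A minimal sketch of the "parse the output and propagate the results" step, under the assumption that we go with a Prometheus Pushgateway; the metric name, job name, URL, and file path below are placeholders, not part of any existing bblfsh tool:

```go
package main

import (
	"log"
	"os"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
	"golang.org/x/tools/benchmark/parse"
)

func main() {
	f, err := os.Open("results/go-driver.bench.txt") // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	set, err := parse.ParseSet(f)
	if err != nil {
		log.Fatal(err)
	}

	// One gauge per benchmark, labeled by benchmark name.
	nsPerOp := prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "driver_benchmark_ns_per_op", // metric name is an assumption
		Help: "go test -bench ns/op per benchmark",
	}, []string{"benchmark"})

	for name, runs := range set {
		for _, b := range runs {
			nsPerOp.WithLabelValues(name).Set(b.NsPerOp)
		}
	}

	// Push to a Pushgateway; the URL and job name are placeholders.
	if err := push.New("http://pushgateway:9091", "bblfsh_driver_bench").
		Collector(nsPerOp).
		Push(); err != nil {
		log.Fatal(err)
	}
}
```

Grafana could then read these series from Prometheus, with InfluxDB as an alternative backend if preferred.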

@bzz (Contributor, Author) commented Jul 2, 2019

Overall looks good!

> blockers

JFYI: repository creation, as well as other ACL bits, is handled by Infra, where the appropriate issues have to be filed as soon as there is consensus.


Before doing that, shall we briefly discuss what kind of performance regression dashboard we want to have at the end?

E.g. from the repository naming proposal above, I figure we are talking about individual drivers' "internal" performance benchmarks.

I think it would be really useful to include the following in the same dashboard:

  1. individual driver benchmark results (no need to actually run the full driver, just go test -bench=.)
    from the repository naming proposal above, I presume the initial implementation targets this
  2. each driver's performance under some pre-defined workload (through gRPC, with only a driver container running; see the sketch at the end of this comment)
  3. bblfshd performance under the same pre-defined workload (gRPC, the whole bblfshd)
  4. bblfshd performance under the same workload, run through different clients (breakdown by client)

Maybe this would require turning the current issue into an ☂️ and handling each of those individually through new, smaller issues in order of priority.

I believe this way all of these could live in the same repository, e.g. bblfsh/performance, be re-run by Jenkins on every release of bblfshd (manually triggered by tag name?), and give us (the maintainers) an accurate picture of expected performance and any possible regressions.

Last but not least: for me, notifications are much less of a priority compared to having such a "dashboard".

Given the requirements above, I'm not sure how much of regression-borges can be productively reused - AFAIK it consumes a single binary, but in our case individual drivers do not have binary release artifacts, and we would need to start containers instead (in some cases).

Also, AFAIK regression-borges is mainly focused on outputting a CSV comparing N versions of the same binary, whereas in our case it is more about populating some dashboard (Grafana+ES?) with metrics from different tools.

And for 2-4, I'm not 100% sure, but I think we might be able to re-use some of the prior work, e.g.:

@dennwc @creachadair WDYT? BTW, maybe it would be productive to schedule a quick call about this at some point.
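
For items 2-4, a rough sketch of what a pre-defined workload runner could look like: it issues parse requests at a fixed concurrency and reports simple latency percentiles. The parseOnce function is a hypothetical placeholder for the actual gRPC call (via client-go to bblfshd, or directly to a driver container), not an existing API:

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
	"time"
)

// parseOnce stands in for a single parse request over gRPC.
// It is a hypothetical placeholder, not an existing API.
func parseOnce(ctx context.Context, lang string, src []byte) error {
	return nil
}

func main() {
	const (
		workers  = 8    // assumed concurrency of the pre-defined workload
		requests = 1000 // assumed total number of parse requests
	)
	src := []byte("package main\n\nfunc main() {}\n") // sample payload

	latencies := make([]time.Duration, requests)
	var wg sync.WaitGroup
	jobs := make(chan int)

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				start := time.Now()
				_ = parseOnce(context.Background(), "go", src)
				latencies[i] = time.Since(start)
			}
		}()
	}
	for i := 0; i < requests; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	fmt.Println("p50:", latencies[len(latencies)/2])
	fmt.Println("p95:", latencies[len(latencies)*95/100])
}
```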

@dennwc (Member) commented Jul 2, 2019

👍 for scheduling a call.

bblfsh/performance sounds like a good name.

Agree about the notifications - they are not that important. The MVP for me is a dashboard with go test -bench=. benchmarks for each driver, even without gRPC/bblfshd. We can't really optimize native parsing, and we can't lightly change the protocol to reduce overhead. The only actionable item is the optimization of UAST transforms or the DSL, which will be monitored by the mentioned Go benchmarks. And clients, of course, but that's out of scope for the MVP :)

For the dashboard itself, I'm not sure what is considered "standard" right now, but I definitely don't want Jenkins dashboards - those are static and ugly. I propose to use a pair of Grafana + Influx/ES/whatever if there are no better options. Grafana also provides "alarms", so we can set up notifications later (if needed).

Re "single dashboard from multiple tools": as @lwsanty mentioned, we may need to consult the Infra team to find out if we can reuse our Grafana instance in the pipeline cluster. We may need a separate one because of the isolation between clusters.
