Dataset for automation of performance measurements #220

tsolakoua · 2018-11-23T09:51:01Z

Will collect some LoC from Rosetta Code.
Related to #212

tsolakoua · 2018-12-03T22:36:23Z

Fixtures (source code + tests) have been added to the recommended drivers. May I close this issue, or am I missing something?

dennwc · 2018-12-03T22:50:07Z

We might also need at least one sample of an extremely large source file for each language.

This time the files are not expected to be the same across languages. Each one just needs to contain >1000 LOC of actual code, not large string constants or giant global dictionaries.

It's hard to find files of equal complexity, so I think we can just grab any random source file from well-known projects (with a free license), or find such files in the language runtimes or stdlibs.
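A rough sketch of how the ">1000 LOC of actual code" criterion could be checked automatically, shown for Python sources only. The function names and the `max_string_share` threshold are assumptions made for illustration, not something agreed in this thread, and the check only catches large string constants, not giant dictionaries.

```python
import io
import tokenize

# Token types that do not count as code on their own.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def code_line_stats(source: str) -> tuple[int, int]:
    """Return (lines_with_tokens, lines_holding_only_string_literals)."""
    code_lines = set()
    string_lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        rows = range(tok.start[0], tok.end[0] + 1)
        if tok.type == tokenize.STRING:
            string_lines.update(rows)   # a multi-line string covers many rows
        else:
            code_lines.update(rows)
    string_only = string_lines - code_lines
    return len(code_lines | string_lines), len(string_only)


def is_good_large_sample(source: str, min_loc: int = 1000,
                         max_string_share: float = 0.5) -> bool:
    """Hypothetical filter: more than min_loc lines, not dominated by strings."""
    total, string_only = code_line_stats(source)
    return total > min_loc and string_only <= max_string_share * total
```

Other languages would need their own tokenizer; the idea of counting lines that carry non-string tokens stays the same.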

tsolakoua · 2018-12-03T22:54:26Z

OK, I will try to find some programs. And yes, most probably they will have to differ across languages, since at least on Rosetta Code I couldn't find one very large program that exists for all the drivers.

creachadair · 2018-12-04T00:36:38Z

One potential source of large nontrivial files is the generated code from the protobuf compiler. They're not human-written, so some patterns may not be exercised—but if the goal is just to push on size that may be a simple way to get source files we can control.

Java and C++ in particular generate incredibly noisy wrappers. Go is more moderate, but still large.

dennwc · 2018-12-04T00:50:05Z

I was tempted to propose protobuf wrappers as well :)

But I haven't seen protobuf wrappers for dynamic languages like Python, Ruby and JS. I assume they may rely on the dynamic nature of the language and solve most problems with reflection. The UAST of this kind of code will probably be relatively flat (just defining types and calling into the protobuf library), so it won't expose performance issues in the UAST transformation layer.

If it's not the case, then indeed protobufs sounds like a faster way to collect those large files.
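One way to test the "relatively flat" assumption before adopting a generated file is to measure how deeply its syntax tree nests. The sketch below uses Python's built-in `ast` module as a stand-in for a UAST, which is an assumption for illustration only; a real check would walk the tree returned by the driver.

```python
import ast


def max_depth(node: ast.AST) -> int:
    """Depth of the deepest path from node down to a leaf (a leaf counts as 1)."""
    children = list(ast.iter_child_nodes(node))
    if not children:
        return 1
    return 1 + max(max_depth(child) for child in children)


def tree_depth_of_source(source: str) -> int:
    """Parse Python source and return the nesting depth of its syntax tree."""
    return max_depth(ast.parse(source))
```

A file made of many top-level definitions and calls yields a small depth regardless of its length, while hand-written code with nested control flow yields a larger one.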

tsolakoua · 2018-12-10T09:22:31Z

@dennwc would something like that work or no because it has external dependencies?

creachadair · 2018-12-10T16:44:05Z

> I haven't seen protobuf wrappers for dynamic languages like Python, Ruby and JS. I assume they may rely on the dynamic nature of the language and solve most problems with reflection.

It depends on the implementation—Python, for example, has both a "native" implementation and a shim around C++. (In practice, I think ~everyone who cares winds up using the latter for performance reasons). Nevertheless, I think your implicit point is basically right, which is that we wouldn't necessarily use protobuf generated code for all languages.

> [W]ould something like that work or no because it has external dependencies?

I'd be less concerned with the external dependency and more that it's a moving target.

IMO the data for performance tests should generally not track active development, but should pick "interesting" targets and fix them at a particular state. Updating perf samples basically requires you to re-do all your historical measurements, or we lose comparability.

So I'd recommend the approach of "freezing" each sample so that we only get changes when we want them (e.g., to exercise a new feature).
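A minimal sketch of what "freezing" could look like in practice: record a SHA-256 hash for every sample in a manifest and fail the perf run if anything drifted. The function names and the JSON manifest layout are hypothetical, not an existing convention in the drivers.

```python
import hashlib
import json
from pathlib import Path


def snapshot(sample_dir: Path) -> dict:
    """Map each file under sample_dir (relative POSIX path) to its SHA-256."""
    return {
        p.relative_to(sample_dir).as_posix():
            hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(sample_dir.rglob("*")) if p.is_file()
    }


def freeze(sample_dir: Path, manifest: Path) -> dict:
    """Write the current snapshot to manifest (keep it outside sample_dir)."""
    entries = snapshot(sample_dir)
    manifest.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return entries


def changed_samples(sample_dir: Path, manifest: Path) -> list:
    """Names of samples modified, added, or removed since the freeze."""
    frozen = json.loads(manifest.read_text())
    current = snapshot(sample_dir)
    names = frozen.keys() | current.keys()
    return sorted(n for n in names if frozen.get(n) != current.get(n))
```

Re-running `freeze` then becomes the explicit, reviewable step for the cases where a change is wanted, and historical measurements stay comparable in between.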

bzz · 2019-07-08T13:14:48Z

Most probably this should be part of the umbrella issue #212; it is about the input data for the perf regression dashboard.

dennwc · 2019-07-08T13:27:24Z

I think we can close this, since @tsolakoua resolved it some time ago by sending PRs to every driver with code samples from Rosetta Code and other places. We can discuss further steps in #212.

bzz assigned tsolakoua Nov 23, 2018

bzz mentioned this issue Nov 23, 2018

Automation of performance measurements #212

Open

5 tasks

juanjux added the enhancement label Nov 23, 2018

This was referenced Nov 26, 2018

Add program fixtures for benchmark bblfsh/go-driver#33

Merged

Add program to fixtures for performance measure bblfsh/cpp-driver#13

Merged

bzz unassigned tsolakoua Jul 8, 2019

dennwc closed this as completed Jul 8, 2019
