
Dataset for automation of performance measurements #220

Closed
tsolakoua opened this issue Nov 23, 2018 · 9 comments

Comments

@tsolakoua

tsolakoua commented Nov 23, 2018

I will collect some code (LoC) from Rosetta Code.
Related to #212

@tsolakoua
Author

Fixtures (source code + tests) have been added to the recommended drivers. May I close this issue, or am I missing something?

@dennwc
Member

dennwc commented Dec 3, 2018

We might also need at least one sample of an extremely large source file for each language.

This time the files don't have to be the same across languages. Each one just needs to contain >1000 LOC of actual code, not large string constants or giant global dictionaries.

It's hard to find files of equal complexity, so I think we can just grab any source file from well-known projects (with a free license), or find such files in the language runtimes or stdlibs.

@tsolakoua
Author

OK, I will try to find some programs. And yes, they will most likely have to differ across languages, since, at least on Rosetta Code, I couldn't find a single very large program implemented for all the drivers.

@creachadair
Contributor

One potential source of large nontrivial files is the generated code from the protobuf compiler. They're not human-written, so some patterns may not be exercised—but if the goal is just to push on size that may be a simple way to get source files we can control.

Java and C++ in particular generate incredibly noisy wrappers. Go is more moderate, but still large.

@dennwc
Member

dennwc commented Dec 4, 2018

I was tempted to propose protobuf wrappers as well :)

But I haven't seen protobuf wrappers for dynamic languages like Python, Ruby and JS. I assume they rely on the dynamic nature of the language and solve most problems with reflection. The UAST of this kind of code will probably be relatively flat (just defining types and calling into the protobuf library), so it won't detect performance issues in the UAST transformation layer.

If that's not the case, then protobufs do indeed sound like a faster way to collect those large files.

@tsolakoua
Author

@dennwc, would something like that work, or not, since it has external dependencies?

@creachadair
Contributor

> I haven't seen protobuf wrapper for dynamic languages like Python, Ruby and JS. I assume they may use dynamic language nature and solve most problems with reflection.

It depends on the implementation: Python, for example, has both a "native" implementation and a shim around C++ (in practice, I think nearly everyone who cares winds up using the latter for performance reasons). Nevertheless, I think your implicit point is basically right: we wouldn't necessarily use protobuf-generated code for all languages.

> [W]ould something like that work or no because it has external dependencies?

I'd be less concerned with the external dependency and more that it's a moving target.

IMO the data for performance tests should generally not track active development; instead, we should pick "interesting" targets and fix them at a particular state. Updating perf samples basically requires re-doing all historical measurements, otherwise we lose comparability.

So I'd recommend the approach of "freezing" each sample so that we only get changes when we want them (e.g., to exercise a new feature).
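A minimal sketch of the "freezing" idea in Go, assuming each sample's content is pinned by a SHA-256 digest in a manifest and a benchmark run checks for drift before measuring. The `manifest` type, the `freeze` and `drifted` helpers, and the `fixtures/` paths are hypothetical and not part of any driver.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// manifest maps a sample path to the hex SHA-256 of its frozen content.
type manifest map[string]string

func digest(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// freeze records the current content of every sample.
func freeze(samples map[string][]byte) manifest {
	m := make(manifest, len(samples))
	for path, content := range samples {
		m[path] = digest(content)
	}
	return m
}

// drifted returns, sorted, every sample whose content no longer matches its
// frozen digest, plus samples present on only one side (added or removed).
func drifted(m manifest, samples map[string][]byte) []string {
	var out []string
	for path, want := range m {
		content, ok := samples[path]
		if !ok || digest(content) != want {
			out = append(out, path)
		}
	}
	for path := range samples {
		if _, ok := m[path]; !ok {
			out = append(out, path)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	samples := map[string][]byte{
		"fixtures/large.go": []byte("package main\n"),
		"fixtures/large.py": []byte("print('hi')\n"),
	}
	m := freeze(samples)
	samples["fixtures/large.py"] = []byte("print('changed')\n")
	fmt.Println(drifted(m, samples)) // prints: [fixtures/large.py]
}
```

If `drifted` returns a non-empty list, the new results are no longer comparable with earlier runs, so the change should be deliberate and followed by re-measuring the baseline.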

@bzz
Contributor

bzz commented Jul 8, 2019

Most probably this should be part of the umbrella issue #212, as it is about the input data for the perf regression dashboard.

@dennwc
Member

dennwc commented Jul 8, 2019

I think we can close this, since @tsolakoua resolved it some time ago by sending PRs to every driver with code samples from Rosetta Code and other places. We can discuss further steps in #212.

@dennwc dennwc closed this as completed Jul 8, 2019