Dataset for automation of performance measurements #220
Fixtures (source code + tests) have been added to the recommended drivers. May I close this issue, or am I missing something?
We might also need at least one sample of an extremely large source file for each language. This time the files are not expected to be the same across languages. We just need each one to contain >1000 LOC of actual code, not large string constants or giant global dictionaries. It's hard to find files of equal complexity, so I think we can just grab any random source from well-known projects (with a free license), or find such files in the language runtimes or stdlibs.
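A rough screening heuristic could help sort candidates. Here is a minimal sketch; the comment prefixes and the long-line filter are naive assumptions, not a real parser:

```python
# Screen candidate fixture files: count non-blank, non-comment lines and
# require >1000 of them. Lines that look like giant embedded string
# constants are skipped. This is a crude heuristic, not a parser.
import sys

COMMENT_PREFIXES = ("#", "//", "--", ";")  # assumed set, adjust per language

def effective_loc(path):
    loc = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            stripped = line.strip()
            if not stripped or stripped.startswith(COMMENT_PREFIXES):
                continue
            if len(stripped) > 500:  # likely a large literal, not code
                continue
            loc += 1
    return loc

if __name__ == "__main__":
    for path in sys.argv[1:]:
        n = effective_loc(path)
        verdict = "OK" if n > 1000 else "too small"
        print(f"{path}: {n} effective LOC ({verdict})")
```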
OK, I will try to find some programs. Most probably they will have to be in different languages, since at least on Rosetta Code I couldn't find one very large program covering all the drivers.
One potential source of large nontrivial files is the generated code from the protobuf compiler. They're not human-written, so some patterns may not be exercised—but if the goal is just to push on size that may be a simple way to get source files we can control. Java and C++ in particular generate incredibly noisy wrappers. Go is more moderate, but still large.
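For reference, a minimal sketch of the generation step. It assumes protoc is on PATH and that sample.proto is a hypothetical file with many message definitions; Java, C++, and Python output are built into protoc, while Go would additionally need the protoc-gen-go plugin:

```python
# Generate wrapper sources from a message-heavy sample.proto into
# per-language output directories.
import subprocess
from pathlib import Path

out = Path("generated")
for lang in ("java", "cpp", "python"):
    dest = out / lang
    dest.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["protoc", f"--{lang}_out={dest}", "sample.proto"],
        check=True,
    )
```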
I was tempted to propose protobuf wrappers as well :) But I haven't seen a protobuf wrapper for dynamic languages like Python, Ruby, and JS; I assume they may exploit the dynamic nature of the language and solve most problems with reflection. The UAST of this kind of code will probably be relatively flat (just defining types and calling into the protobuf library), so it won't detect performance issues in the UAST transformation layer. If that's not the case, then protobufs indeed sound like a faster way to collect those large files.
It depends on the implementation—Python, for example, has both a "native" implementation and a shim around C++. (In practice, I think ~everyone who cares winds up using the latter for performance reasons). Nevertheless, I think your implicit point is basically right, which is that we wouldn't necessarily use protobuf generated code for all languages.
I'd be less concerned with the external dependency and more with it being a moving target. IMO the data for performance tests should generally not track active development, but should pick "interesting" targets and fix them at a particular state. Updating perf samples basically requires re-doing all historical measurements, or we lose comparability. So I'd recommend "freezing" each sample so that we only get changes when we want them (e.g., to exercise a new feature).
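A minimal sketch of what "freezing" could look like in practice: fetch the file at a pinned commit and verify a recorded checksum, so the sample only changes when the commit is deliberately bumped. The repo, path, commit SHA, and hash below are placeholders, not real values:

```python
# Fetch a perf sample pinned to a specific commit and verify its checksum.
import hashlib
import urllib.request
from pathlib import Path

REPO = "some-org/some-project"                        # hypothetical
COMMIT = "0123456789abcdef0123456789abcdef01234567"   # pinned SHA (placeholder)
FILE_PATH = "src/big_file.java"                       # hypothetical
EXPECTED_SHA256 = "<recorded at freeze time>"         # placeholder

url = f"https://raw.githubusercontent.com/{REPO}/{COMMIT}/{FILE_PATH}"
data = urllib.request.urlopen(url).read()
digest = hashlib.sha256(data).hexdigest()
assert digest == EXPECTED_SHA256, f"sample drifted: {digest}"

Path("fixtures").mkdir(exist_ok=True)
Path("fixtures/big_file.java").write_bytes(data)
```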
Most probably this should be part of umbrella #212, since it is about the input data for the perf regression dashboard.
I think we can close this since @tsolakoua resolved it some time ago by sending PRs to every driver with code samples from Rosetta Code and other places. We can discuss further steps in #212.
Will collect some LoC from Rosetta Code
Related to #212