GSoC 2024 | Benchmark | First objective | Israel Roldan #2565
Conversation
Thank you for this extremely comprehensive pull request. It may take us some time to review this!
Codecov Report
@@ Coverage Diff @@
## master #2565 +/- ##
=======================================
Coverage 67.18% 67.18%
=======================================
Files 173 173
Lines 14570 14570
=======================================
Hits 9789 9789
Misses 4781 4781
I removed the Dockerfiles in commit 25c5dfd, as requested.
The visualization_convergence_plots benchmarks are empty in your screenshot, is that a bug?
benchmarks/benchmark_base.py
Outdated
f"{self.example_configuration_dir}/tardis_configv1_verysimple.yml"
)

class RegressionData:
This seems to be a partial duplicate of the existing RegressionData class, why is that?
The main problem in general is that the unit tests, which are based on PyTest, rely on some PyTest-specific behavior (tricks). One example is fixtures, which can inject special functionality into a test class or function; another is the ability to receive parameters through the arguments. For this reason, most of the PyTest classes were copied and modified.
I remember that I tried to use tardis/tests/fixtures/regression_data.py, but it didn't work.
The best way would be to improve ASV so it can measure the PyTest tests directly, rather than reworking them into new benchmark tests.
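To illustrate the kind of duplication described here, a minimal sketch follows; load_config and parse_config are invented placeholders, not TARDIS code. The same preparation logic ends up written once as a pytest fixture and once in an ASV setup method.

import pytest

def load_config(path):
    # stand-in for reading a YAML configuration file
    return {"path": path}

def parse_config(config):
    # stand-in for the function whose behavior is being exercised
    return dict(config)

# pytest style: the fixture injects the prepared input into the test.
@pytest.fixture
def simple_config():
    return load_config("tardis_configv1_verysimple.yml")

def test_parse_config(simple_config):
    assert parse_config(simple_config) is not None

# ASV style: there are no fixtures, so setup() has to rebuild the same input.
class BenchmarkParseConfig:
    def setup(self):
        self.config = load_config("tardis_configv1_verysimple.yml")

    def time_parse_config(self):
        parse_config(self.config)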
ASV discussion from the 2022 summit: airspeed-velocity/asv#1219 ("Improving Python benchmarking tooling", opened by @asmeurer on Oct 6, 2022).
@oscarbenjamin commented on Oct 18, 2022:
This actually extends beyond benchmarking to unit tests as well. There is no real need for unit tests and benchmarks to be different things.
My personal comments:
- Another thing I noticed when migrating the PyTest unit tests to ASV was that the effort and code were largely duplicated. Basically, my benchmarks are a copy of the PyTest tests adapted to ASV.
- Therefore, I see more potential in using ASV for benchmarks of whole pieces of functionality rather than copies of unit tests.
- If ASV could be extended to benchmark the PyTest unit tests directly, that would be incredible.
That's very useful information. My intent with the unit-test first objective was that our unit tests provide suggested inputs for functions, making it easier for applicants without prior TARDIS knowledge to benchmark them. If this leads to extensive code duplication, then more "integrated" benchmarks that measure multiple functions at once may be preferable.
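As a rough illustration of what such an "integrated" benchmark could look like (the class below and its dictionary contents are invented, not an existing TARDIS benchmark), a single timed method can exercise several steps of a flow and report one end-to-end number:

class BenchmarkConfigurationFlow:
    def setup(self):
        # invented stand-in for loading an example configuration
        self.raw_config = {"model": {}, "plasma": {}, "spectrum": {}}

    def time_full_configuration_flow(self):
        # several parsing steps timed together as one end-to-end number,
        # instead of one benchmark per section
        model = dict(self.raw_config["model"])
        plasma = dict(self.raw_config["plasma"])
        spectrum = dict(self.raw_config["spectrum"])
        return model, plasma, spectrum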
I don't know the context here but since I was mentioned I will say that in general, benchmarks and tests are very similar in the way they work, so they can probably share some common frameworks. But I would caution against saying that they are the same thing in general. Tests and benchmarks have different goals and trying to squeeze both into one is a bad idea. A good test may be a bad benchmark and a good benchmark may be a bad test.
If anything, a benchmark is like a "test" against performance instead of functionality, but even then, that leads to different requirements. Just as an example, asv benchmarks should work against every historical version of the code, whereas a test only needs to run against the latest version.
Thanks for your insight, Aaron!
Thanks, Aaron!
I feel lost among these approaches because I don't know which setup provides the most useful benchmark results.
I was reviewing the article about Pandas regression testing with ASV benchmarking that @epassaro shared. With all of this information, I have these comments:
I thought about it carefully and came to the conclusion that we can learn from those who have already walked this path, so I searched the Pandas repository to see how they handle benchmarks. This is what the Pandas repository does for benchmarks versus unit testing of memory_usage, but I didn't find a pattern to follow. Apparently, their benchmarks are very specific and simple: normally they test a method with parameters, although I didn't see many benchmarks with parameterization (a simplified sketch of that style appears after the ASV benchmarking link below). Perhaps a good idea would be to schedule a meeting with the Pandas maintainers and see if they could give us 30 minutes or an hour to talk about how to implement our benchmarks.
The second option is to apply all of these approaches and keep learning along the way over months or years. The approaches could be all three of these together:
- Create benchmarks based on the unit tests.
- Create specific unit tests for specific methods.
- As you mentioned, Andrew, create benchmarks around flows of functionality (“integrated” benchmarks).
ASV benchmarking page: https://asv-runner.github.io/asv-collection/pandas/
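For reference, here is a simplified sketch in the spirit of the Pandas ASV suite (written for this discussion, not copied from their repository): a small benchmark class that uses params and param_names so one timed method runs once per parameter value.

import numpy as np

class ArraySum:
    # each value in params produces one benchmark result
    params = [10_000, 1_000_000]
    param_names = ["size"]

    def setup(self, size):
        self.values = np.arange(size, dtype=np.float64)

    def time_sum(self, size):
        self.values.sum()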
benchmarks/benchmark_base.py
Outdated
return atomic_data_fname

@property
def tardis_model_config_nlte_root(self):
I would prefer similar properties to be grouped together in the class definition order, e.g. all of the tardis_model_config properties together.
Yes, they should be grouped into classes.
I noticed that PyTest has utility classes for atomic data and other groups. I put everything in one class for time reasons and because I submitted this as a draft rather than a final pull request.
To be clear, I mean that the properties, in the order they are written in the class definition in the file, should follow a clear order. Right now they are not clearly ordered by usage or name. If the class can be broken up into multiple classes, even better.
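One possible arrangement, sketched only to illustrate the request (the class split and the *_nlte_lu property name are hypothetical): keep each family of properties together, or move them into small helper classes that the benchmark base inherits from.

class TardisModelConfigProperties:
    # all tardis_model_config_* properties kept next to each other
    @property
    def tardis_model_config_nlte_root(self):
        return "placeholder for the NLTE root configuration"

    @property
    def tardis_model_config_nlte_lu(self):
        return "placeholder for the NLTE LU configuration"

class AtomicDataProperties:
    # atomic-data related properties grouped in their own class
    @property
    def atomic_data_fname(self):
        return "placeholder for the atomic data file path"

class BenchmarkBase(TardisModelConfigProperties, AtomicDataProperties):
    pass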
Could you check my demo proposal in commit 3aa02eb? If it is useful and there are no further comments, I'll continue refactoring benchmark_base.py and all the benchmarks with this structure.
The benchmarks ran as expected:
· Creating environments 12:20:35 [8/8]
· Discovering benchmarks
· Running 12 total benchmarks (1 commits * 1 environments * 12 benchmarks)
[ 0.00%] · For tardis commit b7e07720 <master>:
[ 0.00%] ·· Benchmarking mamba-py3.12
[ 4.17%] ··· Running (io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_config_hdf--).....
[25.00%] ··· Running (io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_plasma_nlte_section_lu_config--).
[29.17%] ··· Running (io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_plasma_nlte_section_root_config--).
[33.33%] ··· Running (io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_plasma_section_config--).....
[54.17%] ··· io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_config_hdf 69.4±2ms
[58.33%] ··· io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_convergence_section_parser 1.27±0.02μs
[62.50%] ··· io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_from_config_dict 58.2±1ms
[66.67%] ··· io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_model_section_config 128±10ms
[70.83%] ··· io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_plasma_nlte_root_exc_section_config 22.3±0.4s
[75.00%] ··· io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_plasma_nlte_section_lu_config 21.7±0.1s
[79.17%] ··· io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_plasma_nlte_section_root_config 21.4±0.01s
[83.33%] ··· io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_plasma_section_config ok
[83.33%] ··· ================= ============
Plasma values
----------------- ------------
initial_t_inner 60.9±0.3ms
initial_t_rad 61.7±1ms
================= ============
[87.50%] ··· io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_spectrum_section_config 64.3±1ms
[91.67%] ··· io_configuration_config_reader.BenchmarkIoConfigurationConfigReader.time_supernova_section_config 131±2ms
[95.83%] ··· plasma_nlte_excitation_x.BenchmarkPlasmaNlteExcitation.time_coll_exc_deexc_matrix ok
[95.83%] ··· ============================================================================================================================== =============
Matrix
------------------------------------------------------------------------------------------------------------------------------ -------------
{'coll_exc_coeff_values': [1, -2, 3], 'coll_deexc_coeff_values': [4, 9, 10], 'number_of_levels': 3} 1.63±0.09ms
{'coll_exc_coeff_values': [0.21, 0.045, 0.1234], 'coll_deexc_coeff_values': [0.7865, 0.987, 0.00123], 'number_of_levels': 3} 1.57±0.04ms
============================================================================================================================== =============
[100.00%] ··· plasma_nlte_excitation_x.BenchmarkPlasmaNlteExcitation.time_prepare_bound_bound_rate_matrix 140±2ms
real 5m32.771s
user 5m40.852s
sys 1m52.319s
return self.RegressionData()

@property
def packet(self):
This and similar properties duplicate a lot of pytest fixtures. Could they reference them instead? This could become unmaintainable.
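One way such duplication is sometimes reduced, sketched here as a generic pattern rather than the fixtures' actual layout (build_packet and its return value are invented): move the shared construction into a plain helper that both the pytest fixture and the benchmark property call.

import pytest

def build_packet(seed=1963):
    # invented stand-in for the real packet construction
    return {"seed": seed, "mu": 0.3, "energy": 0.9}

@pytest.fixture
def packet():
    # the pytest fixture becomes a thin wrapper around the helper
    return build_packet()

class BenchmarkBase:
    @property
    def packet(self):
        # the benchmark property calls the same helper instead of copying it
        return build_packet()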
I remember that I tried to reuse them, but it was not successful for some reason I don't remember. Can we try again with more context and knowledge?
Please do
This is my best proposal for reusing the RegressionData class from PyTest: ab98e61.
The big problem is that when PyTest runs, it collects many values and a lot of information that are passed into and used by the fixtures. It is challenging to obtain these values when running under ASV; we need to create fake objects and hard-code some values.
For this demo, I ran ASV with asv run --bench "time_prepare_bound_bound_rate_matrix" --bench "time_coll_exc_deexc_matrix". It works well: basically, it took the regression data, ran some processing, and compared the results of that processing against the regression data.
The output was:
· Creating environments
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[ 0.00%] · For tardis commit b7e07720 <master>:
[ 0.00%] ·· Benchmarking mamba-py3.12
[25.00%] ··· Running (plasma_nlte_excitation_x.BenchmarkPlasmaNlteExcitation.time_coll_exc_deexc_matrix--)..
[75.00%] ··· plasma_nlte_excitation_x.BenchmarkPlasmaNlteExcitation.time_coll_exc_deexc_matrix ok
[75.00%] ··· ============================================================================================================================== =============
Matrix
------------------------------------------------------------------------------------------------------------------------------ -------------
{'coll_exc_coeff_values': [1, -2, 3], 'coll_deexc_coeff_values': [4, 9, 10], 'number_of_levels': 3} 1.60±0.04ms
{'coll_exc_coeff_values': [0.21, 0.045, 0.1234], 'coll_deexc_coeff_values': [0.7865, 0.987, 0.00123], 'number_of_levels': 3} 1.63±0.04ms
============================================================================================================================== =============
[100.00%] ··· plasma_nlte_excitation_x.BenchmarkPlasmaNlteExcitation.time_prepare_bound_bound_rate_matrix 140±1ms
What are the main changes?
In the file benchmarks/benchmark_base.py, I created an inner class CustomPyTestRequest, which replaces the request variable of the function def regression_data(request) in the file tardis/tests/fixtures/regression_data.py. In the base benchmark, I create an instance of CustomPyTestRequest and pass it the module name, node name, regression data directory and path.
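A minimal sketch of that idea (the attribute names below are guesses based on this description, not a copy of commit ab98e61): a small object imitating the parts of pytest's request that the regression_data fixture reads, namely the module name and the node name.

class CustomPyTestRequest:
    # imitates request.node.module.__name__ and request.node.name
    class _Module:
        def __init__(self, module_name):
            self.__name__ = module_name

    class _Node:
        def __init__(self, module_name, node_name):
            self.module = CustomPyTestRequest._Module(module_name)
            self.name = node_name

    def __init__(self, module_name, node_name):
        self.node = self._Node(module_name, node_name)

# hypothetical usage from a benchmark, mirroring the description above
custom_request = CustomPyTestRequest(
    module_name="test_nlte_excitation",
    node_name="test_coll_exc_deexc_matrix__placeholder__",
)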
When a benchmark has parameters (@parameterize), it gets messier because it needs to determine which parameter set the benchmark is running with. In the demo, I added a condition to detect the ID of the parameter:
@parameterize({'Matrix': [
    {
        "coll_exc_coeff_values": [1, -2, 3],
        "coll_deexc_coeff_values": [4, 9, 10],
        "number_of_levels": 3,
    },
    {
        "coll_exc_coeff_values": [0.21, 0.045, 0.1234],
        "coll_deexc_coeff_values": [0.7865, 0.987, 0.00123],
        "number_of_levels": 3,
    },
]})
def time_coll_exc_deexc_matrix(self, matrix):
    coll_exc_coeff_values = matrix["coll_exc_coeff_values"]
    coll_deexc_coeff_values = matrix["coll_deexc_coeff_values"]
    number_of_levels = matrix["number_of_levels"]
    test_parameter_id = 0
    if coll_exc_coeff_values != [1, -2, 3]:
        test_parameter_id = 1
Then the parameter ID is substituted into the node name:
custom_request = self.CustomPyTestRequest(
    f"test_coll_exc_deexc_matrix__coll_exc_coeff_values{test_parameter_id}-coll_deexc_coeff_values{test_parameter_id}-3__",
    …
Can you put this comment into a GitHub discussion to preserve it?
No, usually ASV maintains the test names. If I don't delete all the
I have responded to all the comments in the review.
This reverts commit abe3b0d.
Looks like the PR is still failing codestyle. Test failures are less of a concern because of the reference data changes.
I believe I have addressed all the comments and suggestions from the review. If I missed anything, or more work is needed during verification, feel free to let me know and I will be happy to make the changes.
Thanks for all the hard work on this!
Please remove all the benchmarks except montecarlo_numba and run_tardis. Remove any files that are commented out.
Done in the commit c32a7bf.
Questions:
Updates:
r_packet_transport.move_packet_across_shell_boundary(
    packet, delta_shell, no_of_shells
)
assert packet.status == r_packet.PacketStatus.EMITTED
do we want assert statements?
📝 Description
Type: 🚀 feature
Hi, @andrewfullard, @atharva-2001, and @aoifeboyle.
This pull request contains my work from my GSoC 2024 application; it is for the benchmark project. Thanks in advance.
GitHub Action to store more than 5 commits of benchmarks.
The first goal is: “You should improve the GitHub Action to store more than just the recent 5 commits of benchmarks to provide an extended history”.
The solution to this problem is in my application, which I will share with the mentors. While working on the first goal I ran into some issues, so I decided to create a development environment.
Development environment
Commit: edb0e8f
I tried to install all the requirements and noticed that it is quite difficult; some packages had to be installed via Conda. I would rather not mix my personal environment with the TARDIS development environment. For this reason, I created a series of Docker setups that provide three types of environment: one with the static code inside the container, one fragmented (mixed), and one fully live, which means it uses your local computer (host) to read and store files.
For more details, check the file Dockerfile-benchmarks.bash. Furthermore, since one video is better than ten thousand words, I created a YouTube video explaining how to use this development environment.
Docker
Static
Pros:
Cons:
Fragmented
Pros:
Cons:
Live
Pros:
Cons:
Expand the framework and create benchmarks.
Commit: 65ae093
Results.
The results are in my application, but I converted almost 56% of the unit tests into time benchmarks. Furthermore, my application lists the challenges of migrating all the unit tests to benchmarks.
TARDIS Benchmarking and Performance Improvement.
I ran all the benchmarks against selected release tags, then ran some specific benchmarks to find bottlenecks. In the end, the idea is to check whether ASV is useful for detecting performance issues. Spoiler alert: I am not entirely sure that ASV detects them.
I created and uploaded the results of my benchmarks at these web links:
📌 Resources
N/A.
🚦 Testing
How did you test these changes?
☑️ Checklist
build_docs label