Is your feature request related to a problem? Please describe.
After seeing the big performance gains from #8441, I did a quick profile run of a subset of the tests. Even after the caching fix, data generation takes the majority of the time when we are not waiting for Spark, and almost all of that time is spent in sre_yield, the tool we use for string generation (we also use it to generate strings that are formatted like proper decimal values). It would be good to see what we can do to speed up this string generation. Perhaps we can find a good way to generate random strings without sre_yield when no regular expression is passed in. We might also want to look at alternatives for generating decimal values or very simple patterns.
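As a rough illustration of the "no regular expression" fast path, here is a minimal sketch that builds random strings and decimal-formatted strings directly from a seeded `random.Random`, without going through sre_yield. The function names, the default alphabet, and the length bounds are all hypothetical and are not taken from the existing generators; the real defaults would need to match whatever pattern the tests currently rely on.

```python
import random
import string

# Hypothetical alphabet; the pattern the existing tests use by default may differ.
DEFAULT_ALPHABET = string.ascii_letters + string.digits


def random_string(rand, min_len=0, max_len=30, alphabet=DEFAULT_ALPHABET):
    """Return a random string whose length is in [min_len, max_len]."""
    length = rand.randint(min_len, max_len)
    return ''.join(rand.choices(alphabet, k=length))


def random_decimal_string(rand, precision=10, scale=2):
    """Return a string such as '-12345.67' with at most `precision` digits,
    exactly `scale` of them after the decimal point."""
    limit = 10 ** precision - 1
    unscaled = rand.randint(-limit, limit)
    sign = '-' if unscaled < 0 else ''
    # Left-pad so there is always at least one digit before the point.
    digits = str(abs(unscaled)).rjust(scale + 1, '0')
    if scale == 0:
        return sign + digits
    return sign + digits[:-scale] + '.' + digits[-scale:]


if __name__ == '__main__':
    rand = random.Random(42)
    print(random_string(rand, 5, 10))
    print(random_decimal_string(rand, precision=10, scale=2))
```

Drawing one integer and formatting it avoids enumerating or indexing into the set of strings a regular expression can match, which is presumably where most of the sre_yield cost comes from. Whether this is actually faster for the test suite would need to be confirmed with the same profiling run.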
Can we do file caching for test data? That is, a generator would first try to load from a cached file. If the file doesn't exist, the data would be generated and written to that file.
That would be another interesting option. I'm not sure how much it would help beyond the in-memory caching for CI runs, but it would help a lot for local runs, where we can keep data cached for longer. We would want to add some kind of "version" number to the generators so that, if we change them, we can invalidate the data saved to disk.
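The file cache with a generator "version" could look roughly like the sketch below. Everything here is an assumption made for illustration: the `load_or_generate` helper, its parameters, the use of `pickle`, and the cache key layout are hypothetical and do not reflect the existing test framework.

```python
import hashlib
import os
import pickle
import tempfile


def load_or_generate(cache_dir, gen_name, gen_version, seed, length, generate):
    """Load cached data if present; otherwise call generate(seed, length),
    store the result on disk, and return it.

    The generator version is part of the cache key, so bumping it makes
    previously written files unreachable (effectively invalidating them).
    """
    key = f"{gen_name}:{gen_version}:{seed}:{length}".encode("utf-8")
    path = os.path.join(cache_dir, hashlib.sha256(key).hexdigest() + ".pkl")

    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    data = generate(seed, length)

    # Write to a temporary file first, then rename, so that a concurrent
    # reader never sees a partially written cache file.
    os.makedirs(cache_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(data, f)
    os.replace(tmp_path, path)
    return data
```

Because stale files are only orphaned rather than deleted, a local cache directory would still need occasional cleanup. Including the seed and length in the key is one possible choice; the real key would have to cover every input that affects the generated data.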