
First shot at an appveyor configuration #195

Merged
merged 3 commits into master from appveyor on Apr 4, 2018

Conversation

PragTob
Member

PragTob commented Mar 12, 2018

Fixes #79

@PragTob
Member Author

PragTob commented Mar 12, 2018

It builds 🎉 🎉 🎉

But apparently it's super slow and hence a bunch of tests fail 😢

https://ci.appveyor.com/project/PragTob/benchee/build/build19

............................................................................................................................................................................................

  1) test .run_scenarios doesn't take longer than advertised for very fast funs (Benchee.Benchmark.RunnerTest)
     test/benchee/benchmark/runner_test.exs:162
     excution took too long 47000 vs. 30000 +- 1.2e4
     code: retrying(fn ->
     stacktrace:
       test/benchee/benchmark/runner_test.exs:163: (test)

..................................................................

  2) test .run_scenarios runs a benchmark suite and enriches it with measurements (Benchee.Benchmark.RunnerTest)
     test/benchee/benchmark/runner_test.exs:43
     Assertion with >= failed
     code:  assert length(run_times) >= 5
     left:  4
     right: 5
     stacktrace:
       test/benchee/benchmark/runner_test.exs:44: (test)

..........

  3) test .run_scenarios each triggers for every invocation, scenario once (Benchee.Benchmark.RunnerTest)
     test/benchee/benchmark/runner_test.exs:587
     Assertion with >= failed
     code:  assert hook_call_count >= 2
     left:  1
     right: 2
     stacktrace:
       test/benchee/benchmark/runner_test.exs:656: (test)

..........

  4) test formatters have full access to the suite data, values in assigns (BencheeTest)
     test/benchee_test.exs:231
     Assertion with =~ failed
     code:  assert output =~ ~r"Run time: #{@rough_10_milli_s}$"m
     left:  "Operating System: Windows\nCPU Information: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz\nNumber of Available Cores: 2\nAvailable memory: 1.46 GB\nElixir 1.6.3\nErlang 20.2\nBenchmark suite executing with the following configuration:\nwarmup: 5 ms\ntime: 10 ms\nparallel: 1\ninputs: none specified\nEstimated total run time: 15 ms\n\n\nBenchmarking Sleeps...\nRun time: 16 ms\nAverage: 16 ms\nCustom value\n"
     right: ~r/Run time: ((8|9|10|11|12|13|14)\.\d{2} ms)$/m
     stacktrace:
       test/benchee_test.exs:261: anonymous fn/0 in BencheeTest."test formatters have full access to the suite data, values in assigns"/1
       test/benchee_test.exs:232: (test)

  5) test formatters can be supplied as the output/1 function (BencheeTest)
     test/benchee_test.exs:194
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:208: (test)

  6) test integration expanded README example (BencheeTest)
     test/benchee_test.exs:99
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:116: (test)

...

  7) test erlang style :benchee integration (BencheeTest)
     test/benchee_test.exs:84
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:96: (test)

.

  8) test formatters can be supplied as just the module name (BencheeTest)
     test/benchee_test.exs:177
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:191: (test)

.

  9) test integration keywordlist as options in second place (BencheeTest)
     test/benchee_test.exs:69
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:81: (test)

.

 10) test .run_scenarios populates results for all inputs (Benchee.Benchmark.RunnerTest)
     test/benchee/benchmark/runner_test.exs:265
     Assertion with >= failed
     code:  assert length(run_times_for(new_suite, "sleep", "Short wait")) >= 8
     left:  7
     right: 8
     stacktrace:
       test/benchee/benchmark/runner_test.exs:281: anonymous fn/0 in Benchee.Benchmark.RunnerTest."test .run_scenarios populates results for all inputs"/1
       test/benchee/benchmark/runner_test.exs:266: (test)

.....

 11) test inputs feature version of readme example (BencheeTest)
     test/benchee_test.exs:267
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:281: (test)

...

 12) test integration high level README example old school map config (BencheeTest)
     test/benchee_test.exs:53
     Expected false or nil, got true
     code: refute Regex.match?(~r"fast"i, output)
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:66: (test)

...

 13) test .run_scenarios runs a suite with multiple jobs and gathers results (Benchee.Benchmark.RunnerTest)
     test/benchee/benchmark/runner_test.exs:60
     Assertion with >= failed
     code:  assert length(run_times_for(new_suite, "Name 2")) >= 8
     left:  7
     right: 8
     stacktrace:
       test/benchee/benchmark/runner_test.exs:61: (test)

..................

Finished in 10.1 seconds
106 doctests, 217 tests, 13 failures

For reference, even Elixir 1.4 with Erlang 18.3 on Travis took 3.8 seconds...

@PragTob
Member Author

PragTob commented Mar 12, 2018

So that leaves us at an ever dangerous crossroads:

  • make our performance tests even more lenient and/or retrying
  • tag performance-sensitive tests not to run on Windows CI and declare them good enough on Linux?

There's also some other stuff which seems to fail that's not entirely related to fast/slow - will have to look another day though :)

@devonestes
Collaborator

This is great! Well, great-ish.

I'm only seeing failures related to speed things. Like, either the expected counts for run times don't match, or it's printing the fast function warning, which we're not expecting to see in the captured output.

As to a solution, I don't think we should make the existing performance tests more lenient. I think the two best options are:

  1. Excluding those performance tests from AppVeyor and just ignoring performance specifically for Windows, or
  2. Keeping a separate set of performance tests for AppVeyor with different expectations.

Personally, I think the cases where we would make a change that doesn't fail one of these performance tests on Linux but would make benchee noticeably slower on Windows are so rare that we don't need to worry about it. Frankly, I don't think it would ever happen, but I can't say that for sure.

That's why I'd lean towards option 1, tagging these tests to be skipped on Windows machines.
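A tag-based exclusion like that could be sketched as follows. This is a minimal sketch for illustration only: the `:performance` tag name and the test_helper.exs wiring are assumptions, not existing benchee code.

```elixir
# test/test_helper.exs - exclude performance-sensitive tests on Windows.
# The :performance tag is hypothetical; tests would need to opt in to it.
exclude =
  case :os.type() do
    # :os.type() returns {:win32, _} on Windows machines
    {:win32, _} -> [performance: true]
    _ -> []
  end

ExUnit.start(exclude: exclude)

# In the test module, performance-sensitive tests then get tagged:
#
#   @tag :performance
#   test ".run_scenarios doesn't take longer than advertised for very fast funs" do
#     ...
#   end
```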

@PragTob
Member Author

PragTob commented Mar 13, 2018

The regex thing still has me puzzled... it refutes that "fast" occurs in the text. But it does occur... so are they unknowingly super fast, or does "fast" just occur somewhere else? Off to debug we go!

@PragTob
Member Author

PragTob commented Mar 13, 2018

Omg... we actually print the fast warning... but WHY? It's slow in the measurement... it's weird:

3) test formatters can be supplied as just the module name (BencheeTest)
     test/benchee_test.exs:177
     Refute with =~ failed
     code:  refute output =~ ~r"fast"i
     left:  "Operating System: Windows\nCPU Information: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz\nNumber of Available Cores: 2\nAvailable memory: 1.46 GB\nElixir 1.6.3\nErlang 20.2\nBenchmark suite executing with the following configuration:\nwarmup: 5 ms\ntime: 10 ms\nparallel: 1\ninputs: none specified\nEstimated total run time: 30 ms\n\n\nBenchmarking flat_map...\nWarning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning\n\nYou may disable this warning by passing print: [fast_warning: false] as configuration options.\n\nBenchmarking map.flatten...\n\nName                  ips        average  deviation         median         99th %\nflat_map              800        1.25 ms     ±0.00%        1.25 ms        1.25 ms\nmap.flatten         62.50          16 ms     ±0.00%          16 ms          16 ms\n\nComparison: \nflat_map              800\nmap.flatten         62.50 - 12.80x slower\n"
     right: ~r/fast/i
     stacktrace:
       test/benchee_test.exs:506: BencheeTest.readme_sample_asserts/2
       test/benchee_test.exs:191: (test)

..

(I mean, look at the average - it's milliseconds)

I have no idea? Hoping for insight from @OvermindDL1?

@OvermindDL1

Is it printed to stderr instead of stdout or so? (/me has not looked yet)

@PragTob
Member Author

PragTob commented Mar 19, 2018

@OvermindDL1 The problem is that we print the warning at all, which shouldn't happen since the benchmark isn't that fast. I vaguely remembered there was something about the native time unit possibly changing, but couldn't find it again. Also, :timer.tc's docs say it returns microseconds, so I don't see how our usually slower AppVeyor should suddenly be that much faster.

@PragTob
Member Author

PragTob commented Mar 28, 2018

Good news! I'm able to relatively reliably reproduce all/most of these errors - my last test run also had 13 errors, including the very weird "fast" errors and the errors about being too slow. Mind you, this is the exact same machine I often develop on with Linux. So either Elixir/Erlang performance on Windows is that bad or it's Windows itself ;P

Funnily enough, I also get the fast warning when I just run mix run samples/run.exs so something is definitely fishy here:

Operating System: Windows
CPU Information: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Number of Available Cores: 8
Available memory: 15.94 GB
Elixir 1.6.4
Erlang 20.2
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
parallel: 1
inputs: none specified
Estimated total run time: 24 s


Benchmarking flat_map...
Warning: The function you are trying to benchmark is super fast, making measurements more unreliable!
This holds especially true for memory measurements.
See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning

You may disable this warning by passing print: [fast_warning: false] as configuration options.

Benchmarking map.flatten...
Warning: The function you are trying to benchmark is super fast, making measurements more unreliable!
This holds especially true for memory measurements.
See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning

You may disable this warning by passing print: [fast_warning: false] as configuration options.


Name                  ips        average  deviation         median         99th %
flat_map           1.25 K        0.80 ms     ±6.33%        0.78 ms        0.94 ms
map.flatten        0.81 K        1.23 ms     ±4.00%        1.25 ms        1.25 ms

Comparison:
flat_map           1.25 K
map.flatten        0.81 K - 1.54x slower

(seems to be about half the speed of what it does on Linux)

@PragTob
Member Author

PragTob commented Mar 28, 2018

Grumble mumble Windows grumble mumble... TL;DR: the code is right

19> Data1000 = lists:seq(1, 1000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]
20> Data10000 = lists:seq(1, 10000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]
21> Data100000 = lists:seq(1, 100000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]
22> timer:tc(lists, map, [Succ, Data1000]).
{0,
 [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
  24,25,26,27,28|...]}
23> timer:tc(lists, map, [Succ, Data10000]).
{15000,
 [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
  24,25,26,27,28|...]}
24> timer:tc(lists, map, [Succ, Data100000]).
{77999,
 [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
  24,25,26,27,28|...]}

Seems like the native time unit in Erlang on Windows isn't Microseconds but Milliseconds. That's why it doesn't register.

There's something on time correction in the erlang docs - but before we all go mumbo jumbo - timer:tc/1 already does the right thing:

tc(F) ->
    T1 = erlang:monotonic_time(),
    Val = F(),
    T2 = erlang:monotonic_time(),
    Time = erlang:convert_time_unit(T2 - T1, native, microsecond),
    {Time, Val}.

source

--> we either gotta run more juicy benchmarks or also exclude these or those specific checks on Windows. Sigh.

Side note: it's also documented in the Elixir System docs, where I feel it should be mentioned that you can also just use :timer.tc - I should send a patch and see what they say.

@devonestes
Collaborator

Woah! That's a pretty huge difference between Windows and Unix!!

Happy you got to the bottom of it! 🎉

@PragTob
Member Author

PragTob commented Mar 30, 2018

Interestingly, the native unit on Linux seems to be nanoseconds. I believe I tried this out once - but probably didn't take monotonic time 🤦‍♂️

Might be worth another shot, it means we have to replace :timer.tc but that's fine by me - we'll basically call out to the same functions as it does so we should be as fast as it is.

iex(11)> time = :erlang.monotonic_time()
-576460357193550789
iex(12)> :erlang.convert_time_unit(time, :native, :microsecond)
-576460357193551
iex(13)> :erlang.convert_time_unit(time, :native, :nanosecond)
-576460357193550789
iex(14)> :erts_internal.time_unit()
1000000000

edit: that leaves the question of whether we use the above workaround (converting times until the values are the same) or whether we're naughty and call :erts_internal, trusting our CI to tell us if the Erlang people want to remove it :D

%% As the module name imply, this module is here for ERTS internal
%% functionality. As an application programmer you should *never*
%% call anything in this module directly. Functions exported by
%% this module may change behaviour or be removed at any time
%% without any notice whatsoever. Everything in this module is
%% intentionally left undocumented, and should remain so.
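
Instead of calling :erts_internal, the native resolution can also be obtained through the documented :erlang.convert_time_unit/3: converting one second to native units yields the tick rate directly. A sketch along those lines (the measure helper here is hypothetical, not benchee's actual code):

```elixir
# Documented alternative to :erts_internal.time_unit/0:
# how many native time units fit into one second.
native_per_second = :erlang.convert_time_unit(1, :second, :native)

# A :timer.tc-style measurement that stays in the native unit and hands
# the resolution back to the caller instead of forcing microseconds.
measure = fn fun ->
  start = :erlang.monotonic_time()
  value = fun.()
  finish = :erlang.monotonic_time()
  {finish - start, native_per_second, value}
end

{native_time, resolution, _result} =
  measure.(fn -> Enum.map(1..1_000, fn i -> i + 1 end) end)

# Callers can then convert explicitly when they need a fixed unit:
_microseconds = div(native_time * 1_000_000, resolution)
```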

@OvermindDL1

Just tested here: Windows and Linux report VERY different time values. This really seems like a bug to me - maybe report it? At the very least, not reporting the temporal resolution of the return value is a bug.

PragTob and others added 2 commits April 1, 2018 14:49
* =~ seems to print the text that was matched against while
  Regex.match? doesn't
@PragTob
Member Author

PragTob commented Apr 1, 2018

Excluded these tests or the one problematic assertion and did it on my Windows machine --> high hopes that this passes now :)

@OvermindDL1 What exactly do you mean by different values? You mean return values from timer.tc or monotonic time or what? But yeah I'm thinking of reporting a couple of things:

  • a timer:tc variant/new function that always measures in the native time unit and also returns that unit
  • an interface to access the native time unit of the running Erlang VM (there's a workaround I can think of, and I don't like it...)
  • can we maybe have better time resolution than milliseconds on Windows?

Not sure how the OTP team rolls, but I wanna report these one of these days :)

@PragTob
Member Author

PragTob commented Apr 2, 2018

It passed 💃

@OvermindDL1

OvermindDL1 commented Apr 2, 2018

For grabbing time, like monotonic time. The extra accuracy on Linux is welcome, but we need to know what that accuracy is. ^.^;

Thus I'd think an extra function to get the timing accuracy of the BEAM VM on the current system would be both very nice and backwards compatible.
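
If I'm reading the docs right, part of this already exists today: :erlang.system_info(:os_monotonic_time_source) reports, among other things, the resolution of the OS monotonic clock (though the exact entries vary by platform, and it can return an empty list when no source is available):

```elixir
# May return [] on systems without an OS monotonic time source;
# otherwise a list of tuples that typically includes {:resolution, n}.
info = :erlang.system_info(:os_monotonic_time_source)
_resolution = :proplists.get_value(:resolution, info)
```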

@devonestes
Collaborator

This all looks good to me!! I'm happy we got to the bottom of why that's happening (and that you included that history in comments 😉)

@PragTob
Member Author

PragTob commented Apr 4, 2018

:shipit:

@PragTob PragTob merged commit 250bd0f into master Apr 4, 2018
@PragTob PragTob deleted the appveyor branch April 4, 2018 18:09