
Catch performance regressions in CI #1560

Open
Sean-Der opened this issue Nov 30, 2020 · 11 comments

@Sean-Der
Member

Currently we have nothing to catch performance regressions. We have a few benchmarks spread across repos, but no one runs them consistently.

We should use benchstat and commit baseline reports in the .github folder. When we see a percentage change (5%?), we should fail the build. IPFS has already implemented this here; we just need to port it to GitHub Actions.

This also looks relevant
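
A rough sketch of what the gate could look like (a hypothetical helper, nothing like this exists in the repo yet; it assumes plain `go test -bench` output rather than benchstat's table format, and the 5% threshold is illustrative):

// benchgate: hypothetical helper that compares two `go test -bench` outputs
// and exits non-zero if any benchmark slowed down by more than a threshold.
// A sketch only; file names and the threshold are illustrative.
package main

import (
        "bufio"
        "fmt"
        "os"
        "strconv"
        "strings"
)

// parse extracts "benchmark name -> ns/op" from `go test -bench` output.
func parse(path string) (map[string]float64, error) {
        f, err := os.Open(path)
        if err != nil {
                return nil, err
        }
        defer f.Close()

        out := map[string]float64{}
        sc := bufio.NewScanner(f)
        for sc.Scan() {
                fields := strings.Fields(sc.Text())
                // Expected shape: BenchmarkFoo-2  1000000  932 ns/op [...]
                if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") {
                        continue
                }
                nsPerOp, err := strconv.ParseFloat(fields[2], 64)
                if err != nil || fields[3] != "ns/op" {
                        continue
                }
                out[fields[0]] = nsPerOp
        }
        return out, sc.Err()
}

func main() {
        if len(os.Args) != 3 {
                fmt.Fprintln(os.Stderr, "usage: benchgate baseline.txt current.txt")
                os.Exit(2)
        }
        baseline, err := parse(os.Args[1])
        if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(2)
        }
        current, err := parse(os.Args[2])
        if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(2)
        }

        const threshold = 0.05 // fail on >5% slowdown
        failed := false
        for name, oldNs := range baseline {
                newNs, ok := current[name]
                if !ok {
                        continue
                }
                delta := (newNs - oldNs) / oldNs
                fmt.Printf("%s: %.0f ns/op -> %.0f ns/op (%+.1f%%)\n", name, oldNs, newNs, delta*100)
                if delta > threshold {
                        failed = true
                }
        }
        if failed {
                os.Exit(1)
        }
}

A single run's ns/op is noisy, so in practice we would probably still want benchstat-style repeated runs feeding something like this.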

@Sean-Der
Member Author

@at-wat @bshi does this sound like a good plan? Any mistakes I am running headfirst into? :)

@ashellunts @scorpionknifes if you are interested, this could be a really fun task to take on!

@Sean-Der
Member Author

I wrote this benchmark but didn't commit it:

func Benchmark_TrackLocalStaticRTP_SendOnly_WriteRTP(b *testing.B) {
       offer, answer, err := newPair()
       assert.NoError(b, err)

       track, err := NewTrackLocalStaticRTP(RTPCodecCapability{MimeType: "video/vp8"}, "video", "pion")
       assert.NoError(b, err)

       _, err = offer.AddTrack(track)
       assert.NoError(b, err)

       assert.NoError(b, signalPair(offer, answer))

       buffer := make([]byte, 1000)

       // Exclude the peer-connection setup above from the measured time.
       b.ResetTimer()

       for i := 0; i < b.N; i++ {
               assert.NoError(b, track.WriteRTP(rtp.Packet{Payload: buffer}))
       }

       // Exclude teardown as well.
       b.StopTimer()
       assert.NoError(b, offer.Close())
       assert.NoError(b, answer.Close())
}
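
To give benchstat something to compare, each side needs repeated runs, e.g. go test -run NONE -bench TrackLocalStaticRTP -benchmem -count 5 > new.txt on both the baseline and the candidate commit, then benchstat old.txt new.txt (commands from memory; adjust patterns and counts as needed).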

@at-wat
Member

at-wat commented Nov 30, 2020

@Sean-Der I think it's better to make it a pointer, not for performance but to make it less confusing, as I wrote in #1540 (review):

Before, we would keep a reference internally. Users would be surprised when values in the passed *rtp.Packet changed.

The contents stored as slices are not copied when the packet is passed by value:
https://github.com/pion/rtp/blob/c8b3d3ec7ff956797d8d29c5ab1cfb054c9e4028/packet.go#L17-L38
I think that is more confusing than passing a pointer, since the packet is partially copied and partially shared (for both users and developers).
IMO, it would be better to pass it as a pointer and clone the whole packet, including its slice members, wherever a copy is needed.
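
To make the "partially copied, partially shared" point concrete, here is a standalone toy example (not code from this repo; it only relies on rtp.Packet's exported Payload []byte field):

package main

import (
       "fmt"

       "github.com/pion/rtp"
)

func main() {
       pkt := rtp.Packet{Payload: []byte{1, 2, 3}}

       // Assigning (or passing) the packet by value copies the struct fields,
       // but the Payload slice header still points at the same backing array.
       byValue := pkt
       byValue.Payload[0] = 99

       fmt.Println(pkt.Payload[0]) // prints 99: mutating the "copy" changed the original
}

A clone that also copies Payload (and the header's slice fields) would avoid this, at the cost of an extra allocation per packet.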

@scorpionknifes
Member

scorpionknifes commented Dec 2, 2020

I was trying out benchstat on GitHub Actions. The results are sometimes inconsistent. I'm currently testing using the reference. I don't know if this is a worker issue, since the tests run in two different jobs on GitHub Actions.

The exact code I used is https://github.com/scorpionknifes/webrtc/blob/bench/.github/workflows/bench.yaml

Old time and new time are measured on the same code.

name                                      old time/op  new time/op   delta
DataChannelSend2-2                         932ns ±16%  3254ns ±141%  +249.33%  (p=0.016 n=5+5)
DataChannelSend4-2                        1.11µs ±23%   1.40µs ±16%      ~     (p=0.063 n=5+4)
DataChannelSend8-2                         885ns ±29%   1402ns ±28%   +58.37%  (p=0.016 n=5+5)
DataChannelSend16-2                        982ns ±29%   2365ns ±93%      ~     (p=0.151 n=5+5)
DataChannelSend32-2                        1.21s ± 0%    1.21s ± 0%      ~     (p=0.222 n=5+5)
_TrackLocalStaticRTP_SendOnly_WriteRTP-2  10.5µs ± 1%    8.9µs ± 1%   -15.04%  (p=0.008 n=5+5)

name                                      old time/op   new time/op  delta
DataChannelSend2-2                        4.02µs ±106%  2.41µs ±99%      ~     (p=0.222 n=5+5)
DataChannelSend4-2                          837ns ±29%  2476ns ±55%  +195.68%  (p=0.008 n=5+5)
DataChannelSend8-2                          881ns ±46%  3477ns ±78%      ~     (p=0.056 n=5+5)
DataChannelSend16-2                        1.53µs ±80%  2.98µs ±77%      ~     (p=0.151 n=5+5)
DataChannelSend32-2                         1.21s ± 0%   1.21s ± 0%      ~     (p=0.286 n=5+4)
_TrackLocalStaticRTP_SendOnly_WriteRTP-2   9.09µs ± 2%  8.37µs ± 1%    -8.02%  (p=0.008 n=5+5)

name                                      old time/op  new time/op  delta
DataChannelSend2-2                        1.00µs ±45%  1.29µs ±37%   ~     (p=0.056 n=5+5)
DataChannelSend4-2                        1.06µs ±33%  1.00µs ±35%   ~     (p=0.841 n=5+5)
DataChannelSend8-2                         824ns ±13%  1203ns ±83%   ~     (p=0.548 n=5+5)
DataChannelSend16-2                        709ns ±14%   684ns ±20%   ~     (p=0.841 n=5+4)
DataChannelSend32-2                        1.21s ± 0%   1.21s ± 0%   ~     (p=0.690 n=5+5)
_TrackLocalStaticRTP_SendOnly_WriteRTP-2  11.3µs ±10%  11.9µs ± 2%   ~     (p=0.095 n=5+5)

All we need to add is detection of the % change. I don't think we need to use benchstat; we could compare the overall time taken (below are the overall timings, in pairs).

ok  	github.com/pion/webrtc/v3	87.781s
ok  	github.com/pion/webrtc/v3	71.302s

ok  	github.com/pion/webrtc/v3	69.556s
ok  	github.com/pion/webrtc/v3	57.600s

ok  	github.com/pion/webrtc/v3	60.695s
ok  	github.com/pion/webrtc/v3	56.249s

I'm not sure about committing baseline reports, as different GitHub workers may produce different benchmark results.

@at-wat
Member

at-wat commented Dec 2, 2020

I think performance in the CI environment really depends on timing, since many CI jobs share a host machine. (Even if a virtual core is assigned to the job, memory and I/O bandwidth are limited by the host.)
I guess it's better to compare US working hours vs. the weekend to determine the tolerance.

@at-wat
Member

at-wat commented Dec 3, 2020

@Sean-Der any thoughts about #1560 (comment)?

@Sean-Der
Member Author

Sean-Der commented Dec 3, 2020

@at-wat sorry, I missed that!

I am 100% in support of leaving it as a pointer; I agree with your points. /v3 needs to go out, and I don't think changing it this late is a good idea either.

@Sean-Der
Member Author

Sean-Der commented Dec 3, 2020

@scorpionknifes that is a very good point. Maybe we just test the current commit against master? We can always make it optional; even if we get some false positives, I think this is a big step in the right direction :)
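
One possible shape for that job (a rough sketch; flags and refs are illustrative, and GITHUB_SHA is the commit Actions checks out for the PR):

       git fetch origin master
       git checkout origin/master
       go test -run NONE -bench . -count 5 | tee old.txt
       git checkout "$GITHUB_SHA"
       go test -run NONE -bench . -count 5 | tee new.txt
       benchstat old.txt new.txt

Running both sides in the same job at least keeps them on the same worker, which should remove some of the cross-job noise seen above.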

@Sean-Der
Member Author

I spent some time doing research today and found https://pythonspeed.com/articles/consistent-benchmarking-in-ci/

cachegrind looks promising! This might be a 'master only' thing, depending on how much longer it makes the tests run.
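
If I'm reading the article right, the rough shape would be: build the test binary once with go test -c -o webrtc.test, run it under valgrind --tool=cachegrind ./webrtc.test -test.run=NONE -test.bench=., and compare instruction counts instead of wall time, since instruction counts are far less sensitive to noisy neighbours on shared CI workers. (Commands are a guess at the shape, not something I've tested here.)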

@bshi

bshi commented Dec 10, 2020

I'm late to the party but have a couple of cents. The discussion has focused on how to track progress, with an emphasis on microbenchmarking, but I'd like to point out that this may be premature given we don't have much clarity on what we should be measuring (and, by extension, optimizing). The latter is, admittedly, a trickier problem, but my intuition is that we could be more effective if we could find a way to get good profiling data [a] from either real-world workloads (which would need to be volunteered) or synthetic ones (a.k.a. macrobenchmarks; these would require compute resources / $$$ that the project may not have access to).

So it seems worth (1) polling users to gather data about the top N bottlenecks [b] that exist in practice. What we learn could then (2) feed into efforts to build the appropriate micro/macro-benchmarks that reproduce said bottlenecks. And finally, after we have experience with what works well and what doesn't (to put it another way: what mix of micro and macro benchmarking is appropriate), we could build the appropriate tooling to track progress.

[a] In #1516, I volunteered data collected by adding a couple of lines of code to my synthetic load-test application. So the friction in enabling data collection is IMO low, but is it low enough that we can actually convince interested parties to contribute?
[b] To make an analogy: what are the current key scaling dimensions and limits of typical systems being built?

@jech
Member

jech commented Dec 19, 2020

I think Sean is merely speaking about catching regressions, not about general performance tuning. While microbenchmarks can be misleading, they are good at catching regressions, and every performance regression should be investigated, even if only so that a human can decide that it doesn't matter in practice.
Having realistic benchmarks would be great, but it's way more ambitious than what Sean is suggesting.
