
tune vpx threading #1851

Closed
totaam opened this issue May 22, 2018 · 12 comments

Comments

@totaam
Collaborator

totaam commented May 22, 2018

Issue migrated from trac ticket # 1851

component: encodings | priority: major | resolution: fixed

2018-05-22 07:39:28: antoine created the issue


Same as #1840 but for libvpx.

Some links:

It looks like the threading improvements are part of the reason why vp8 and vp9 are now faster, and part of why I chose to use vpx more (see #832#comment:22).
This also means that reducing the thread count might hurt performance too much.

@totaam
Collaborator Author

totaam commented May 22, 2018

2018-05-22 07:43:45: antoine commented


You can choose the maximum number of threads with:

XPRA_VPX_THREADS=2 xpra start ...

We want to see how this affects frame latency, bandwidth, CPU load, etc.
Unlike x264, it looks like we don't have much room for manoeuvre here.
(the current value is "number-of-cpus" minus 1)
Maybe this should be capped at 2 threads.
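The selection logic described above can be sketched as follows. This is an illustrative sketch only, not the actual xpra source: the function name is made up, but the env var `XPRA_VPX_THREADS` and the "number-of-cpus minus 1" default are from the discussion.

```python
import os

def vpx_thread_count(cpus: int) -> int:
    # Default: one less than the CPU count (the value described above),
    # overridable via the XPRA_VPX_THREADS environment variable.
    default = max(1, cpus - 1)
    try:
        return int(os.environ.get("XPRA_VPX_THREADS", default))
    except ValueError:
        # Ignore a malformed override and fall back to the default.
        return default
```

With this sketch, `XPRA_VPX_THREADS=2 xpra start ...` would pin the encoder to 2 threads regardless of the CPU count.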

@totaam
Collaborator Author

totaam commented May 30, 2018

2018-05-30 17:31:24: maxmylyn commented


I've set up a quick script that runs a series of three test runs with XPRA_VPX_THREADS set to 1, 2, and 4. For reference, the test box is an 8-core system. I'm more curious to see how much of an impact this has on low-end machines, so I'm going to update one of my low-end test boxes and run the tests again there.

@totaam
Collaborator Author

totaam commented Sep 11, 2018

2018-09-11 15:01:57: antoine edited the issue description

@totaam
Collaborator Author

totaam commented Feb 9, 2019

2019-02-09 03:42:04: antoine changed status from assigned to new

@totaam
Collaborator Author

totaam commented Feb 9, 2019

2019-02-09 03:42:04: antoine changed owner from maxmylyn to encodedEntropy

@totaam
Collaborator Author

totaam commented Aug 1, 2019

2019-08-01 13:00:29: smo changed owner from encodedEntropy to smo

@totaam
Collaborator Author

totaam commented Aug 7, 2019

2019-08-07 20:09:26: smo uploaded file test_vpx.tar.gz (138.2 KiB)

VPX data and charts for threads 1/2/4

@totaam
Collaborator Author

totaam commented Aug 7, 2019

2019-08-07 20:10:42: smo changed owner from smo to Antoine Martin

@totaam
Collaborator Author

totaam commented Aug 7, 2019

2019-08-07 20:10:42: smo commented


I've attached some test data and charts.

The data seems to show that more threads is better.

Can you check these over and let me know if any other action is required?

@totaam
Collaborator Author

totaam commented Aug 8, 2019

2019-08-08 10:53:16: antoine changed status from new to closed

@totaam
Collaborator Author

totaam commented Aug 8, 2019

2019-08-08 10:53:16: antoine set resolution to fixed

@totaam
Collaborator Author

totaam commented Aug 8, 2019

2019-08-08 10:53:16: antoine commented


Interesting data:

  • we encode more pixels per second with more threads, but when it comes to actually sending ("pixels sent") the benefits are much lower, as other costs come into play (and maybe we're hitting a performance ceiling?)
  • there seems to be a sweet spot at 2 threads, at least for the batch delay and damage latency
  • going up to 4 threads doesn't gain much (i.e. marginal improvements in damage latency and pixels sent per second) - I suspect this may vary with bigger picture sizes
  • decoding takes a little longer with more threads - which is fine, we're almost never bound by the client's decoding speed
  • 4 threads use quite a bit more server-side memory

So, r23474 makes us use fewer threads by default (was number-of-cpus - 1):

>>> import math
>>> for i in range(8):
...  print("%-3i: %2i" % (2**i, math.sqrt(2**i+1)))
... 
1  :  1
2  :  1
4  :  2
8  :  3
16 :  4
32 :  5
64 :  8
128: 11

This can still be overridden using the env var XPRA_VPX_THREADS=
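The session above prints the new default for power-of-two CPU counts: roughly the square root of the CPU count, since `"%i"` truncates the float returned by `math.sqrt`. As a sketch of the r23474 change described here (the function name is assumed, not taken from the xpra source), the heuristic plus the env-var override would look like:

```python
import math
import os

def vpx_threads(cpus: int) -> int:
    # New default per r23474 as described above: int(sqrt(cpus + 1)),
    # so e.g. 4 CPUs -> 2 threads, 8 -> 3, 16 -> 4 (was cpus - 1).
    default = int(math.sqrt(cpus + 1))
    # XPRA_VPX_THREADS still overrides the computed default.
    return int(os.environ.get("XPRA_VPX_THREADS", default))
```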

@totaam totaam closed this as completed Aug 8, 2019
@totaam totaam added the v2.3.x label Jan 22, 2021