Configure benchmark machine for maximal stability #338
I seem to remember someone (@adriaanm?) suggesting our script could trigger a reboot and then run the actual benchmark during the shutdown or startup sequence, at a point when superfluous services aren't running and when other users can't log in. We could still use the Jenkins SSH Slave functionality to set all this up, but we'd have to add a custom build step to poll for completion.

I could imagine that during startup / shutdown, or right after startup, the system might schedule maintenance tasks and not be at its most stable either. We should definitely check if there's a difference when we don't use a Jenkins slave / SSH connection.
Several more suggestions based on my experience:

Since I've switched to SSDs: they can have periodic maintenance that may slow things down.

One more idea that I came up with but didn't have time to try out:
I've added a script (
The last part appears to be ignored, though; running it shows the frequencies scaling back and forth between 1200 and 2400. I'm still seeing larger-than-expected variance in the runs. Another step might be to disable the
This appears to be a pretty comprehensive guide to setting up stable benchmark environments: https://perf.readthedocs.io/en/latest/system.html#system
Also interesting: Virtual Machine Warmup Blows Hot and Cold
I did a few experiments with Without
One possible explanation could be that GC causes jitter when there's only one processor available, as it cannot run in parallel. With
With
The large variances when using
I added
It makes sense now: when using taskset to move a process onto an isolated CPU, the kernel doesn't do any load balancing across CPUs. https://groups.google.com/forum/#!topic/mechanical-sympathy/Tkcd2I6kG-s, https://www.novell.com/support/kb/doc.php?id=7009596. Started reading about cpuset, will experiment.
Added a script that checks the machine state and sets some of the configurations discussed in the main description of this issue (https://github.com/scala/compiler-benchmark/blob/master/scripts/benv). I ran some experiments in various configurations.
I didn't do multiple runs to see how much the error values vary. The error numbers are probably too close together / jittery to make a meaningful comparison, but I'm trying anyway.
(*)
In combination
Again, the error numbers are not stable enough to draw a useful conclusion.
For comparison I ran a simple benchmark that creates a new Global (https://github.com/scala/compiler-benchmark/compare/master...lrytz:newGlobal?expand=1).
One thing that jumps out is that variances are much more stable between iterations than what we're seeing when running the entire compiler. In the compiler we always see things like
For
Maybe the IO has an impact here. I'll experiment a bit with
Actually, of course, the number of benchmark invocations is much higher for
Using a ramdisk (for the
I also ran with
This suggests that IO could be a cause of variance, but the ramdisk doesn't help to reduce it.
Disable hyper-threading

- Some suggest the `noht` kernel parameter, but others say it doesn't work.
- `echo 0 > /sys/devices/system/cpu/cpuN/online` for all `N` that don't have their own core id in `cat /proc/cpuinfo`
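The second bullet can be scripted. A hedged sketch, assuming the usual sysfs topology files (`topology/thread_siblings_list`) are present; run as root:

```shell
# Take offline every logical CPU that shares a physical core with a
# lower-numbered CPU, i.e. disable the hyper-thread siblings.
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
  n=${cpu##*cpu}
  [ "$n" -eq 0 ] && continue               # cpu0 usually cannot be taken offline
  siblings=$(cat "$cpu/topology/thread_siblings_list")
  first=${siblings%%[,-]*}                 # lowest CPU id sharing this core
  if [ "$first" != "$n" ]; then
    echo 0 > "$cpu/online"                 # this CPU is a sibling thread
  fi
done
```

Unlike `noht`, this can be undone at runtime by writing `1` back to the `online` files.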
NUMA
The machine only has a single NUMA node, so we don't need to worry about it.
http://stackoverflow.com/questions/11126093/how-do-i-know-if-my-server-has-numa
Use cpu sets

Install cset: `sudo apt-get install cpuset`. (On NUMA machines, cset also handles sets of memory nodes, but we only have one.)

- `cset set` to create and manipulate CPU sets
- `cset proc` to manage processes into sets
- `cset shield` is a convenience command, simpler to use, allows isolating a process

Shielding

- `cset shield` shows the current status
- `cset shield -c 1-3` creates a shield on CPUs 1-3
- `cset shield -k on` moves kernel threads (those that can be moved) from root to system (some kernel threads are specific to a CPU and are not moved)
- `cset shield -v -s` / `-u` show shielded / unshielded processes
- `cset shield -e cmd -- -cmdArg` executes `cmd -cmdArg` in the shield
- `cset shield -r` resets the shield

References
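Putting the commands above together, a sketch of a full shield workflow (assumes `cpuset` is installed and a 4-CPU machine; `benchmarks.jar` is a hypothetical placeholder; run as root):

```shell
# Shield CPUs 1-3 and move movable kernel threads off them.
cset shield -c 1-3 -k on

# Run the benchmark inside the shield (benchmarks.jar is a placeholder name).
cset shield -e java -- -jar benchmarks.jar

# Tear the shield down afterwards.
cset shield -r
```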
Use isolated CPUs

NOTE: Using isolated CPUs for running the JVM is not a good idea. The kernel doesn't do any load balancing across isolated CPUs. https://groups.google.com/forum/#!topic/mechanical-sympathy/Tkcd2I6kG-s, https://www.novell.com/support/kb/doc.php?id=7009596. Use `cset` instead of `isolcpus` and `taskset`.

- `lscpu --all --extended` lists CPUs, including logical cores (if hyper-threading is enabled). The `CORE` column shows the physical core.
- Kernel parameter `isolcpus=2,3` removes CPUs 2 and 3 from the kernel's scheduler. Set it in `/etc/default/grub`, for example `GRUB_CMDLINE_LINUX_DEFAULT="quiet isolcpus=2,3"`, then run `sudo update-grub`.

Verify

- `cat /proc/cmdline`
- `cat /sys/devices/system/cpu/isolated`
- `taskset -cp 1` -- affinity list of process 1
- `ps -eww --forest -o pid,ppid,psr,user,stime,args` -- there should be nothing on isolated cores.

Use `taskset -c 2,3 <cmd>` to run `cmd` (and child processes) only on CPUs 2 and 3.

Questions

- With `taskset -c 2,3`, does the JVM still think the system has 4 cores? Would that be a problem?

References
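One quick way to probe that question (a sketch: on Linux, `nproc` honors the affinity mask, and recent JVMs derive `Runtime.availableProcessors` from the same `sched_getaffinity` mask, though older JVMs may still report all cores):

```shell
# How many CPUs does a process see under taskset?
nproc                    # full count, for comparison
taskset -c 2,3 nproc     # should print 2: nproc respects the CPU affinity mask
```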
Tickless / NOHZ

Disable scheduling clock interrupts on the CPUs used for benchmarking: add the `nohz_full=2,3` kernel parameter. The tick is only disabled if there's a single task (thread) on the CPU.

Verify

- `cat /sys/devices/system/cpu/nohz_full`
- `dmesg | grep dyntick` should show the CPUs
- `sudo perf stat -C 1 -e irq_vectors:local_timer_entry taskset -c 1 stress -t 1 -c 1` should show 1 tick (see redhat reference). Without `CONFIG_NO_HZ_FULL`, I got numbers between 20 and 90 ticks on the otherwise idle CPU 1. Running on CPU 0, I get ~390 ticks.
- `watch -n 1 -d grep LOC /proc/interrupts` shows 1 tick per second on CPU 1 when idle; `stress -t 1 -c 1` on CPU 1 causes more ticks

NOTE: disabling interrupts has some effect on CPU frequency, see https://fosdem.org/2017/schedule/event/python_stable_benchmark/ (24:45). Make sure to use a fixed CPU frequency. I don't have the full picture yet, but it's something like this: the `intel_pstate` driver is no longer notified and does not update the CPU frequency. TODO: check how `intel_pstate` behaves when using tickless mode.

(Some more advanced stuff in http://www.breakage.org/2013/11: pin some regular tasks to specific CPUs, `writeback/cpumask`, `writeback/numa`.)

References
rcu_nocbs

RCU is a thread synchronization mechanism. RCU callbacks may prevent a cpu from entering adaptive-tick mode (tickless with 0/1 tasks). https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt

The `rcu_nocbs=2,3` kernel param prevents CPUs 2 and 3 from queuing RCU callbacks.

References
Interrupt handlers

Avoid running interrupt handlers on certain CPUs.

- `/proc/irq/default_smp_affinity` is the default bit mask of CPUs permitted for an interrupt handler
- `/proc/irq/N/` contains `smp_affinity` (bit mask of allowed CPUs) and `smp_affinity_list` (list of CPUs able to execute the interrupt handler)

Verify

- `cat /proc/interrupts`

There's an `irqbalance` service (`systemctl status irqbalance`). Disable `irqbalance` when pinning irq handlers to certain processors.

References
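A sketch of pinning all interrupt handlers to CPUs 0-1 so the benchmark CPUs 2-3 stay quiet (mask `3` = binary `0011` = CPUs 0 and 1; run as root, and stop `irqbalance` first or it will rewrite the masks):

```shell
systemctl stop irqbalance

# New interrupts default to CPUs 0-1 only.
echo 3 > /proc/irq/default_smp_affinity

# Move existing interrupt handlers; some IRQs cannot be migrated,
# hence the error suppression.
for irq in /proc/irq/[0-9]*; do
  echo 3 > "$irq/smp_affinity" 2>/dev/null || true
done
```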
CPU Frequency

Disable Turbo Boost

- Write `1` to `/sys/devices/system/cpu/intel_pstate/no_turbo` -- if using the `pstate` driver
- With `intel_pstate=disable`, find out how to disable turbo boost in the system

There seem to be two linux tools:

- `cpufrequtils`, with `cpufreq-info` and `cpufreq-set` (https://wiki.debian.org/HowTo/CpuFrequencyScaling), used by krun
- `cpupower` (https://wiki.archlinux.org/index.php/CPU_frequency_scaling) - for debian jessie that only exists in backports

`cpupower` is actively developed and has more features and support for newer cpus (https://bbs.archlinux.org/viewtopic.php?id=135820).

Intel CPUs can run in different P-States, voltage-frequency pairs used when running a process. C-States are idle / power saving states. The `intel_pstate` driver handles this.

The `intel_pstate=disable` kernel argument disables the `intel_pstate` driver and uses `acpi-cpufreq` instead (see redhat reference).

- `sudo apt-get install linux-cpupower` (in jessie backports only!)
- `cpupower frequency-info` and `cpupower idle-info` show the active drivers.

CPU Info

- `lscpu`
- `cat /proc/cpuinfo` (`| grep MHz`)
- `cpupower frequency-info`
- `watch -n 1 grep \"cpu MHz\" /proc/cpuinfo`

CPUfreq Governors

- `cpupower frequency-info --governors` (Examples: `performance`, `powersave`, ...). Should use `performance`, which keeps the maximal frequency. NOTE: the `intel_pstate` driver still does dynamic scaling in this mode.
- `cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor`
- `cpupower -c 1-3 frequency-set --governor [governor]` (on CPUs 1-3)

Set a specific frequency: `sudo cpupower -c 1-3 frequency-set -f 2400MHz`. Use `-u` for max, `-d` for min.

- This requires the `userspace` cpu governor; `cpupower frequency-info` shows whether it's available. It is not supported by the `intel_pstate` driver (http://stackoverflow.com/questions/23526671/how-to-solve-the-cpufreqset-errors).

The `intel_pstate` driver has `/sys/devices/system/cpu/intel_pstate/min_perf_pct` and `max_perf_pct`; maybe these can be used if we stick with that driver?

References
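Both options above can be sketched as follows (assumes either the `acpi-cpufreq` driver with the `userspace` governor available, or the `intel_pstate` sysfs knobs; run as root):

```shell
# Option 1: with acpi-cpufreq, fix CPUs 1-3 at 2.4 GHz.
cpupower -c 1-3 frequency-set --governor userspace
cpupower -c 1-3 frequency-set -f 2400MHz

# Option 2: staying on intel_pstate, disable turbo and collapse the
# performance range so the frequency can't wander.
echo 1   > /sys/devices/system/cpu/intel_pstate/no_turbo
echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct
```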
Disable git gc
https://stackoverflow.com/questions/28092485/how-to-prevent-garbage-collection-in-git
$ git config --global gc.auto 0
Disable hpet

Suggested by Dmitry, I haven't found any other references. hpet is a hardware timer with a frequency of at least 10 MHz (higher than older timer circuits).

- `cat /sys/devices/system/clocksource/clocksource0/current_clocksource`
- `cat /sys/devices/system/clocksource/clocksource0/available_clocksource`
- Change using a kernel parameter: `clocksource=acpi_pm`

Explanation of clock sources: https://access.redhat.com/solutions/18627

References
Ramdisk

tmpfs vs ramfs

Added to `/etc/fstab`:

tmpfs /mnt/ramdisk tmpfs defaults,size=16g 0 0
Disable "transparent hugepages"

There are some recommendations out there to disable "transparent hugepages", mostly for database servers. Disabling them also stops the `khugepaged` kernel process.

Disable khungtaskd

Probably not useful, it only runs every 120 seconds. Detects hung tasks.
Cron jobs

https://help.ubuntu.com/community/CronHowto

- `crontab -e` to edit, `crontab -l` to show
- List all users' crontabs: `for user in $(cut -f1 -d: /etc/passwd); do sudo crontab -u $user -l; done`. Or make sure that the `/var/spool/cron/crontabs` directory is empty.
- `/etc/crontab` - should not be edited by hand
- `/etc/cron.d` contains files with system crontab entries
- `/etc/cron.hourly` / `.daily` / `.monthly` / `.weekly` contain scripts executed from `/etc/crontab` (or by `anacron`, if installed)

Disable / enable cron

- `systemctl stop cron`
- `systemctl start cron`

Disable / enable at

- `systemctl stop atd`
- `systemctl start atd`
Run under perf stat

Suggestion by Dmitry: discard benchmarks with too many cpu-migrations or context-switches. Would need to keep track of expected values.

- `sudo perf stat -x, scalac Test.scala` (machine-readable output)
- `-prof perfnorm` in jmh

References
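Following that suggestion, a hedged sketch of rejecting a noisy run from the counters (`./run-benchmark.sh` and the cutoff of 100 are placeholders; with `-x,` perf writes CSV lines whose first field is the value and third field the event name):

```shell
# Run the workload under perf stat with machine-readable (CSV) output.
perf stat -x, -e context-switches,cpu-migrations -o perf.csv -- ./run-benchmark.sh

# Extract the context-switch count (field 1 of the line naming the event).
switches=$(awk -F, '$3 == "context-switches" {print $1}' perf.csv)

# Discard the run if the machine was too noisy (100 is an arbitrary cutoff).
if [ "${switches:-0}" -gt 100 ]; then
  echo "discarding run: $switches context switches" >&2
fi
```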
Build custom kernel

Ah well, probably have to figure out some more details of how to do this correctly.

Scripting all of that

It seems that python3's "perf" package will do most of the configuration:

Important: check all settings before starting a benchmark.
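For reference, the usual invocation of that package's system tuning (a sketch; the PyPI package was named "perf" at the time and is nowadays published as "pyperf", so the module name may differ):

```shell
python3 -m pip install perf

sudo python3 -m perf system tune    # set governor, disable turbo, etc.
python3 -m perf system show         # check all settings before benchmarking
sudo python3 -m perf system reset   # undo the tuning afterwards
```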
Check load
Find a way to ensure that the benchmark machine is idle before starting a job.
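One possible sketch: gate the job on the 1-minute load average from `/proc/loadavg` (the 0.10 threshold is an arbitrary choice, not from the discussion above):

```shell
# Refuse to start a benchmark unless the machine is idle.
load=$(cut -d' ' -f1 /proc/loadavg)
if awk -v l="$load" 'BEGIN { exit !(l < 0.10) }'; then
  echo "machine idle (load $load), starting benchmark"
else
  echo "machine busy (load $load), aborting" >&2
  exit 1
fi
```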
Machine Specs
NX236-S2HD (http://www.nixsys.com/nx236-s2hd.html)