
Update mpool and humansize tests #511

Merged: 3 commits into master from tests-update, Jan 28, 2024
Conversation

gperciva (Member)

No description provided.

gperciva changed the title from "Update malloc and humansize tests" to "Update mpool and humansize tests" on Jan 26, 2024
@gperciva (Member Author)

The mpool changes are needed for FreeBSD 14.0 with clang; evidently, they improved the malloc speed. (The previous limits were still fine for FreeBSD 14.0 with gcc.)

The humansize changes are incidental; I happened to look at the logfile and saw:

Parsing "1 BB" as a number of bytes PASSED!

and then wondered what SI prefix "B" was. That led to adding comments to the test case definitions, then modifying the printf message, and then I figured that I might as well add checks for humansize() itself.
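
For reference, a round-trip check along those lines might look like this minimal sketch. The prototypes are assumptions here: char * humansize(uint64_t) returning a malloc'd string, and int humansize_parse(const char *, uint64_t *) returning nonzero on failure.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "humansize.h"

/* Format a size, parse the string back, and print the round trip.
 * Note that the round trip can be lossy if humansize() rounds. */
static int
check_humansize(uint64_t size)
{
	char * s;
	uint64_t parsed;

	/* Format the size as "<number> <SI prefix>B". */
	if ((s = humansize(size)) == NULL)
		return (-1);

	/* Parse it back into a number of bytes. */
	if (humansize_parse(s, &parsed)) {
		free(s);
		return (-1);
	}

	printf("%" PRIu64 " -> \"%s\" -> %" PRIu64 "\n", size, s, parsed);
	free(s);

	/* Success! */
	return (0);
}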

@@ -116,14 +116,14 @@ printf "sets\treps\tratio\n"
 
 # Test with full benefit from malloc pool (the binary begins with an
 # initial pool of 100).
-cmp_methods 10000 100 200 "full"
+cmp_methods 10000 100 180 "full"
 
 # mpool is still considerably faster than malloc in this range.
 cmp_methods 1000 1000 200 "partial"
Member:

Should we change this one too?

Member:

Ok, I'm just confused by the word "partial". I assumed that meant that we would see less of a benefit, but it seems like it's more?

Member Author:

Wait, my previous comment was garbage. I need to investigate this some more.

gperciva marked this pull request as draft on January 27, 2024 at 03:24
@gperciva (Member Author)

Hmm. Does the FreeBSD stdlib have any mpool of its own? Look at this funky graph!

[figure: freebsd-malloc-time]

(I included "clang" in the gnuplot title, but a later test with gcc shows the same thing.)

There's a huge jump in timing between mallocing 200 ints and 201 ints, and similar (although smaller) jumps every subsequent 100 ints.

Same pattern on FreeBSD 13.2 on arm64. I'll test on other versions of FreeBSD, and other OSes, tomorrow.

Here's the code; it's a simplified version of our mpool test. Namely, I removed all mpool stuff, and all libcperciva code.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUFMAX 3000

int
main(int argc, char * argv[])
{
	struct timespec mono_begin, mono_end;
	long long mono_delta;
	int i, j;
	int num;
	int * x[BUFMAX];

	/* Parse one argument. */
	if (argc < 2) {
		printf("usage: %s num\n", argv[0]);
		exit(1);
	}
	num = atoi(argv[1]);
	if ((num <= 0) || (num >= BUFMAX)) {
		printf("num must be between 1 and %d\n", BUFMAX - 1);
		exit(1);
	}

	/* Initial time. */
	if (clock_gettime(CLOCK_MONOTONIC, &mono_begin))
		goto err1;

	for (i = 0; i < 1000; i++) {
		/* Allocate a bunch. */
		for (j = 0; j < num; j++)
			x[j] = malloc(sizeof(int));

		/* Free a bunch. */
		for (j = 0; j < num; j++)
			free(x[j]);
	}

	/* Final time. */
	if (clock_gettime(CLOCK_MONOTONIC, &mono_end))
		goto err1;

	/* Compute the elapsed time, and print the mean number of
	 * nanoseconds per malloc + free pair. */
	mono_delta = (mono_end.tv_nsec - mono_begin.tv_nsec)
	    + 1000000000 * (mono_end.tv_sec - mono_begin.tv_sec);
	printf("%f\n", (double)mono_delta / num / 1000);

	return (0);

err1:
	printf("clock error\n");

	/* Failure */
	return (1);
}

@gperciva (Member Author)

For comparison, here's the plot from our mpool test; mpool has a much more sensible timing curve.

[figure: malloc-mpool]

@cperciva (Member)

Looks like malloc is allocating a slab of pointers at a time; the jumps are when it needs to reserve a kB for the next bunch of pointers. This is probably just a small-allocation optimization, so if you changed from allocating sizeof(int) to allocating e.g. 128 bytes at a time, I'm guessing it will go away.
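
If that's the explanation, a one-line change to the benchmark above would test it (hypothetical; 128 bytes per the suggestion):

-			x[j] = malloc(sizeof(int));
+			x[j] = malloc(128);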

@gperciva (Member Author)

Ok, for general interest, here are some comparisons of different malloc sizes. I split them up:

  • 4 to 32 bytes have the same jump frequency
  • the frequency looks like it doubles from 64 to 128 bytes, and again from 128 to 256
  • there's some evidence of a frequency in 512 and 1024 (likely doubling as well), but it peters out faster. (I'm sure that an FFT would make the frequencies much easier to spot, but at this point I think we're beyond the realm of "casual interest".)

(The doubling would be consistent with fixed-size slabs: doubling the allocation size halves the number of allocations per slab, so the jumps come twice as often.)

[figures: f1, f2, f3]

@gperciva (Member Author)

More pertinently: what does this mean for our mpool test?

One of the assumptions is that doing 100 allocations (of 4 bytes) would be the ideal case for benefiting from mpool instead of malloc. But on FreeBSD 13 and 14 (and possibly earlier versions), that's still in the range of the slab of pre-allocated pointers.

I could increase struct stuff to 256 bytes, but allocating that 100 times in a row is still faster (per allocation) than doing it 1000 times. Should I bump it to something huge, like 16384 bytes? Or should we just update the comments to reflect that 100 allocations might not be the "full benefit", since it depends on the OS?

@gperciva (Member Author)

Another option would be to change mpool to start off with a size of 1000, then check 1000 for the "full" and 10000 for the "partial".

But my sense is that we tend to use mpool for allocating dozens of items with many bytes, rather than thousands of items with few bytes. So increasing the size of struct stuff is probably better.

Commit message:

This is needed for FreeBSD 14.0 with clang; their malloc is allocating a slab of pointers at a time, which benefits small allocations.

Our mpool test is not intended as a serious benchmark of memory allocation speeds; rather, it's a quick check that nothing unusual is happening with the platform's kernel and libc. For a serious comparison, we would need to check the actual scenarios: for example, are we allocating 4096 records of 16 bytes, or 16 records of 52 bytes?

@gperciva (Member Author)

Ok, revised patch: change the allocation from int i to uint8_t buf[128], and don't change the ratio limits.

(64 bytes isn't enough to avoid the ratio becoming too low.)
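
For reference, the revised record presumably looks something like this sketch, assuming libcperciva's MPOOL(name, type, len) macro from mpool.h (which generates mpool_stuff_malloc() and mpool_stuff_free()):

#include <stdint.h>

#include "mpool.h"

/* 128 bytes per record: large enough to fall outside the smallest
 * size classes of FreeBSD's malloc. */
struct stuff {
	uint8_t buf[128];
};

/* Start with an initial pool of 100 records, matching the test
 * script's "initial pool of 100" comment. */
MPOOL(stuff, struct stuff, 100);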

gperciva marked this pull request as ready for review on January 27, 2024 at 22:38
@cperciva (Member)

Looks good. For reference, I think the places where I'm using mpool are generally for allocations in the range of 40-100 bytes, although sometimes it's as low as 16 bytes (two pointers).

cperciva merged commit f70da0e into master on Jan 28, 2024 (2 checks passed)
@gperciva (Member Author)

Yeah, in libcperciva we're doing:

  • 4096 records of 16 bytes in events.c
  • 4096 records of 28 bytes in events_immediate.c
  • 16 records of 52 bytes in network_read.c

No non-libcperciva uses in tarsnap, scrypt, and spiped. kivaloo does a bunch of 4096 records (and even 32768 records!), but I didn't dig in to see exactly how large those were.

gperciva deleted the tests-update branch on January 28, 2024 at 00:18