
Update mpool and humansize tests #511

Merged: 3 commits into master from tests-update, Jan 28, 2024
Conversation

gperciva (Member)

No description provided.

gperciva changed the title from "Update malloc and humansize tests" to "Update mpool and humansize tests" on Jan 26, 2024
@gperciva (Member Author)

The mpool changes are needed for FreeBSD 14.0 with clang; evidently, they improved the malloc speed. (The previous limits were still fine for FreeBSD 14.0 with gcc.)

The humansize changes are incidental; I happened to look at the logfile and saw:

Parsing "1 BB" as a number of bytes PASSED!

and then wondered what SI prefix "B" was. That led to adding comments to the test case definitions, then modifying the printf message, and then I figured that I might as well add checks for humansize() itself.
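
For reference, a round-trip check along those lines might look like this minimal sketch. The prototypes are assumptions here: char * humansize(uint64_t) returning a malloc'd string, and int humansize_parse(const char *, uint64_t *) returning nonzero on failure.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#include "humansize.h"

/* Format a size, parse the string back, and print the round trip.
 * Note that the round trip can be lossy if humansize() rounds. */
static int
check_humansize(uint64_t size)
{
	char * s;
	uint64_t parsed;

	/* Format the size as "<number> <SI prefix>B". */
	if ((s = humansize(size)) == NULL)
		return (-1);

	/* Parse it back into a number of bytes. */
	if (humansize_parse(s, &parsed)) {
		free(s);
		return (-1);
	}

	printf("%" PRIu64 " -> \"%s\" -> %" PRIu64 "\n", size, s, parsed);
	free(s);

	/* Success! */
	return (0);
}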

@@ -116,14 +116,14 @@ printf "sets\treps\tratio\n"
 
 # Test with full benefit from malloc pool (the binary begins with an
 # initial pool of 100).
-cmp_methods 10000 100 200 "full"
+cmp_methods 10000 100 180 "full"
 
 # mpool is still considerably faster than malloc in this range.
 cmp_methods 1000 1000 200 "partial"
Member:

Should we change this one too?

Member:

Ok, I'm just confused by the word "partial". I assumed that meant that we would see less of a benefit, but it seems like it's more?

Member Author:

Wait, my previous comment was garbage. I need to investigate this some more.

gperciva marked this pull request as draft on January 27, 2024 at 03:24
@gperciva (Member Author)

Hmm. Does the FreeBSD stdlib have any mpool of its own? Look at this funky graph!

[figure: freebsd-malloc-time]

(I included "clang" in the gnuplot title, but a later test with gcc shows the same thing.)

There's a huge jump in timing between mallocing 200 ints and 201 ints, and similar (although smaller) jumps every subsequent 100 ints.

Same pattern on FreeBSD 13.2 on arm64. I'll test on other versions of FreeBSD, and other OSes, tomorrow.

Here's the code; it's a simplified version of our mpool test. Namely, I removed all mpool stuff, and all libcperciva code.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUFMAX 3000

int
main(int argc, char * argv[])
{
	struct timespec mono_begin, mono_end;
	long long mono_delta;
	int i, j;
	int num;
	int * x[BUFMAX];

	/* Parse one argument. */
	if (argc < 2) {
		printf("usage: %s num\n", argv[0]);
		exit(1);
	}
	num = atoi(argv[1]);
	if ((num <= 0) || (num >= BUFMAX)) {
		printf("num must be between 1 and %d\n", BUFMAX - 1);
		exit(1);
	}

	/* Initial time. */
	if (clock_gettime(CLOCK_MONOTONIC, &mono_begin))
		goto err1;

	for (i = 0; i < 1000; i++) {
		/* Allocate a bunch. */
		for (j = 0; j < num; j++)
			x[j] = malloc(sizeof(int));

		/* Free a bunch. */
		for (j = 0; j < num; j++)
			free(x[j]);
	}

	/* Final time. */
	if (clock_gettime(CLOCK_MONOTONIC, &mono_end))
		goto err1;

	/* Compute the elapsed time, and print the mean number of
	 * nanoseconds per malloc + free pair. */
	mono_delta = (mono_end.tv_nsec - mono_begin.tv_nsec)
	    + 1000000000 * (mono_end.tv_sec - mono_begin.tv_sec);
	printf("%f\n", (double)mono_delta / num / 1000);

	return (0);

err1:
	printf("clock error\n");

	/* Failure */
	return (1);
}

@gperciva (Member Author)

For comparison, here's the plot from our mpool test; mpool has a much more sensible timing curve.

[figure: malloc-mpool]

@cperciva (Member)

Looks like malloc is allocating a slab of pointers at a time; the jumps are when it needs to reserve a kB for the next bunch of pointers. This is probably just a small-allocation optimization, so if you changed from allocating sizeof(int) to allocating e.g. 128 bytes at a time, I'm guessing it will go away.
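
If that's the explanation, a one-line change to the benchmark above would test it (hypothetical; 128 bytes per the suggestion):

-			x[j] = malloc(sizeof(int));
+			x[j] = malloc(128);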

@gperciva (Member Author)

Ok, for general interest, here are some comparisons of different malloc sizes. I split them up:

  • 4 to 32 bytes have the same jump frequency
  • the frequency looks like it doubles from 64 to 128 bytes, and again from 128 to 256
  • there's some evidence of a frequency in 512 and 1024 (likely doubling as well), but it peters out faster. (I'm sure that an FFT would make the frequencies much easier to spot, but at this point I think we're beyond the realm of "casual interest".)

(The doubling would be consistent with fixed-size slabs: doubling the allocation size halves the number of allocations per slab, so the jumps come twice as often.)

[figures: f1, f2, f3]

@gperciva (Member Author)

More pertinently: what does this mean for our mpool test?

One of the assumptions is that doing 100 allocations (of 4 bytes) would be the ideal case for benefiting from mpool instead of malloc. But on FreeBSD 13 and 14 (and possibly earlier versions), that's still in the range of the slab of pre-allocated pointers.

I could increase struct stuff to 256 bytes, but allocating that 100 times in a row is still faster (per allocation) than doing it 1000 times. Should I bump it to something huge, like 16384 bytes? Or should we just update the comments to reflect that 100 allocations might not be the "full benefit", since it depends on the OS?

@gperciva (Member Author)

Another option would be to change mpool to start off with a size of 1000, then check 1000 for the "full" and 10000 for the "partial".

But my sense is that we tend to use mpool for allocating dozens of items with many bytes, rather than thousands of items with few bytes. So increasing the size of struct stuff is probably better.

Commit message:

This is needed for FreeBSD 14.0 with clang; their malloc is allocating a slab of pointers at a time, which benefits small allocations.

Our mpool test is not intended as a serious benchmark of memory allocation speeds; rather, it's a quick check that nothing unusual is happening with the platform's kernel and libc. For a serious comparison, we would need to check the actual scenarios: for example, are we allocating 4096 records of 16 bytes, or 16 records of 52 bytes?

@gperciva (Member Author)

Ok, revised patch: change the allocation from int i to uint8_t buf[128], and don't change the ratio limits.

(64 bytes isn't enough to avoid the ratio becoming too low.)
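
For reference, the revised record presumably looks something like this sketch, assuming libcperciva's MPOOL(name, type, len) macro from mpool.h (which generates mpool_stuff_malloc() and mpool_stuff_free()):

#include <stdint.h>

#include "mpool.h"

/* 128 bytes per record: large enough to fall outside the smallest
 * size classes of FreeBSD's malloc. */
struct stuff {
	uint8_t buf[128];
};

/* Start with an initial pool of 100 records, matching the test
 * script's "initial pool of 100" comment. */
MPOOL(stuff, struct stuff, 100);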

gperciva marked this pull request as ready for review on January 27, 2024 at 22:38
@cperciva (Member)

Looks good. For reference, I think the places where I'm using mpool are generally for allocations in the range of 40-100 bytes, although sometimes it's as low as 16 bytes (two pointers).

cperciva merged commit f70da0e into master on Jan 28, 2024 (2 checks passed)
@gperciva (Member Author)

Yeah, in libcperciva we're doing:

  • 4096 records of 16 bytes in events.c
  • 4096 records of 28 bytes in events_immediate.c
  • 16 records of 52 bytes in network_read.c

No non-libcperciva uses in tarsnap, scrypt, and spiped. kivaloo does a bunch of 4096 records (and even 32768 records!), but I didn't dig in to see exactly how large those were.

gperciva deleted the tests-update branch on January 28, 2024 at 00:18