replace hash tables by linked list (or some other?) #35

Closed
rgerhards opened this issue Apr 6, 2016 · 21 comments

@rgerhards
Member

I think that we may get overall better performance if we replace the hash tables by linked lists. Performance profiler (callgrind/kcachegrind) data suggests that the hash functions require a lot of computation time, and that access is not frequent enough to make up for this. We need to try an alternative implementation and compare the performance of the two.

It might make sense to keep both ways inside libfastjson, and permit the caller to specify which one should be used (e.g. linked lists if few fields per object are expected and hash table if many are).

In any case, the API should not assume that internally a hash table is used (there currently even is a function which returns a hash table).

@davidelang

I agree that for log-related data it is extremely unlikely that a hash table
will be faster than a linked list.

On modern CPUs, the cache penalty for a hash table's more random access pattern
would make a difference, even if you ignored the cost of calculating the hash.
And there just aren't going to be that many items at any one level.

The most extreme case I can think of is dyn_stats data, which could contain
hundreds of items, and they are either dumped in an unknown order or used in a
foreach loop. For data in a log, a few tens of items would be a huge set.

David Lang

@rgerhards
Member Author

Large hash tables also put a lot of stress on the malloc subsystem in the current implementation (of course, linked lists also need to grow, but they can grow by linking new nodes rather than reallocating).

@davidelang

davidelang commented Apr 6, 2016 via email

rgerhards added a commit to rgerhards/libfastjson that referenced this issue Apr 11, 2016
This basically implements the functionality and the testbench
passes. We still need more tests and expect regressions.

see also rsyslog#35
@janmejay
Member

I think arrays may make more sense than linked lists. Also, hash tables with a simpler hash function may be very effective.

A sorted, binary-searchable array may also be worth thinking about. But arrays, in my opinion, will beat everything else.

The absolute best performance should come from inline keys (at least intuitively). I mean something like this:
struct foo { int ptr_key; union { char arr[110]; char *ptr; } key; void *val; };
If the var-name is at most 109 chars (110 bytes with terminator), it'll use the inline array, else a pointer to it (this fits in 2 cache lines if 64-byte aligned, else 3 cache lines). We can make it __attribute__((packed, aligned(64))) and drop the array down to 51 bytes (with ptr_key : 1) to fit it in a single cache line. Intuition says it'll perform best, but it's hard to say in the absence of a good-quality micro-benchmark.

In general, CPU is abundant in any setup. Lock contention, I/O, etc. are usually the problem; most workloads and installations have significant CPU headroom.

@rgerhards
Member Author

Thanks for the feedback. A lot of work is already completed, see https://github.com/rgerhards/libfastjson/tree/exp-no-linkhash

You are right about the inline keys, but we need not go overboard. We create very many json objects, so setting aside that much space for inline keys really makes a difference. I'll make a first try with 7-char keys, which means no extra space at all is used in those cases (7 chars + NUL == sizeof(ptr)).

The main performance concern is malloc/free calls. They clearly dominate (>50% of execution time) in my small benchmarks. So the main goal is to reduce them. But nothing has yet been tested on a real use case.

@rgerhards
Member Author

I should also mention that in the rsyslog profiler runs I have seen, malloc/free (induced by the json lib) is also the prime concern.

@davidelang

davidelang commented Apr 12, 2016 via email

@davidelang

the classic answer to this is to have each node in the linked list that's
allocated contain multiple items.

or, going further, allocate substantially larger chunks of RAM with malloc
and then do the allocation within them ourselves.
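The chunk-allocation idea above amounts to a bump-pointer arena: one large malloc, then cheap sub-allocations within it. A minimal sketch, with invented names and no relation to libfastjson's actual internals:

```c
#include <stdlib.h>
#include <stddef.h>

/* Illustrative bump-pointer arena: one malloc serves many allocations. */
struct arena {
    char *buf;
    size_t used, cap;
};

static int arena_init(struct arena *a, size_t cap)
{
    a->buf = malloc(cap);
    a->used = 0;
    a->cap = cap;
    return a->buf != NULL;
}

static void *arena_alloc(struct arena *a, size_t n)
{
    n = (n + 7) & ~(size_t)7;          /* round up to 8-byte alignment */
    if (a->used + n > a->cap)
        return NULL;                    /* real code would chain a new chunk */
    void *p = a->buf + a->used;
    a->used += n;
    return p;
}

/* One free() releases everything allocated from the arena. */
static void arena_free(struct arena *a)
{
    free(a->buf);
    a->buf = NULL;
}
```

This is exactly how it cuts malloc/free pressure: N small object allocations collapse into one malloc up front and one free at teardown.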

I suspect that 7-character names will catch a good percentage of things, but
that going slightly larger will make a significant difference (if the next
reasonable step is 15 characters, I expect that will get us very close to
complete coverage).

we should see if we can add a debugging/stats option that keeps counters of
variable lengths and object counts at each layer and dumps them out along with
pstats. Now that we have dyn_stats available, this is much less work to do than
it was before :-)

David Lang

@janmejay
Member

Well, large arrays are asymptotically just as cheap: O(1) amortized per append.

@rgerhards
Member Author

The new code I am working on is a compromise: it is a linked list, but the list elements are arrays (pages). The first page is kept within the json object itself, so we need only a single alloc if everything fits in one page. We do not resize the array, but rather add a new page when it becomes full. The downside is that lookup performance is not good (but it wasn't good with the hash table approach in json-c either).

For the access patterns I have seen, this should bring notable improvement. It will probably become problematic if single objects routinely contain more than 100 subobjects.
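The paged design described above could be sketched like this. Everything here is illustrative (names invented, page size of 8 entries assumed), not the actual libfastjson structures:

```c
#include <stdlib.h>

#define PAGE_SIZE 8   /* assumed entries per page */

/* Linked list of fixed-size pages; the first page is embedded in the
 * object, so objects with <= PAGE_SIZE fields need no extra allocation. */
struct page {
    const char *keys[PAGE_SIZE];   /* sketch: stores caller's pointers */
    void *vals[PAGE_SIZE];
    struct page *next;
};

struct obj {
    int count;            /* total number of entries */
    struct page first;    /* embedded first page */
};

static int obj_add(struct obj *o, const char *key, void *val)
{
    struct page *p = &o->first;
    int idx = o->count;
    while (idx >= PAGE_SIZE) {        /* walk to the page holding this slot */
        if (p->next == NULL) {
            p->next = calloc(1, sizeof(*p->next));
            if (p->next == NULL)
                return -1;
        }
        p = p->next;
        idx -= PAGE_SIZE;
    }
    p->keys[idx] = key;
    p->vals[idx] = val;
    o->count++;
    return 0;
}
```

Appending never moves existing entries (unlike a resized array), and the common small-object case costs zero allocations beyond the object itself; the price is O(n) lookup across pages.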

Note in regard to rsyslog: path components are objects, so if you have !a!b!c
you have obj a, which contains subobj b;
obj b, which contains subobj c;
object c

@janmejay
Member

I run it with jemalloc; it completely eliminates the malloc CPU footprint. It does exactly that (pooled memory allocation and thread-local allocation buffers).

@janmejay
Member

100 is definitely on the higher side. I think most logs will have fewer than 20 fields.

@rgerhards
Member Author

I know that jemalloc helps a lot, but we've seen too many cases where jemalloc failed. We used it as the default for a couple of months before we gave up. It's something the (educated) user should tune.

@rgerhards
Member Author

@janmejay 20 is also my PoV, often less. The current page size is 8, which again is a time/space tradeoff. BTW: I have just committed a state that should be fully working, but it does not yet contain all enhancements (e.g. inlined keys).

@janmejay
Member

@rgerhards slight digression, what code-paths used to fail with jemalloc?

@rgerhards
Member Author

I don't remember exactly, but many. We had constant trouble. I guess most, if not all, could have been solved by using the newest version, but we do not like to maintain jemalloc packages, for obvious reasons. It's not that it crashed every hour, but at least once every two weeks we had reports that pointed into jemalloc, and often we found these were bugs fixed in newer versions.

@janmejay
Member

FWIW, my cluster uses imptcp with stream compression, rulesets with several threads, linked-list based queues, omkafka, omprog, etc. quite heavily, and I haven't seen any failures due to jemalloc. Maybe things have improved since. All this is on Debian Wheezy.

Maybe we should try again (a time-boxed effort)?

@rgerhards
Member Author

I think we should move jemalloc off this tracker ;)

rgerhards added a commit to rgerhards/libfastjson that referenced this issue Apr 22, 2016
this now needs to prove itself in practice

see also rsyslog#35
@rgerhards rgerhards self-assigned this Apr 28, 2016
@rgerhards
Member Author

This is now done in 0.99.3. We will see how it actually affects performance. I am closing this tracker; if issues come up, we will either re-open this one or create a new one. I suggest we keep the array implementation in any case, maybe as an additional option, in case we see situations where hash tables would make more sense.

@davidelang

davidelang commented Apr 28, 2016 via email

@rgerhards
Member Author

Thanks for the feedback. I think we should try to get some profiler data; that would be very interesting. Possibly we should coordinate this via private mail and report only the results, so that we can keep this tracker tidy.
