Statistical profiler in terms of engines #893

Open
dpk opened this issue Dec 14, 2024 · 1 comment

dpk commented Dec 14, 2024

This is just an idea: would it be possible to implement a statistical (probabilistic) profiler in terms of Chez Scheme’s engines?

The idea is that the profiler would take a thunk, turn it into an engine, and then invoke that engine over and over with a tiny amount of fuel. Each time the engine expires, it would extract the engine's continuation and inspect it to see what the original thunk was doing when it was interrupted, and add that to its profile. Once the engine is complete, it would return or store a profile counting the number of ticks the thunk is estimated to have spent in each procedure.

At the moment it's not possible to get the continuation object back out of an engine. This could be solved by making an engine a wrapper procedure. The system library appears to include procedures for inspecting the stack of a continuation, but I don't immediately understand how they work; presumably the debugger could serve as a model to follow.
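As a rough illustration, here is a minimal sketch of the driver loop, written against Chez's existing engine interface (make-engine, and invoking an engine with a tick count, a complete procedure, and an expire procedure). The two procedures engine->continuation and record-sample! are hypothetical placeholders for the missing pieces described above: a way to get the suspended engine's continuation, and a way to map that continuation to the procedure it was executing.

(define (statistical-profile thunk)
  (define profile (make-eq-hashtable))    ; procedure -> estimated tick count
  (let run ((eng (make-engine thunk)))
    (eng 1                                ; a tiny amount of fuel per slice
         (lambda (remaining . vals)       ; thunk finished: hand back the profile
           profile)
         (lambda (new-eng)                ; fuel ran out: sample, then resume
           (record-sample! profile (engine->continuation new-eng))
           (run new-eng)))))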

The impetus for this idea is that I had some code which was running slowly, and I suspected a bad hash function was causing lots of collisions in hashtable-set! and hashtable-ref. I don't believe any of Chez's existing profilers can help with situations like this, where the actual slowdown can only be observed within external library code: a profiler would have to measure how long these procedures took, not only how many times they were called, or else measure how often they had to reprobe, which would mean annotating the Chez hash table library. In similar situations, where a slowdown is only observable by knowing how long internal procedures took, this could be a helpful tool.

@jltaylor-us
Contributor

My understanding of how engines are implemented is tenuous, but I think the way that "work" is measured is much closer to execution counts than time. That could still be useful for getting profiling information about code that wasn't built with profiling enabled, I suppose... but if it's code you can edit to fix any issues you find, then presumably you'd also be able to build it with profiling enabled (or add cost centers to track real execution time).
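For reference, here's a minimal sketch of the cost-center approach, assuming the cost-center procedures described in the Chez Scheme User's Guide (make-cost-center, with-cost-center, cost-center-time): the suspect code goes in the thunk, and the accumulated CPU time is read back afterwards.

(define cc (make-cost-center))

(define ht (make-eq-hashtable))
(hashtable-set! ht 'x 5)

;; run the suspect code under the cost center, with timing enabled
(with-cost-center #t cc
  (lambda ()
    (do ((i 0 (+ i 1))) ((fx= i 1000000))
      (hashtable-ref ht 'x #f))))

(cost-center-time cc)    ; a time object giving CPU time spent in the thunk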

For the specific problem of detecting bad hash functions, I have two observations that you may find useful:

If you have a reference to the hashtable, you can use the internal (undocumented) function #%$hashtable-report to get information about bucket load in the hashtable.

> (define ht (make-eq-hashtable))
> (hashtable-set! ht 'x 5)
> (#%$hashtable-report ht)
size, count, max, mean, std = 8, 1, 1, 0.12, 0.33
> (hashtable-set! ht 'y 5)
> (#%$hashtable-report ht)
size, count, max, mean, std = 8, 2, 1, 0.25, 0.43

If you control (or otherwise can profile) the hash and equals functions used to create the hashtable then you can observe that lookups on a key that has many collisions will (on average) call equals many more times than hash.
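For example, something along these lines (a rough sketch, wrapping equal-hash and equal? purely for illustration) makes the comparison easy: a lookup on a key with many collisions drives the equals count up much faster than the hash count.

(define hash-calls 0)
(define equals-calls 0)

(define (counting-hash k)
  (set! hash-calls (+ hash-calls 1))
  (equal-hash k))

(define (counting-equals a b)
  (set! equals-calls (+ equals-calls 1))
  (equal? a b))

;; create the hashtable with the instrumented procedures
(define ht (make-hashtable counting-hash counting-equals))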
