Statistical profiler in terms of engines #893

Open
dpk opened this issue Dec 14, 2024 · 1 comment

dpk commented Dec 14, 2024

This is just an idea: would it be possible to implement a statistical (probabilistic) profiler in terms of Chez Scheme’s engines?

The idea is that the profiler would take a thunk, turn it into an engine, and then invoke that engine over and over with a tiny amount of fuel. Each time the engine expires, it would extract the engine's continuation and inspect it to see what the original thunk was doing when it was interrupted, and add that to its profile. Once the engine is complete, it would return or store a profile counting the number of ticks the thunk is estimated to have spent in each procedure.

At the moment it's not possible to get the continuation object back out of an engine. This could be solved by making an engine a wrapper procedure. The system library appears to include procedures for inspecting the stack of a continuation, but I don't immediately understand how they work; presumably the debugger could serve as a model to follow.
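As a rough illustration, here is a minimal sketch of the driver loop, written against Chez's existing engine interface (make-engine, and invoking an engine with a tick count, a complete procedure, and an expire procedure). The two procedures engine->continuation and record-sample! are hypothetical placeholders for the missing pieces described above: a way to get the suspended engine's continuation, and a way to map that continuation to the procedure it was executing.

(define (statistical-profile thunk)
  (define profile (make-eq-hashtable))    ; procedure -> estimated tick count
  (let run ((eng (make-engine thunk)))
    (eng 1                                ; a tiny amount of fuel per slice
         (lambda (remaining . vals)       ; thunk finished: hand back the profile
           profile)
         (lambda (new-eng)                ; fuel ran out: sample, then resume
           (record-sample! profile (engine->continuation new-eng))
           (run new-eng)))))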

The impetus for this idea is that I had some code which was running slowly, and I suspected a bad hash function was causing lots of collisions in hashtable-set! and hashtable-ref. I don't believe any of Chez's existing profilers can help with situations like this, where the actual slowdown can only be observed within external library code: a profiler would have to measure how long these procedures took, not only how many times they were called, or else measure how often they had to reprobe, which would mean annotating the Chez hash table library. In similar situations, where a slowdown is only observable by knowing how long internal procedures took, this could be a helpful tool.

@jltaylor-us
Contributor

My understanding of how engines are implemented is tenuous, but I think the way that "work" is measured is much closer to execution counts than time. That could still be useful for getting profiling information about code that wasn't built with profiling enabled, I suppose... but if it's code you can edit to fix any issues you find, then presumably you'd also be able to build it with profiling enabled (or add cost centers to track real execution time).
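For reference, here's a minimal sketch of the cost-center approach, assuming the cost-center procedures described in the Chez Scheme User's Guide (make-cost-center, with-cost-center, cost-center-time): the suspect code goes in the thunk, and the accumulated CPU time is read back afterwards.

(define cc (make-cost-center))

(define ht (make-eq-hashtable))
(hashtable-set! ht 'x 5)

;; run the suspect code under the cost center, with timing enabled
(with-cost-center #t cc
  (lambda ()
    (do ((i 0 (+ i 1))) ((fx= i 1000000))
      (hashtable-ref ht 'x #f))))

(cost-center-time cc)    ; a time object giving CPU time spent in the thunk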

For the specific problem of detecting bad hash functions, I have two observations that you may find useful:

If you have a reference to the hashtable, you can use the internal (undocumented) function #%$hashtable-report to get information about bucket load in the hashtable.

> (define ht (make-eq-hashtable))
> (hashtable-set! ht 'x 5)
> (#%$hashtable-report ht)
size, count, max, mean, std = 8, 1, 1, 0.12, 0.33
> (hashtable-set! ht 'y 5)
> (#%$hashtable-report ht)
size, count, max, mean, std = 8, 2, 1, 0.25, 0.43

If you control (or otherwise can profile) the hash and equals functions used to create the hashtable then you can observe that lookups on a key that has many collisions will (on average) call equals many more times than hash.
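For example, something along these lines (a rough sketch, wrapping equal-hash and equal? purely for illustration) makes the comparison easy: a lookup on a key with many collisions drives the equals count up much faster than the hash count.

(define hash-calls 0)
(define equals-calls 0)

(define (counting-hash k)
  (set! hash-calls (+ hash-calls 1))
  (equal-hash k))

(define (counting-equals a b)
  (set! equals-calls (+ equals-calls 1))
  (equal? a b))

;; create the hashtable with the instrumented procedures
(define ht (make-hashtable counting-hash counting-equals))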
