Improve centroiding connected component detection #20

Open
markasoftware opened this issue Jul 1, 2022 · 1 comment · May be fixed by #59

Comments

@markasoftware (Member)

Right now centroiding uses a big recursive function that can stack overflow in certain cases. Ideally it'd be switched to a "worklist"-style algorithm (the standard way to convert a recursive function into a loop), and we should also put some limits in place to reject unrealistic results (such as a centroid taking up half the screen).
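For illustration, here is a hypothetical Python sketch (not LOST's actual code) of the worklist idea: the recursion is replaced by an explicit stack of pixels to visit, and a `max_pixels` cap rejects unrealistically large blobs.

```python
from collections import deque

def flood_fill(image, width, height, start, threshold, max_pixels):
    """Collect the connected component containing `start` using an explicit
    worklist instead of recursion, so deep components can't overflow the
    call stack. Returns None if the component exceeds `max_pixels`.
    `image` is a row-major list of brightness values."""
    worklist = deque([start])
    component = set()
    while worklist:
        x, y = worklist.pop()
        if (x, y) in component:
            continue
        if not (0 <= x < width and 0 <= y < height):
            continue
        if image[y * width + x] < threshold:
            continue
        component.add((x, y))
        if len(component) > max_pixels:
            return None  # unrealistically large blob; reject it
        # Push the 4-connected neighbors onto the worklist.
        worklist.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])
    return component
```

Using `pop()` (a stack) gives depth-first order, matching what the recursive version would visit; `popleft()` would make it breadth-first, which works equally well here.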

@markasoftware (Member, Author)

This paper is a good example of what we can do. An added bonus is that it only needs to read the file from top to bottom, which can be helpful on embedded systems:

https://www.cis.rit.edu/class/simg782.old/finding_connected_components.pdf

The algorithm is actually quite typical and simple for this sort of problem. Loop through the pixels from top left to bottom right. Whenever we encounter an "on" pixel, check whether the pixel immediately above or to the left is also "on"; if so, merge the current pixel into the partial centroid(s) above and/or to the left. The only complication arises when there are centroids both above and to the left of the current pixel, but they are different centroids. In that case we need to merge them into one. There are lots of simple ways to do this; the paper identifies centroids by integers, and when centroids X and Y with X > Y (WLOG) need to be merged, it assigns H[X] = Y in a hashmap (this is basically a lightweight implementation of the union-find data structure).

At the end of the iteration, loop through all the integers that were used as IDs and add each one's pixels to the list belonging to the "canonical" ID it was merged into (found by following the hashmap links all the way down; this is the same procedure as in a union-find data structure). Then you can calculate the weighted average or whatever of each cluster, just like we do now.
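The two passes above can be sketched like this (a hypothetical Python sketch, not LOST's implementation; `image` is a row-major list of brightness values and 0 is the background label):

```python
def label_components(image, width, height, threshold):
    """Two-pass connected-component labeling, scanning top-left to
    bottom-right. Label equivalences are recorded in `parent`, a dict
    acting as a lightweight union-find."""
    labels = [0] * (width * height)   # 0 = background
    parent = {}                       # maps a label to an equivalent smaller label
    next_label = 1

    def find(label):
        # Follow hashmap links all the way down to the canonical label.
        while label in parent:
            label = parent[label]
        return label

    # First pass: assign provisional labels, recording merges as we go.
    for y in range(height):
        for x in range(width):
            if image[y * width + x] < threshold:
                continue
            left = labels[y * width + x - 1] if x > 0 else 0
            up = labels[(y - 1) * width + x] if y > 0 else 0
            neighbors = [l for l in (left, up) if l]
            if not neighbors:
                labels[y * width + x] = next_label
                next_label += 1
            else:
                canonical = min(find(l) for l in neighbors)
                labels[y * width + x] = canonical
                for l in neighbors:
                    root = find(l)
                    if root != canonical:
                        parent[root] = canonical  # H[X] = Y with X > Y

    # Second pass: resolve every pixel to its canonical label.
    components = {}
    for i, l in enumerate(labels):
        if l:
            components.setdefault(find(l), []).append((i % width, i // width))
    return components
```

A U-shaped blob exercises the merge case: its two arms get different provisional labels until the bottom row joins them, at which point `parent` records the equivalence and the second pass folds them into one component.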

A further improvement is to not even store the full list of pixels for each cluster. Instead, we can just store the weighted averages, and the total weight for each one. That's enough information to merge two weighted averages together.
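Merging two running weighted averages, as described above, needs only each cluster's current mean and total weight (hypothetical sketch):

```python
def merge_centroids(a, b):
    """Combine two partial centroids, each a (mean_x, mean_y, total_weight)
    tuple, into one, without revisiting any pixels. The combined mean is the
    weight-proportional blend of the two means."""
    ax, ay, aw = a
    bx, by, bw = b
    w = aw + bw
    return ((ax * aw + bx * bw) / w, (ay * aw + by * bw) / w, w)
```

Adding a single pixel is just the special case where one side is a centroid of weight equal to that pixel's brightness, so the whole first pass can run with O(1) memory per partial centroid.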

@nguy8tri nguy8tri self-assigned this Aug 3, 2022
@nguy8tri nguy8tri linked a pull request Aug 19, 2022 that will close this issue