Optimize and omit duplicate pattern matches #66

Merged
merged 1 commit into from
Oct 19, 2024

ViRb3
Contributor

@ViRb3 ViRb3 commented Aug 25, 2024

Continuation of #64, this time with an included test to showcase the problem and why the changes are necessary.

  • Optimize pattern sub-matches

With the current implementation of needle search + "truncate right" to handle sub-matches, we end up re-scanning the same regions multiple times. In some cases the cost is negligible; in others it is severe. There's probably a better way to handle this, but to fix the most basic cases, we now cache each region (start + end address) and skip regex matching if the exact same region was processed before.

The included test, without my changes, returns:

[[0 4] [1 4] [4 7] [7 9] [2 4] [4 7] [7 9] [4 7] [7 9] [5 7] [7 9] [7 9] [4 7] [5 7] [7 9] [7 9] [7 9] [1 4] [2 4] [4 7] [7 9] [4 7] [7 9] [5 7] [7 9] [7 9] [4 7] [5 7] [7 9] [7 9] [7 9] [2 4] [4 7] [7 9] [5 7] [7 9] [7 9] [4 7] [5 7] [7 9] [7 9] [7 9] [4 7] [5 7] [7 9] [7 9] [7 9] [5 7] [7 9] [7 9] [7 9]]

Meanwhile, the expected result, returned after my changes, is:

[[0 4] [1 4] [2 4] [4 7] [5 7] [7 9]]
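The region cache described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `region`, `regionCache`, `firstVisit`, and `countScans` are hypothetical, and the regex call is stood in for by a callback.

```go
package main

// region identifies a candidate window by its start and end offsets.
// Hypothetical type for illustration; not GoReSym's actual identifier.
type region struct {
	start, end int
}

// regionCache remembers which windows were already handed to the matcher.
type regionCache map[region]struct{}

// firstVisit reports whether (start, end) has not been seen before,
// and records it so that later identical windows are skipped.
func (c regionCache) firstVisit(start, end int) bool {
	r := region{start, end}
	if _, seen := c[r]; seen {
		return false
	}
	c[r] = struct{}{}
	return true
}

// countScans calls match at most once per distinct window and returns
// how many times it was called. match stands in for the expensive
// regex evaluation over data[start:end].
func countScans(windows [][2]int, match func(start, end int)) int {
	cache := regionCache{}
	scans := 0
	for _, w := range windows {
		if !cache.firstVisit(w[0], w[1]) {
			continue // identical window already processed; skip the regex
		}
		match(w[0], w[1])
		scans++
	}
	return scans
}
```

Keying on the exact (start, end) pair only removes identical re-scans; overlapping but non-identical windows are still scanned, which is consistent with the "most basic cases" caveat above.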

  • Return end index

Changes the matching function's signature to also return end indexes. This is used for unit tests but would also be useful for users in general, as there is otherwise no way to get the end index with variable-length patterns.
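To illustrate why the end index matters for variable-length patterns, here is a small sketch that uses Go's standard `regexp` package as a stand-in for the project's matcher. The function name `findAllSpans` and its signature are assumptions made for this example, not the signature introduced by this PR.

```go
package main

import "regexp"

// findAllSpans returns the [start, end) byte offsets of every
// non-overlapping match of pattern in data. Hypothetical helper:
// it uses the standard library regexp engine, not GoReSym's matcher.
func findAllSpans(pattern string, data []byte) ([][2]int, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var spans [][2]int
	// FindAllIndex yields a two-element slice {start, end} per match.
	for _, loc := range re.FindAllIndex(data, -1) {
		spans = append(spans, [2]int{loc[0], loc[1]})
	}
	return spans, nil
}
```

With a pattern such as `ab+`, the matches in `xabbyab` have different lengths, so a caller that only received start offsets could not recover where each match ends.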

  • Deduplicate and sort results

This is a workaround for the first issue above.
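A deduplicate-and-sort pass over `[start, end]` pairs might look like the sketch below. The function name `dedupeAndSort` is hypothetical, and the ordering (by start, then by end) is an assumption chosen to match the expected output shown earlier.

```go
package main

import "sort"

// dedupeAndSort removes repeated [start, end] pairs and orders the
// remainder by start offset, then by end offset. Hypothetical helper
// for illustration; not the function added in this PR.
func dedupeAndSort(spans [][2]int) [][2]int {
	seen := make(map[[2]int]struct{}, len(spans))
	out := make([][2]int, 0, len(spans))
	for _, s := range spans {
		if _, dup := seen[s]; dup {
			continue // pair already emitted
		}
		seen[s] = struct{}{}
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i][0] != out[j][0] {
			return out[i][0] < out[j][0]
		}
		return out[i][1] < out[j][1]
	})
	return out
}
```

Fixed-size arrays are comparable in Go, so `[2]int` can be used directly as a map key without building a string or struct key.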

@stevemk14ebr stevemk14ebr changed the base branch from master to optimized_regex October 19, 2024 18:53
@stevemk14ebr stevemk14ebr merged commit 9eb82e5 into mandiant:optimized_regex Oct 19, 2024
2 checks passed
@stevemk14ebr
Collaborator

With the changes made in 54a6712 to address memory overhead, this cache size is no longer a major concern, so I am merging.

stevemk14ebr added a commit that referenced this pull request Oct 19, 2024
@stevemk14ebr
Collaborator

live in https://github.com/mandiant/GoReSym/releases/tag/v3.0.1
