Optimize and omit duplicate pattern matches #66
Merged
Continuation of #64, this time with an included test to showcase the problem and why the changes are necessary.
With the current implementation of needle search + "truncate right" to handle sub-matches, we end up re-scanning the same regions multiple times. In some cases this is negligible; in others it's really bad. There is probably a better way to handle this, but to fix the most basic cases, we now cache each region (start + end address) and skip regex matching if the exact same address range was processed before.
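A minimal sketch of that caching idea, assuming a hypothetical `Region { start, end }` descriptor and a `scan_region` callback standing in for the real matcher (neither name comes from the actual codebase):

```rust
use std::collections::HashSet;

/// Hypothetical descriptor for a scanned address range.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Region {
    start: usize,
    end: usize,
}

/// Deduplicate regions before matching: remember every (start, end) pair
/// already processed and skip regex matching for exact repeats.
fn scan_all(
    regions: &[Region],
    scan_region: impl Fn(Region) -> Vec<(usize, usize)>,
) -> Vec<(usize, usize)> {
    let mut seen: HashSet<Region> = HashSet::new();
    let mut matches = Vec::new();

    for &region in regions {
        // `insert` returns false if this exact (start, end) pair was seen before,
        // which is what the needle-search + "truncate right" passes can produce.
        if !seen.insert(region) {
            continue;
        }
        matches.extend(scan_region(region));
    }
    matches
}
```

Note that this only catches exact duplicates; overlapping or nested regions would still be re-scanned, which matches the "most basic cases" scope described above.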
The included test, without my changes, returns:
Meanwhile, the expected result, returned after my changes, is:
Changes the matching function's signature to also return end indexes. This is used by the unit tests, but it would also be useful for users in general, as there is otherwise no way to get the end index of a variable-length pattern (see the sketch below).
This is a workaround for the first issue above.
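A minimal sketch of what such a signature could look like, using the Rust `regex` crate's bytes API as a stand-in for the project's actual matcher (the function name and pattern below are assumptions, not the real API):

```rust
use regex::bytes::Regex;

/// Hypothetical matcher that reports both start and end offsets of every hit.
/// With variable-length patterns the end cannot be derived from the start,
/// so returning (start, end) pairs is the only way to expose it.
fn find_matches(pattern: &str, haystack: &[u8]) -> Vec<(usize, usize)> {
    let re = Regex::new(pattern).expect("invalid pattern");
    re.find_iter(haystack)
        .map(|m| (m.start(), m.end()))
        .collect()
}

fn main() {
    // "\x00+\x01" is variable-length: the two hits below have different widths.
    let hits = find_matches(r"\x00+\x01", b"\x00\x00\x01\xff\x00\x01");
    assert_eq!(hits, vec![(0, 3), (4, 6)]);
}
```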