Add heuristics for matching packages to ARP after installing #2044
Merged
41 commits
76a0548
Add type for ARP correlation algorithms
da36d8b
Add function to compute best match
d560e33
Add overall structure for tests
c0894f4
Record ARP product code after install
2ac0f75
Use correlation measures in post-install
bcfb37e
Add test cases
d2f4662
Add normalized name measure (very hacky...)
a3e9d49
Add edit distance measure
5574320
Cleanup data
6c4ec11
Add edit distance measure to tests
cb2b9c9
Spelling
fb8da65
Merge branch 'master' into matching
d0f4110
PR comments, cleanup & refactor
435d4c3
Report false matches in tests
58a3f29
Use FoldCase; remove edit distance weights
2cf7961
Cleanup test data
ab55bbc
Fix crashes; add logs
867f156
Put whole ARP entry in context
c01cfbb
Cleanup test data
49727ba
Update test logs
aa5afee
Use type in context
e75287d
Update test data
d58c1ef
Allow empty
abc38b3
Remove unused measure
b647a60
Reduce reporting
d2cc53c
Spelling
0e29cc4
Add empty heuristic override for ARP snapshot tests
7c43ebf
Hide test
891e678
Rename context data
561e21d
Refactor per PR comments; use UTF-32 for edit distance
70ae168
Expand test cases
35683f8
Remove duplicates in data
1087ec5
Copy code for publisher property
0520643
Use Publisher property in tests
95af4ce
Merge branch 'master' into matching
ce8b259
Resolve TODOs
314d2f1
Report time for correlation
b48b697
Do a single allocation for edit distance table
1ccd981
Spelling
b3c3332
Update src/AppInstallerCLITests/Correlation.cpp
lechacon 3f49a78
Use steady_clock
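Several of the commits above mention an edit distance measure computed over UTF-32 with a single allocation for the table. The following is a minimal sketch of that general technique (Levenshtein distance over a flat table), not the code from this PR. The names `EditDistance` and `NormalizedScore` are hypothetical, and normalizing by the longer string's length is an assumption made here for illustration. Case folding is assumed to have happened before these functions are called.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: Levenshtein distance between two UTF-32 strings.
// Comparing char32_t code points avoids splitting multi-byte characters.
std::size_t EditDistance(const std::u32string& a, const std::u32string& b)
{
    const std::size_t rows = a.size() + 1;
    const std::size_t cols = b.size() + 1;

    // One allocation for the whole (rows x cols) table, indexed as a flat array.
    std::vector<std::size_t> table(rows * cols);
    auto at = [&](std::size_t i, std::size_t j) -> std::size_t& { return table[i * cols + j]; };

    // Distance from or to the empty prefix is the prefix length.
    for (std::size_t i = 0; i < rows; ++i) { at(i, 0) = i; }
    for (std::size_t j = 0; j < cols; ++j) { at(0, j) = j; }

    for (std::size_t i = 1; i < rows; ++i)
    {
        for (std::size_t j = 1; j < cols; ++j)
        {
            const std::size_t substitution = (a[i - 1] == b[j - 1]) ? 0 : 1;
            at(i, j) = std::min({
                at(i - 1, j) + 1,                   // deletion
                at(i, j - 1) + 1,                   // insertion
                at(i - 1, j - 1) + substitution }); // match or substitution
        }
    }

    return at(a.size(), b.size());
}

// Hypothetical score in [0, 1]; 1 means identical strings.
// Assumption: normalize by the length of the longer string.
double NormalizedScore(const std::u32string& a, const std::u32string& b)
{
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) { return 1.0; }
    return 1.0 - static_cast<double>(EditDistance(a, b)) / static_cast<double>(longest);
}
```

A caller could compare the score of each candidate ARP entry against a threshold and keep the best one; the threshold value itself would need to be tuned against test data.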
Did this telemetry event get moved somewhere else? It should still be logged in this function when a match is found, rather than in the helper method, which could be used for other purposes.
That might mean changing the output of the helper to return additional information, although the count fields in this event are less meaningful with different algorithms. But we could still calculate the number of changes, how many manifests were above the threshold, and how many of those were changed as the values used here, in that order.
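One way to picture the suggestion above is a helper that returns the counts alongside the match, leaving the telemetry call to the caller. This is only a sketch under stated assumptions: `Candidate`, `CorrelationResult`, `Correlate`, and all field names are hypothetical and are not the types used in this repository.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical ARP entry view used only for this illustration.
struct Candidate
{
    std::string name;      // display name of the ARP entry
    bool changed;          // entry was added or modified by the install
    bool aboveThreshold;   // entry matched the manifest well enough
};

// Hypothetical helper output: the chosen entry plus the three counts,
// in the order given in the comment above.
struct CorrelationResult
{
    std::optional<std::string> match;  // chosen entry name, if any
    std::size_t changes = 0;           // entries changed by the install
    std::size_t matches = 0;           // entries above the threshold
    std::size_t changedMatches = 0;    // entries that are both
};

// The helper only computes; it does not log telemetry itself,
// so it stays reusable for other purposes.
CorrelationResult Correlate(const std::vector<Candidate>& entries)
{
    CorrelationResult result;
    for (const auto& entry : entries)
    {
        if (entry.changed) { ++result.changes; }
        if (entry.aboveThreshold) { ++result.matches; }
        if (entry.changed && entry.aboveThreshold)
        {
            ++result.changedMatches;
            // Keep the first entry that is both changed and matching.
            if (!result.match) { result.match = entry.name; }
        }
    }
    return result;
}
```

The calling function would then read `result.match` and pass the three counts to whatever telemetry event it already logs.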
I had moved it down to the function doing the correlation, but now it's back here. I changed the helper to return the count of changes and matches. That count still only considers the exact matches from the source search, as I couldn't figure out a good way to keep it consistent across the multiple "passes".
Do you have any ideas how to count the matching manifests when sometimes we use the exact matching and sometimes the confidence measures?
As long as we can reason about the meaning, anything is fine. It can stay using the exact same values as before, just with a better guess. I thought we might use these values in some way to find things that weren't correlating, but it turned out to be very easy to find them 😉
So basically, these numbers are probably not important. Don't spend time trying to improve them, and if you think they are broken, we might consider just reporting 0 for all of them.