Add heuristics for matching packages to ARP after installing #2044

Merged 41 commits on Apr 8, 2022
Commits
76a0548
Add type for ARP correlation algorithms
Mar 22, 2022
da36d8b
Add function to compute best match
Mar 23, 2022
d560e33
Add overall structure for tests
Mar 23, 2022
c0894f4
Record ARP product code after install
Mar 23, 2022
2ac0f75
Use correlation measures in post-install
Mar 24, 2022
bcfb37e
Add test cases
Mar 24, 2022
d2f4662
Add normalized name measure (very hacky...)
Mar 24, 2022
a3e9d49
Add edit distance measure
Mar 24, 2022
5574320
Cleanup data
Mar 24, 2022
6c4ec11
Add edit distance measure to tests
Mar 24, 2022
cb2b9c9
Spelling
Mar 24, 2022
fb8da65
Merge branch 'master' into matching
Mar 28, 2022
d0f4110
PR comments, cleanup & refactor
Mar 30, 2022
435d4c3
Report false matches in tests
Mar 30, 2022
58a3f29
Use FoldCase; remove edit distance weights
Mar 30, 2022
2cf7961
Cleanup test data
Mar 30, 2022
ab55bbc
Fix crashes; add logs
Apr 1, 2022
867f156
Put whole ARP entry in context
Apr 1, 2022
c01cfbb
Cleanup test data
Apr 1, 2022
49727ba
Update test logs
Apr 1, 2022
aa5afee
Use type in context
Apr 1, 2022
e75287d
Update test data
Apr 4, 2022
d58c1ef
Allow empty
Apr 5, 2022
abc38b3
Remove unused measure
Apr 6, 2022
b647a60
Reduce reporting
Apr 6, 2022
d2cc53c
Spelling
Apr 6, 2022
0e29cc4
Add empty heuristic override for ARP snapshot tests
Apr 6, 2022
7c43ebf
Hide test
Apr 6, 2022
891e678
Rename context data
Apr 8, 2022
561e21d
Refactor per PR comments; use UTF-32 for edit distance
Apr 8, 2022
70ae168
Expand test cases
Apr 8, 2022
35683f8
Remove duplicates in data
Apr 8, 2022
1087ec5
Copy code for publisher property
Apr 8, 2022
0520643
Use Publisher property in tests
Apr 8, 2022
95af4ce
Merge branch 'master' into matching
Apr 8, 2022
ce8b259
Resolve TODOs
Apr 8, 2022
314d2f1
Report time for correlation
Apr 8, 2022
b48b697
Do a single allocation for edit distance table
Apr 8, 2022
1ccd981
Spelling
Apr 8, 2022
b3c3332
Update src/AppInstallerCLITests/Correlation.cpp
lechacon Apr 8, 2022
3f49a78
Use steady_clock
Apr 8, 2022
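
Several commits above refer to an edit distance measure that works on UTF-32 and uses a single allocation for its table. As an illustration only, and not the code from this PR, a Levenshtein distance over `std::u32string_view` backed by one flat table might look like the sketch below. The names `EditDistance` and `EditDistanceScore`, and the normalization of the distance into a score, are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: Levenshtein distance over UTF-32 code points.
// Working on char32_t means each comparison covers one code point,
// rather than one byte of a multi-byte UTF-8 sequence.
std::size_t EditDistance(std::u32string_view a, std::u32string_view b)
{
    const std::size_t rows = a.size() + 1;
    const std::size_t cols = b.size() + 1;

    // Single allocation: a flat (rows x cols) table indexed as [i * cols + j].
    std::vector<std::size_t> table(rows * cols);
    auto at = [&](std::size_t i, std::size_t j) -> std::size_t& { return table[i * cols + j]; };

    // Distance from the empty prefix is the length of the other prefix.
    for (std::size_t i = 0; i < rows; ++i) { at(i, 0) = i; }
    for (std::size_t j = 0; j < cols; ++j) { at(0, j) = j; }

    for (std::size_t i = 1; i < rows; ++i)
    {
        for (std::size_t j = 1; j < cols; ++j)
        {
            const std::size_t substitution = at(i - 1, j - 1) + (a[i - 1] == b[j - 1] ? 0 : 1);
            at(i, j) = std::min({ at(i - 1, j) + 1, at(i, j - 1) + 1, substitution });
        }
    }

    return at(a.size(), b.size());
}

// Hypothetical normalization to a score in [0, 1], where 1 means identical.
double EditDistanceScore(std::u32string_view a, std::u32string_view b)
{
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) { return 1.0; }
    return 1.0 - static_cast<double>(EditDistance(a, b)) / static_cast<double>(longest);
}
```

Calling `EditDistance(U"kitten", U"sitting")` with this sketch yields 3. Case folding, which the commit list also mentions, would have to be applied to both inputs before the call.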
1 change: 1 addition & 0 deletions .github/actions/spelling/excludes.txt
@@ -28,6 +28,7 @@ ignore$
^Localization/
^NOTICE$
^src/AppInstallerCLICore/Commands/ExperimentalCommand\.cpp$
^src/AppInstallerCLITests/TestData/InputARPData.txt$
^src/AppInstallerCLITests/TestData/InputNames.txt$
^src/AppInstallerCLITests/TestData/InputPublishers.txt$
^src/AppInstallerCLITests/TestData/NormalizationInitialIds.txt$
2 changes: 2 additions & 0 deletions .github/actions/spelling/expect.txt
@@ -158,6 +158,7 @@ hre
hresults
htm
IAttachment
IARP
IConfiguration
idx
IFACEMETHODIMP
@@ -378,6 +379,7 @@ TStatus
UCase
ucasemap
UChars
ucnv
uec
uild
uintptr
11 changes: 9 additions & 2 deletions src/AppInstallerCLICore/ExecutionContextData.h
@@ -3,6 +3,7 @@
#pragma once
#include <winget/RepositorySource.h>
#include <winget/Manifest.h>
#include <winget/ARPCorrelation.h>
#include "CompletionData.h"
#include "PackageCollection.h"
#include "Workflows/WorkflowBase.h"
@@ -47,6 +48,7 @@ namespace AppInstaller::CLI::Execution
// On import: Sources for the imported packages
Sources,
ARPSnapshot,
CorrelatedAppsAndFeaturesEntries,
Dependencies,
DependencySource,
AllowedArchitectures,
@@ -186,8 +188,13 @@ namespace AppInstaller::CLI::Execution
template <>
struct DataMapping<Data::ARPSnapshot>
{
// Contains the { Id, Version, Channel }
using value_t = std::vector<std::tuple<Utility::LocIndString, Utility::LocIndString, Utility::LocIndString>>;
using value_t = std::vector<Repository::Correlation::ARPEntrySnapshot>;
};

template <>
struct DataMapping<Data::CorrelatedAppsAndFeaturesEntries>
{
using value_t = std::vector<Manifest::AppsAndFeaturesEntry>;
};

template <>
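
The `DataMapping` specializations above associate each `Data` enum value with the type stored for it in the execution context. A minimal, self-contained sketch of that pattern follows. `MiniContext`, the `std::any` storage, and the simplified value types are assumptions for illustration and do not reflect the actual `Execution::Context` implementation.

```cpp
#include <any>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Data { ARPSnapshot, CorrelatedAppsAndFeaturesEntries };

// The primary template is declared but not defined, so using an
// enum value without a mapping is a compile-time error.
template <Data D>
struct DataMapping;

template <>
struct DataMapping<Data::ARPSnapshot>
{
    // Stand-in for std::vector<Repository::Correlation::ARPEntrySnapshot>.
    using value_t = std::vector<std::string>;
};

template <>
struct DataMapping<Data::CorrelatedAppsAndFeaturesEntries>
{
    // Stand-in for std::vector<Manifest::AppsAndFeaturesEntry>.
    using value_t = std::vector<std::string>;
};

// Hypothetical context: stores one value per enum key, typed by DataMapping.
class MiniContext
{
public:
    template <Data D>
    void Add(typename DataMapping<D>::value_t value)
    {
        m_data[D] = std::move(value);
    }

    bool Contains(Data d) const { return m_data.count(d) != 0; }

    template <Data D>
    const typename DataMapping<D>::value_t& Get() const
    {
        return std::any_cast<const typename DataMapping<D>::value_t&>(m_data.at(D));
    }

private:
    std::map<Data, std::any> m_data;
};
```

With this shape, changing the `value_t` of `Data::ARPSnapshot` (as the hunk above does) changes the type returned by every `Get<Data::ARPSnapshot>()` call, so stale callers fail to compile rather than misbehave at run time.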
223 changes: 71 additions & 152 deletions src/AppInstallerCLICore/Workflows/InstallFlow.cpp
@@ -12,6 +12,7 @@
#include "WorkflowBase.h"
#include "Workflows/DependenciesFlow.h"
#include <AppInstallerDeployment.h>
#include <winget/ARPCorrelation.h>

using namespace winrt::Windows::ApplicationModel::Store::Preview::InstallControl;
using namespace winrt::Windows::Foundation;
@@ -506,173 +507,78 @@ namespace AppInstaller::CLI::Workflow

void ReportARPChanges(Execution::Context& context) try
{
if (context.Contains(Execution::Data::ARPSnapshot))
if (!context.Contains(Execution::Data::ARPSnapshot))
{
const auto& entries = context.Get<Execution::Data::ARPSnapshot>();

// Open it again to get the (potentially) changed ARP entries
Source arpSource = context.Reporter.ExecuteWithProgress(
[](IProgressCallback& progress)
{
Repository::Source result = Repository::Source(PredefinedSource::ARP);
result.Open(progress);
return result;
}, true);

std::vector<ResultMatch> changes;

for (auto& entry : arpSource.Search({}).Matches)
{
auto installed = entry.Package->GetInstalledVersion();

if (installed)
{
auto entryKey = std::make_tuple(
entry.Package->GetProperty(PackageProperty::Id),
installed->GetProperty(PackageVersionProperty::Version),
installed->GetProperty(PackageVersionProperty::Channel));

auto itr = std::lower_bound(entries.begin(), entries.end(), entryKey);
if (itr == entries.end() || *itr != entryKey)
{
changes.emplace_back(std::move(entry));
}
}
}

// Also attempt to find the entry based on the manifest data
const auto& manifest = context.Get<Execution::Data::Manifest>();

SearchRequest manifestSearchRequest;
AppInstaller::Manifest::Manifest::string_t defaultPublisher;
if (manifest.DefaultLocalization.Contains(Localization::Publisher))
{
defaultPublisher = manifest.DefaultLocalization.Get<Localization::Publisher>();
}

// The default localization must contain the name or we cannot do this lookup
if (manifest.DefaultLocalization.Contains(Localization::PackageName))
{
AppInstaller::Manifest::Manifest::string_t defaultName = manifest.DefaultLocalization.Get<Localization::PackageName>();
manifestSearchRequest.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::NormalizedNameAndPublisher, MatchType::Exact, defaultName, defaultPublisher));
return;
}

for (const auto& loc : manifest.Localizations)
{
if (loc.Contains(Localization::PackageName) || loc.Contains(Localization::Publisher))
{
manifestSearchRequest.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::NormalizedNameAndPublisher, MatchType::Exact,
loc.Contains(Localization::PackageName) ? loc.Get<Localization::PackageName>() : defaultName,
loc.Contains(Localization::Publisher) ? loc.Get<Localization::Publisher>() : defaultPublisher));
}
}
}
const auto& manifest = context.Get<Execution::Data::Manifest>();
const auto& arpSnapshot = context.Get<Execution::Data::ARPSnapshot>();

std::vector<std::string> productCodes;
for (const auto& installer : manifest.Installers)
// Open the ARP source again to get the (potentially) changed ARP entries
Source arpSource = context.Reporter.ExecuteWithProgress(
[](IProgressCallback& progress)
{
if (!installer.ProductCode.empty())
{
if (std::find(productCodes.begin(), productCodes.end(), installer.ProductCode) == productCodes.end())
{
manifestSearchRequest.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::ProductCode, MatchType::Exact, installer.ProductCode));
productCodes.emplace_back(installer.ProductCode);
}
}
Repository::Source result = Repository::Source(PredefinedSource::ARP);
result.Open(progress);
return result;
}, true);

for (const auto& appsAndFeaturesEntry : installer.AppsAndFeaturesEntries)
{
if (!appsAndFeaturesEntry.DisplayName.empty())
{
manifestSearchRequest.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::NormalizedNameAndPublisher, MatchType::Exact,
appsAndFeaturesEntry.DisplayName,
appsAndFeaturesEntry.Publisher.empty() ? defaultPublisher : appsAndFeaturesEntry.Publisher));
}
}
}
auto correlationResult = Correlation::FindARPEntryForNewlyInstalledPackage(manifest, arpSnapshot, arpSource);

SearchResult findByManifest;
// Store the ARP entry found to match the package to record it in the tracking catalog later
if (correlationResult.Package)
{
std::vector<AppsAndFeaturesEntry> entries;

// Don't execute this search if it would just find everything
if (!manifestSearchRequest.IsForEverything())
{
findByManifest = arpSource.Search(manifestSearchRequest);
}
auto metadata = correlationResult.Package->GetMetadata();

// Cross reference the changes with the search results
std::vector<std::shared_ptr<IPackage>> packagesInBoth;
AppsAndFeaturesEntry baseEntry;

for (const auto& change : changes)
{
for (const auto& byManifest : findByManifest.Matches)
{
if (change.Package->IsSame(byManifest.Package.get()))
{
packagesInBoth.emplace_back(change.Package);
break;
}
}
}
// Display name and publisher are also available as multi properties, but
// for ARP there will always be only 0 or 1 values.
baseEntry.DisplayName = correlationResult.Package->GetProperty(PackageVersionProperty::Name).get();
baseEntry.Publisher = correlationResult.Package->GetProperty(PackageVersionProperty::Publisher).get();
baseEntry.DisplayVersion = correlationResult.Package->GetProperty(PackageVersionProperty::Version).get();
baseEntry.InstallerType = Manifest::ConvertToInstallerTypeEnum(metadata[PackageVersionMetadata::InstalledType]);

// We now have all of the package changes; time to report them.
// The set of cases we could have for changes to ARP:
// 0 packages :: No changes were detected to ARP, which could mean that the installer
// did not write an entry. It could also be a forced reinstall.
// 1 package :: Golden path; this should be what we installed.
// 2+ packages :: We need to determine which package actually matches the one that we
// were installing.
//
// The set of cases we could have for finding packages based on the manifest:
// 0 packages :: The manifest data does not match the ARP information.
// 1 package :: Golden path; this should be what we installed.
// 2+ packages :: The data in the manifest is either too broad or we have
// a problem with our name normalization.

// Find the package that we are going to log
std::shared_ptr<IPackageVersion> toLog;

// If there is only a single common package (changed and matches), it is almost certainly the correct one.
if (packagesInBoth.size() == 1)
{
toLog = packagesInBoth[0]->GetInstalledVersion();
}
// If it wasn't changed but we still find a match, that is the best thing to report.
else if (findByManifest.Matches.size() == 1)
auto productCodes = correlationResult.Package->GetMultiProperty(PackageVersionMultiProperty::ProductCode);
for (auto&& productCode : productCodes)
{
toLog = findByManifest.Matches[0].Package->GetInstalledVersion();
}
// If only a single ARP entry was changed and we found no matches, report that.
else if (findByManifest.Matches.empty() && changes.size() == 1)
{
toLog = changes[0].Package->GetInstalledVersion();
AppsAndFeaturesEntry entry = baseEntry;
entry.ProductCode = std::move(productCode).get();
entries.push_back(std::move(entry));
}

IPackageVersion::Metadata toLogMetadata;
if (toLog)
{
toLogMetadata = toLog->GetMetadata();
}
context.Add<Data::CorrelatedAppsAndFeaturesEntries>(std::move(entries));
}

// We can only get the source identifier from an active source
std::string sourceIdentifier;
if (context.Contains(Execution::Data::PackageVersion))
{
sourceIdentifier = context.Get<Execution::Data::PackageVersion>()->GetProperty(PackageVersionProperty::SourceIdentifier);
}
// We can only get the source identifier from an active source
std::string sourceIdentifier;
if (context.Contains(Execution::Data::PackageVersion))
{
sourceIdentifier = context.Get<Execution::Data::PackageVersion>()->GetProperty(PackageVersionProperty::SourceIdentifier);
}

Logging::Telemetry().LogSuccessfulInstallARPChange(
Member

Did this telemetry event get moved somewhere else? It should still be done in this function when one is found rather than being done in the helper method that could be used for other purposes.

That might mean changing the output of the helper to return additional information, although the count fields in this event are less meaningful with different algorithms. But we could still calculate the number of changes, how many manifests were above the threshold, and how many of those were changed as the values used here, in that order.

Member Author

I had moved it down to the function doing the correlation; but now it's back here. I changed the helper to return the count of changes/matches, although I'm keeping that count to only consider the exact matches from the source search as I couldn't figure out a good way to keep the count consistent across the multiple "passes".

Do you have any ideas how to count the matching manifests when sometimes we use the exact matching and sometimes the confidence measures?

Member

As long as we can reason about the meaning, anything is fine. It can stay using the exact same values as before, just with a better guess. I thought we might use these values in some way to find things that weren't correlating, but it turned out to be very easy to find them 😉

So basically, these numbers are probably not important. Don't spend time trying to improve them, and if you think they are broken, we might consider just reporting 0 for all of them.

sourceIdentifier,
manifest.Id,
manifest.Version,
manifest.Channel,
changes.size(),
findByManifest.Matches.size(),
packagesInBoth.size(),
toLog ? static_cast<std::string>(toLog->GetProperty(PackageVersionProperty::Name)) : "",
toLog ? static_cast<std::string>(toLog->GetProperty(PackageVersionProperty::Version)) : "",
toLog ? static_cast<std::string_view>(toLogMetadata[PackageVersionMetadata::Publisher]) : "",
toLog ? static_cast<std::string_view>(toLogMetadata[PackageVersionMetadata::InstalledLocale]) : ""
);
IPackageVersion::Metadata arpEntryMetadata;
if (correlationResult.Package)
{
arpEntryMetadata = correlationResult.Package->GetMetadata();
}

Logging::Telemetry().LogSuccessfulInstallARPChange(
sourceIdentifier,
manifest.Id,
manifest.Version,
manifest.Channel,
correlationResult.ChangesToARP,
correlationResult.MatchesInARP,
correlationResult.CountOfIntersectionOfChangesAndMatches,
correlationResult.Package ? static_cast<std::string>(correlationResult.Package->GetProperty(PackageVersionProperty::Name)) : "",
correlationResult.Package ? static_cast<std::string>(correlationResult.Package->GetProperty(PackageVersionProperty::Version)) : "",
correlationResult.Package ? static_cast<std::string>(correlationResult.Package->GetProperty(PackageVersionProperty::Publisher)) : "",
correlationResult.Package ? static_cast<std::string_view>(arpEntryMetadata[PackageVersionMetadata::InstalledLocale]) : ""
);
}
CATCH_LOG();
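
The review thread above suggests reporting three values, in order: the number of ARP changes, the number of entries that match the manifest above a threshold, and how many of those matches were also changes. A sketch of that counting follows, assuming each ARP entry carries a confidence score and a flag saying whether it changed since the snapshot. `ScoredEntry`, `CorrelationCounts`, `CountCorrelation`, and the threshold parameter are hypothetical names, not the PR's API; only the three count field names are taken from the diff.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical view of one ARP entry after correlation.
struct ScoredEntry
{
    std::string Name;
    bool IsNewOrUpdated; // Missing from, or different in, the pre-install snapshot
    double Score;        // Confidence that this entry matches the manifest
};

struct CorrelationCounts
{
    std::size_t ChangesToARP = 0;
    std::size_t MatchesInARP = 0;
    std::size_t CountOfIntersectionOfChangesAndMatches = 0;
};

CorrelationCounts CountCorrelation(const std::vector<ScoredEntry>& entries, double threshold)
{
    CorrelationCounts counts;
    for (const auto& entry : entries)
    {
        const bool matches = entry.Score >= threshold;
        if (entry.IsNewOrUpdated) { ++counts.ChangesToARP; }
        if (matches) { ++counts.MatchesInARP; }
        if (entry.IsNewOrUpdated && matches) { ++counts.CountOfIntersectionOfChangesAndMatches; }
    }
    return counts;
}
```

As the thread notes, these numbers are probably not important, so the sketch only shows that a consistent definition is possible, not that it is worth adopting.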

@@ -686,10 +592,23 @@ namespace AppInstaller::CLI::Workflow
return;
}

auto manifest = context.Get<Data::Manifest>();

// If we have determined an ARP entry matches the installed package,
// we set its product code in the manifest we record to ensure we can
// find it in the future.
// Note that this may overwrite existing information.
if (context.Contains(Data::CorrelatedAppsAndFeaturesEntries))
{
// Use a new Installer entry
manifest.Installers.emplace_back();
manifest.Installers.back().AppsAndFeaturesEntries = context.Get<Data::CorrelatedAppsAndFeaturesEntries>();
}

auto trackingCatalog = context.Get<Data::PackageVersion>()->GetSource().GetTrackingCatalog();

trackingCatalog.RecordInstall(
context.Get<Data::Manifest>(),
manifest,
context.Get<Data::Installer>().value(),
WI_IsFlagSet(context.GetFlags(), ContextFlag::InstallerExecutionUseUpdate));
}
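
The two hunks above work together: `ReportARPChanges` turns the correlated ARP entry into one `AppsAndFeaturesEntry` per product code, and `RecordInstall` appends those entries to a copy of the manifest as a new installer. The sketch below reproduces that data flow with simplified stand-in types. `ArpEntry`, `ExpandByProductCode`, and `WithCorrelatedEntries` are assumptions for this example, and the real types carry more fields.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the manifest types referenced in the diff.
struct AppsAndFeaturesEntry
{
    std::string DisplayName;
    std::string Publisher;
    std::string DisplayVersion;
    std::string ProductCode;
};

struct Installer
{
    std::vector<AppsAndFeaturesEntry> AppsAndFeaturesEntries;
};

struct Manifest
{
    std::vector<Installer> Installers;
};

// Hypothetical flattened view of the ARP package chosen by correlation.
struct ArpEntry
{
    std::string Name;
    std::string Publisher;
    std::string Version;
    std::vector<std::string> ProductCodes;
};

// Produces one entry per product code, all sharing the same base fields.
std::vector<AppsAndFeaturesEntry> ExpandByProductCode(const ArpEntry& arp)
{
    AppsAndFeaturesEntry baseEntry;
    baseEntry.DisplayName = arp.Name;
    baseEntry.Publisher = arp.Publisher;
    baseEntry.DisplayVersion = arp.Version;

    std::vector<AppsAndFeaturesEntry> entries;
    for (const auto& productCode : arp.ProductCodes)
    {
        AppsAndFeaturesEntry entry = baseEntry;
        entry.ProductCode = productCode;
        entries.push_back(std::move(entry));
    }
    return entries;
}

// Takes the manifest by value so the caller's copy is left untouched,
// then adds the correlated entries under a new installer.
Manifest WithCorrelatedEntries(Manifest manifest, std::vector<AppsAndFeaturesEntry> entries)
{
    manifest.Installers.emplace_back();
    manifest.Installers.back().AppsAndFeaturesEntries = std::move(entries);
    return manifest;
}
```

As in the diff, an ARP entry with no product codes produces an empty list, because entries are only created inside the loop over product codes.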
7 changes: 4 additions & 3 deletions src/AppInstallerCLICore/Workflows/InstallFlow.h
@@ -167,15 +167,16 @@ namespace AppInstaller::CLI::Workflow
// Outputs: ARPSnapshot
void SnapshotARPEntries(Execution::Context& context);

// Reports on the changes between the stored ARPSnapshot and the current values.
// Reports on the changes between the stored ARPSnapshot and the current values,
// and stores the product code of the ARP entry found for the package.
// Required Args: None
// Inputs: ARPSnapshot?, Manifest, PackageVersion
// Outputs: None
// Outputs: CorrelatedAppsAndFeaturesEntries?
void ReportARPChanges(Execution::Context& context);

// Records the installation to the tracking catalog.
// Required Args: None
// Inputs: PackageVersion?, Manifest, Installer
// Inputs: PackageVersion?, Manifest, Installer, CorrelatedAppsAndFeaturesEntries?
// Outputs: None
void RecordInstall(Execution::Context& context);
}