-
Notifications
You must be signed in to change notification settings - Fork 841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP]Caching exposed modules for Hackage packages in Pantry DB #4624
Conversation
Fixes #4536 |
@snoyberg could you take a look at this variant? On my machine for
From logs it looks like lock files could give a bit more improvement and probably we could get something like 2x slowdown comparing to the current stable version. Also I have an idea to store module list as a blob (maybe even in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can discuss this if the overall idea isn't clear here. But: instead of performing as much logic in the Haskell side, what about pushing the lookup logic into the SQLite database instead? I'm picturing something like this:
SnapshotModuleCache
someUniqueSnapshotIdentifier Text
SnapshotPackage
snapshot SnapshotModuleCacheId
cabal BlobId
UniqueSnapshotPackage snapshot cabal
PackageExposedModule
cabal BlobId
module ModuleId
UniquePackageExposedModule
Then on the Haskell side:
-- Ensures that the tables are filled in for a snapshot
populateSnapshotModuleCache :: Snapshot -> RIO env ()
-- Find all packages which contain the given module name
findPackagesWithModule :: Snapshot -> ModuleName -> RIO env [PackageName]
I have the details very wrong above, but I think with this you could:
- Have a relatively slow
populate
action, with a sticky log message explaining "populating module name cache." This will happen once, and the first usage of a snapshot is already expected to be a bit slow, what with downloading the snapshot file and such. - Have a much smaller surface area for data transfer between Haskell and SQLite for each module name lookup
@@ -62,6 +62,7 @@ dependencies: | |||
- resourcet | |||
- rio-prettyprint | |||
- mtl | |||
- extra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've been trying to avoid adding dependencies on these kinds of packages.
-- Cache of modules exposed by a Hackage package | ||
HackageExposedModules | ||
cabal BlobId | ||
moduleName P.ModuleNameP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there are likely to be a lot of duplicated module names, this would probably benefit from normalization, the same way we normalize package names. How about:
moduleName ModuleId
Module
name P.ModuleName
UniqueModule name
Or something like that?
Note: Documentation fixes for https://docs.haskellstack.org/en/stable/ should target the "stable" branch, not master.
Please include the following checklist in your PR:
Tested manually.