[WIP] Caching exposed modules for Hackage packages in Pantry DB #4624

Closed
qrilka wants to merge 1 commit from the cache-package-modules branch

Conversation


@qrilka qrilka commented Mar 14, 2019

Note: Documentation fixes for https://docs.haskellstack.org/en/stable/ should target the "stable" branch, not master.

Please include the following checklist in your PR:

  • Any changes that could be relevant to users have been recorded in the ChangeLog.md
  • The documentation has been updated, if necessary.

Tested manually.

@qrilka qrilka requested a review from snoyberg March 14, 2019 11:36

qrilka commented Mar 14, 2019

Fixes #4536


qrilka commented Mar 14, 2019

@snoyberg could you take a look at this variant? On my machine for stack script --resolver=lts-13.0 hmm.hs from #4536 I get the following:

version                          time
master                           4.4s
this PR, migrated, not cached    5.4s
this PR, cached modules          1.2s
1.9.3                            0.4s

From the logs it looks like lock files could give a bit more improvement, and we could probably get down to something like a 2x slowdown compared to the current stable version. I also have an idea to store the module list as a blob (maybe even in the hackage_cabal table) - this could save us some time by avoiding some of the lookups.
Do you think it makes sense to explore that option?
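
A minimal sketch of what that blob-based variant could look like as a persistent entity - the entity name, field names, and serialization choice here are hypothetical, not part of this PR:

-- Hypothetical entity: one row per cabal file, with the whole module
-- list serialized into a single blob so a lookup needs no per-module rows.
HackageModuleList
    cabal BlobId
    modules ByteString -- e.g. a serialized list of module names
    UniqueHackageModuleList cabal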

@snoyberg snoyberg left a comment

We can discuss this if the overall idea isn't clear here. But: instead of performing so much logic on the Haskell side, what about pushing the lookup logic into the SQLite database? I'm picturing something like this:

SnapshotModuleCache
    someUniqueSnapshotIdentifier Text

SnapshotPackage
    snapshot SnapshotModuleCacheId
    cabal BlobId
    UniqueSnapshotPackage snapshot cabal

PackageExposedModule
    cabal BlobId
    module ModuleId
    UniquePackageExposedModule cabal module

Then on the Haskell side:

-- Ensures that the tables are filled in for a snapshot
populateSnapshotModuleCache :: Snapshot -> RIO env ()

-- Find all packages which contain the given module name
findPackagesWithModule :: Snapshot -> ModuleName -> RIO env [PackageName]

I have the details very wrong above, but I think with this you could:

  • Have a relatively slow populate action, with a sticky log message explaining "populating module name cache." This will happen once, and the first usage of a snapshot is already expected to be a bit slow, what with downloading the snapshot file and such.
  • Have a much smaller surface area for data transfer between Haskell and SQLite for each module name lookup (a possible shape for that lookup is sketched below)
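
Under the assumptions of that sketch (plus one more: that SnapshotPackage also records the package name, which snoyberg's version does not spell out), the lookup half could be a single raw query, with persistent's rawSql marshalling only the matching names back to Haskell. This is illustrative, not the PR's actual code:

{-# LANGUAGE OverloadedStrings #-}

import Database.Persist.Sql (Single (..), SqlBackend, rawSql, toPersistValue)
import RIO

-- SnapshotModuleCacheId comes from the sketched schema above;
-- ModuleNameP and PackageNameP are Pantry's persistable name wrappers.
findPackagesWithModule
  :: SnapshotModuleCacheId
  -> ModuleNameP
  -> ReaderT SqlBackend (RIO env) [PackageNameP]
findPackagesWithModule sid mname =
  map unSingle <$> rawSql
    -- The whole join runs inside SQLite, so only the matching package
    -- names cross the Haskell/SQLite boundary.
    "SELECT sp.package_name \
    \FROM snapshot_package AS sp \
    \JOIN package_exposed_module AS pem ON pem.cabal = sp.cabal \
    \JOIN module AS m ON m.id = pem.module \
    \WHERE sp.snapshot = ? AND m.name = ?"
    [toPersistValue sid, toPersistValue mname]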

@@ -62,6 +62,7 @@ dependencies:
 - resourcet
 - rio-prettyprint
 - mtl
+- extra
snoyberg commented on this diff:

I think we've been trying to avoid adding dependencies on these kinds of packages.

-- Cache of modules exposed by a Hackage package
HackageExposedModules
    cabal BlobId
    moduleName P.ModuleNameP
snoyberg commented on this diff:

Since there are likely to be a lot of duplicated module names, this would probably benefit from normalization, the same way we normalize package names. How about:

    moduleName ModuleId

Module
    name P.ModuleName
    UniqueModule name

Or something like that?
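
With that normalized table, population needs a get-or-insert step so that duplicate module names resolve to the same row. A minimal sketch using persistent's insertBy, which relies on the UniqueModule constraint to detect duplicates - the helper name is made up, not part of this PR:

import Database.Persist (entityKey, insertBy)
import Database.Persist.Sql (SqlBackend)
import RIO

-- Hypothetical helper: return the existing ModuleId when the name is
-- already interned, otherwise insert a new Module row and return its key.
getModuleId :: P.ModuleName -> ReaderT SqlBackend (RIO env) ModuleId
getModuleId name = either entityKey id <$> insertBy (Module name)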

@mpilgrem mpilgrem deleted the cache-package-modules branch October 16, 2022 14:19