You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
loading one or more signatures from JSON via stdin
loading one or more signatures from a JSON file
loading one or more .sig/.sig.gz files from within a directory via traversal
loading many signatures from many files (including zip, LCA, SBTs, and manifests) specified in a pathlist file
In #1425, I realized that MultiIndex always generates an in-memory manifest containing full signatures, even when wrapping other indices via a pathlist file.
In particular, this means that when loaded via a pathlist
all signatures from all databases are loaded into memory;
SBTs and LCAs are not searched using indexed search;
This is not good behavior, to say the least. Luckily we don't really suggest using pathlists anywhere in the documentation so it's not a pressing matter, but we should fix this sometime.
One way to fix this would be to use the StandaloneManifestIndex class from #1891 to load or generate all manifests of wrapped index objects upon load, and then store only the manifests in memory. This would support lazy loading of all the files, with the only negative being a performance hit from parsing JSON files (and LCA indices) each time we needed the actual signatures. I think this is probably OK as it is in keeping with the trend towards low-memory-is-better and since we have many good on-disk signature storage mechanisms now, it's probably better to point people towards those for storage.
The text was updated successfully, but these errors were encountered:
ctb
changed the title
improve wrapping of Index classes from command-line functionality
support low-memory wrapping of Index classes from command-line functionality
Mar 26, 2022
ctb
changed the title
support low-memory wrapping of Index classes from command-line functionality
reimplement wrapping of Index classes from command-line for lower memory
Apr 16, 2022
One way to fix this would be to use the StandaloneManifestIndex class from #1891 to load or generate all manifests of wrapped index objects upon load, and then store only the manifests in memory. This would support lazy loading of all the files, with the only negative being a performance hit from parsing JSON files (and LCA indices) each time we needed the actual signatures. I think this is probably OK as it is in keeping with the trend towards low-memory-is-better and since we have many good on-disk signature storage mechanisms now, it's probably better to point people towards those for storage.
random thought: can we provide "hints" on Index classes that guide enclosing classes?
"this class is a memory hog" (e.g. LCA_Database)
"this class is on disk and permits fast lazy load" (e.g. ZipFileLinearIndex, SqliteIndex)
"this class doesn't support random indexing" (e.g. .sig.gz files)
The
MultiIndex
class is used for four purposes:In #1425, I realized that
MultiIndex
always generates an in-memory manifest containing full signatures, even when wrapping other indices via a pathlist file.In particular, this means that when loaded via a pathlist
This is not good behavior, to say the least. Luckily we don't really suggest using pathlists anywhere in the documentation so it's not a pressing matter, but we should fix this sometime.
One way to fix this would be to use the
StandaloneManifestIndex
class from #1891 to load or generate all manifests of wrapped index objects upon load, and then store only the manifests in memory. This would support lazy loading of all the files, with the only negative being a performance hit from parsing JSON files (and LCA indices) each time we needed the actual signatures. I think this is probably OK as it is in keeping with the trend towards low-memory-is-better and since we have many good on-disk signature storage mechanisms now, it's probably better to point people towards those for storage.ref #1895 #1096
The text was updated successfully, but these errors were encountered: