-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache prefix should/could be optional #823
Comments
Did you try working with more than one Spreadsheet object concurrently? |
Not in my benchmark script. It wouldn't have worked correctly because Settings is currently using a single cache entry for every spreadsheet. I outlined some of my thoughts for working around that in the description above. |
As I mentioned in #629 removing the PSR cache support entirely cuts the memory usage in half[1] (when profiling with Blackfire, it only saves 30% according to [1] https://blackfire.io/profiles/compare/944b11fd-62fb-40d2-8f8f-e45c46d169c6/graph |
I agree with that, and as performance is a constant struggle with PhpSpreadsheet, we should strive to optimize the common use cases. But we must guarantee that default setup still allow concurrent usage (within same process or different processes). We would also need to document very clearly for advanced users using custom caches that avoiding collision is up to them. The factory idea sounds good to me. Typically that would allow for a custom factory to generate a unique prefix for each instances if needed. Or implement other way to separate caches (eg: use a different directory for a cache writing to disk). It makes me wonder if we should integrate with PSR container. I've been thinking about containers before. At least the IOFactory could benefit from that. Maybe other things too. But if we go that way, we definitely should not raise the bar for newcomers and keep everything straightforward for default use-case. Whichever route we go, I feel that it would almost always introduce (relatively minor) breaking change. And that makes this work a good candidate for v2. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
This is:
What is the expected behavior?
The cell cache does not require a prefix.
What is the current behavior?
All coordinates are prefixed automatically by
PhpOffice\PhpSpreadsheet\Collection\Cells
What are the steps to reproduce?
Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:
Which versions of PhpSpreadsheet and PHP are affected?
Any version >= 1.0.0-beta
Description
The class
PhpOffice\PhpSpreadsheet\Collection\Cells
automatically adds a cache prefix to every key before adding it to the cache. The prefix looks likephpspreadsheet.5c19219c3c4962.46848305.
.The problem with prefixing is it ends up requiring a lot more memory if the cache adapter stores the keys in memory. The immediately obvious issue is that we have to store the extra 39 characters for the prefix, repeated by the number of cells in the spreadsheet. However the memory usage is even worse because
Cells
already stores an index of every coordinate. If the cache keys were not prefixed PHP could reference the same value for bothCells::index
and the Cache implementation's index. Since the prefix is added PHP stores each coordinate twice, once without the prefix and once with the prefix.I think this could be avoided by letting the cache determine if it wants to add a prefix or not. Then the cache adapter could either namespace the keys itself or the user could use a decorator like this one. It seems like the intention of the prefix is to avoid collisions and the cache should know best if it needs to worry about that.
When using the in memory cache you could avoid collisions by creating a new instance of the memory cache per worksheet. This would require changing Settings::getCache to not force a singleton since it currently returns a single cache instance for all worksheets. We would probably need to let the user specify a factory instead?
I tried working around this by making the cache adapter index the cache keys in a multidimensional array as
[prefix][coordinate]
. It helped memory usage but hurt performance because of all of the string operations necessary.I benchmarked performance without prefixing and saw a 21% improvement in runtime and 22% improvement in memory usage. However there is only a 8.63% improvement in runtime and 14% improvement in memory usage if the changes in #822 are made first.
The text was updated successfully, but these errors were encountered: