-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a timestamp in the hash cache #337
Use a timestamp in the hash cache #337
Conversation
5931c11
to
782fff6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a weakness in this implementation that I didn't think off at the time.
You don't have a way to invalidate old cache entries using timestamp for files that don't exist any more.
I'm thinking the proper way of implementing this could be to store the timestamp along with the hash in the cache. Then, when you get cache value back, you double check that the timestamp from the cache still matches the one from the file ... if the file exists.
That depends on how you invalidate those caches on a system level. If "fresh" deployments nuke the cache, you're relying on regular deployments before disk space runs out. Or you have a cron job that nukes a cache folder. In both cases, it won't be limited to older caches only, and this makes it infeasible for large or performance sensitive contexts. Specialised cache adapters such as Memcache or Redis have garbage collection built in, it's mostly an issue with the default filesystem caches (constrained by disk space), and the PHP opcache (somewhat constrained by PHP memory). There's an open ticket on how to handle cache cleanup in 4.x: silverstripe/silverstripe-framework#6678. Even if we implemented this in framework (by calling |
e71c25e
to
5a62048
Compare
$filesystem = $this->getFilesystem($fs); | ||
return $filesystem->has($fileID) ? | ||
$filesystem->getTimestamp($fileID) : | ||
DBDatetime::now()->getTimestamp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are instances where we'll be generating a hash value for a file that doesn't exists yet, so we need a fallback in case the file doesn't exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a chat with @NightJar about the virtue of return a fake timestamp. We were originally doing this, because we were making it part of the key and we sometime needed to generate hash keys for files that might not exists.
After some consideration, there's a scenario where this is useful. If you're moving a file around, the FysystemAssetStore will try to move the existing hash value to a different hash key. If you decide to move the hash before the physical file, you won't have a file there yet. By generating, a timestamp equal to "now", FlysystemAssetStore won't have to worry about this intricacy. So we reduce complexity a little bit.
I tweaked the initial implementation, to store the timestamp on that value, instead of the key. This means we'll only ever have one possible entry for each fileID/Filesystem combination. Before returning the hash, we compare the timestamp in the cache to the one of the physical file. If they don't match, we assume that the hash has been invalidated and recompute it. Those timestamp are accurate to the second. There's a possible edge case where you would have an outside process that update the file within the same second as SilverStripe is trying to read it's hash. I'm not sure how we could address that however. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good. Unsure how the cache is utilised internally though, so will recommend someone else also reviews before merge.
Also tests are failing ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up taking over the work from Adrian ... and I'm delighted with my outstanding work.
37f23ff
to
f475826
Compare
Caching between two servers is not working as expected. This address that.
Checks to see if File exists before proceeding
https://github.com/silverstripe/silverstripe-assets/blob/1/src/Storage/DBFile.php#L365
Uses "Flysystem" to check for file existing
https://github.com/silverstripe/silverstripe-assets/blob/master/src/Flysystem/FlysystemAssetStore.php#L1231
Compares "Hashes" of what is in the database, and what is returned by Flysystem (If they do not match, which they dont, it skips)
https://github.com/silverstripe/silverstripe-assets/blob/master/src/Flysystem/FlysystemAssetStore.php#L327
Flysystem checks if Cached value of Hash exists (which it does) and returns the cached value instead of checking directly.
https://github.com/silverstripe/silverstripe-assets/blob/master/src/Storage/Sha1FileHashingService.php#L116
This is the point in which a different Hash value is returned by Cache on the offending server (as the cache has not been invalidated / cleared on the server which did not receive the upload).
Cache is non-memory based and is set to infinite lifecycle (meaning it won't expire unless it is specifically removed).
https://github.com/silverstripe/silverstripe-assets/blob/master/_config/assetscache.yml#L17
By default, all SilverStripe caches are stored in the TempDir (/tmp/silverstripe...). This directory is not shared between servers, so each individual web server contains its own cache values
Parent issue