Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

caching on shared environment - add base URL to cachekey to prevent cross-pollination #6358

Closed
sunnysideup opened this issue Dec 1, 2016 · 19 comments
Labels

Comments

@sunnysideup
Copy link
Contributor

sunnysideup commented Dec 1, 2016

We have been using caching a lot lately - like this:

public function testCacheBasics() {
$cache = Cache::factory('test');
$cache->save('Good', 'cachekey');
$this->assertEquals('Good', $cache->load('cachekey'));
}

That works great - but there is a catch (there are many, but this is a big one):

Unless you specifically make sure that each site uses its own caching system (be it APC / temp files / Memcache) any matching cache keys are going to cross-pollinate. Of course this only applies if two sites are running on the same server and are using the same key - but it still applies even if they use separate file systems (apc / memcache).

I confirmed this theory by creating the following task:

<?php


class CacheTest extends BuildTask
{
    protected $title = 'Test Silverstripe Cache';

    protected $description = 'Basic test for the silverstripe cache. It will show the date and time the cache was made.';

    public function run($request)
    {
        $cachekey = 'foo';
        $cache = SS_Cache::factory($cachekey);
        if (!($result = $cache->load($cachekey))) {
            $result = date('Y-m-d H:i:s');;
            $cache->save($result, $cachekey);
        }
        echo $result;
    }

}

you can view this task by visiting: www.mysite.co.nz/dev/tasks/CacheTest.

This task returns the exact same second, if:

  • websites are the same server even if there are in completely separate accounts
  • you are running memcache cache or apc cache (not tested for file based caching)

So, what I would recommend is that we simply add the URL to the caching key, or at least a simplified version of it.

$basicOptions = array('cache_id_prefix' => $for);

and add to the caching id ($for):

$prepend = hash("crc32b", Director::absoluteBaseURL());
$for = $prepend.$for;

As a second rate alternative, I feel that the documentation should at least point out this risk.

My apologies in advance if I have not completely understood how this works.

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Dec 1, 2016

Here is an example of data that could be cross-pollinated between sites if you are running APC / Memcache OR you share a temp folder:

https://github.com/silverstripe/silverstripe-cms/blob/a8cb4a516f5aafd77843f4975c1d9086905fa8d0/code/Controllers/CMSMain.php#L529-L533

@kinglozzer
Copy link
Member

By default the TEMP_FOLDER which SS_Cache will use for filesystem-based caches will be unique for each site (unless you’re defining the TEMP_PATH constant yourself), as it’s generated from the BASE_PATH (the document root). For example:

/var/tmp/silverstripe-cache-php5.6.27-var-www-sitea
/var/tmp/silverstripe-cache-php5.6.27-var-www-siteb

If multiple sites are sharing the same temp folder, there are going to be many other issues (template/class/config manifest collisions for example). So I don’t think this is an issue for filesystem backed caches.

For APC/Memcached however I’m not sure. This should really be configurable by devs when setting up via SS_Cache::add_backend(), but due to Zend Cache’s weird separation of frontend/backend I don’t see how we could achieve that in 3.x.

I’d like some other opinions on this, but my first thought is to use something like BASE_PATH instead of URL (as multiple domain names may share the same codebase).

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Dec 2, 2016

Hi Loz

Thank you for your swift reply.

Temp dir for file caching - agreed, works out of the box.

APC/Memcache is a disaster waiting to happen (I personally had this where we ran a staging and a live site on the same server and base url on the live site got replaced with that of the staging site as it was cached). As far as I am concerned this a security risk.

It can be fixed easily by allowing the dev to add a site specific code of any kind (be it URL / base folder - in reply to you base folder idea, also consider subsites, translations - where URL seems better than base folder).

What have we have a simple private static that can be set using a static function (similar to add_backend and set_expiration) to be pre-pended to all caching keys?

basically like this:

class SS_Cache
{
    private static $prepend_code = ''

    public static function set_prepend_code($code) 
    {
        self::$prepend_code = $s;
    }
...
    public static function factory(..) 
    {
        ....    
        //optional addition of unique piece of sitewide code
        //to make caching keys unique for a site
        $for = self::$prepend_code . $for;
        ...
    }

We could add a USER_ERROR (E_USER_NOTICE) that if the caching system is not file, that it is recommended to add this prepend code or something like this.

This could safely be added to 3.* without affecting any existing code, while allowing you to run two or more sites with Memcache / APC on the same server without ending up with sites mixing up their data (as I said, for the unaware dev, this could, in an unlucky circumstance, create a massive security breach - e.g. if there are two sites in a server with some identical code that caches access rules for objects then you could end up with users having access to the wrong objects).

PS I dont understand the memcache frontend/backend thing at all, so I maybe missing something.

@kinglozzer
Copy link
Member

Pinging @silverstripe/core-team for some more opinions.

We might need to consider this in #6252 too - whether we try to separate caches in core, or make it clear to developers that it’s their responsibility when configuring them. Note that it’s not possible to configure APC/Memcached to have distinct pools in a shared installation: you can only keep things separate with unique keys.

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Dec 6, 2016

as an aside: I got my hosting team to set up memcache and I am now running four instances of it on different ports (i.e. I am separating them for different sites), but it does not seem to be working very well at all.

http://stackoverflow.com/questions/40964315/stop-memcache-from-removing-data-in-php-silverstripe-zend-cache

For the APC cache I found the same thing: after a minute or so the cache expires.

@chillu
Copy link
Member

chillu commented Dec 6, 2016

Can you confirm this is your problem statement? "Multiple sites with separate webroots running on the same server, with Memcache/APC backends (no filesystem path prefix in key)"

Reading this I first thought it's about multiple subsites on the same webroot, but they share state in a way that shouldn't lead to cache conflicts as such - since it's mostly based on database identifiers which are unique within a subsite. Your CMSMain_SiteTreeHints cache example would get clashing keys if you made SiteTree::page_type_classes() specific to subsites, though. For example, Member #1 can access MyPageType on Subsite A, but not Subsite B). That would be fixed by including the domain name of course.

As a second rate alternative, I feel that the documentation should at least point out this risk.

I don't think that's good enough as a fix, since you can't control all cache keys (e.g. the CMSMain example above).

As you point out, using the cache name via the $for argument in SS_Cache::factory will only work for caches which respect the cache_id_prefix prefix (filesystem, not Memcache/APC). I don't think we should limit our solution to filesystem caches (through the proposed prepend_code), but rather have a standard way to influence the cache key (not cache name).

// In core during bootstrap
SS_Cache::addCacheKeyRule('Domain', function($key) { return $key . Director::baseURL();});

$cache = SS_Cache::factory('myCacheName');
$cache->load(SS_Cache::getCacheKey('my-custom-key')); // runs addCacheKeyRule()

We can't enforce that use for 3.x, but could recommend it - and change the core uses of cache keys accordingly. I haven't thought about how this relates to PSR-6, but I'd expect a similar solution (stick to richer cache key abilities).

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Dec 6, 2016

Hi Ingo

Thank you for looking at this.

Can you confirm this is your problem statement? "Multiple sites with separate webroots running on the same server, with Memcache/APC backends (no filesystem path prefix in key)

YES

I did not know that cache_id_prefix. When you at the Core.php class you can see $this->_options['cache_id_prefix'] all over the show:

Are you saying that the cache_id_prefix is used for the cache name but not for the cache-key and that APC / Memcache only use the cache-key and ignore the cache name? My research seems to indicate that the two are combined into one for memcache:

image

The key above is generated from the following code:

image

@kinglozzer
Copy link
Member

@chillu cache_id_prefix does appear to be used globally to prefix the cache keys. It’s prepended to the key in Zend_Cache_Core::_id() (save(), test(), load(), remove() and touch() all seem to call it before passing the key to the backend). The Zend docs also suggest this is the case.

So hopefully a unique, global cache_id_prefix would solve this. My two concerns with that are:

  1. We’re further limiting the length of cache keys. I believe Memcached has a limit on key length (though I can’t find what it is), so if we tack on a long prefix we have potential breakages there.
  2. BC breaks as cache keys are different. I expect most people perform a dev/build?flush after upgrading, but I’m not sure whether people would expect that to, effectively, clean their caches...

@chillu
Copy link
Member

chillu commented Dec 7, 2016

So hopefully a unique, global cache_id_prefix would solve this

Awesome, I've misinterpreted the APC/Memcached Zend_Cache implementations then - didn't see a mention of cache_id_prefix so assumed it wasn't used - but I wasn't looking in the right place. But: It's not configurable through our SS_Cache wrapper.

We’re further limiting the length of cache keys

I've read some stuff about 250 chars cache key length - if we prefix the domain, that could push some keys in projects over the limit. I'd say its best practice to hash cache keys (even if it makes them less transparent), and avoid this issue altogether. I think its enough of an edge case to be acceptable for 3.x - less bad than potential information exposure due to cross-pollinated caches.

BC breaks as cache keys are different. I expect most people perform a dev/build?flush after upgrading, but I’m not sure whether people would expect that to, effectively, clean their caches...

So in this case it wouldn't clean caches, just everything would have different cache keys. Most sites would run a dev/build with flush anyway - does that clear the Zend_Cache now? Sorry, I can't recall... On high traffic sites, that might cause issues (if the implementors haven't thought about warming a cache).

In the end, this is only for a fraction of installs using APC or Memcached running with multiple webroots on the same server. And those installs could either use different PHP configurations with different APC cache settings, or different Memcached services on different ports. Plus they can also take care to implement cache keys which take the domain into account. So we're just left with a few cache keys used by core and modules. I'm inclined to solve this problem in 4.x, and just add some documentation for those cases in 3.x, given the tradeoffs at hand.

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Dec 8, 2016

I've read some stuff about 250 chars cache key length - if we prefix the domain, that could push some keys in projects over the limit. I'd say its best practice to hash cache keys (even if it makes them less transparent), and avoid this issue altogether. I think its enough of an edge case to be acceptable for 3.x - less bad than potential information exposure due to cross-pollinated caches.

Why don't we allow the dev to set a prefix (or even force her / him to set a prefix to add it)

So we're just left with a few cache keys used by core and modules. I'm inclined to solve this problem in 4.x, and just add some documentation for those cases in 3.x, given the tradeoffs at hand.

That is fine with me. This would be best placed with the core team when they update the ZendCache stuff?

@sunnysideup
Copy link
Contributor Author

As an aside, I have been doing lots of testing with memcache on my server and I found it to be totally unreliable and useless (see: http://stackoverflow.com/questions/40964315/stop-memcache-from-removing-data-in-php-silverstripe-zend-cache). I may be missing something, but I have spent many hours to make it work and it definitely does not seem to work properly.

@chillu
Copy link
Member

chillu commented Dec 8, 2016

This would be best placed with the core team when they update the ZendCache stuff?

Yeah, I've left a comment on the RFC proposing a PSR-6 implementation - but you're welcome to propose an API addition there.

@sunnysideup
Copy link
Contributor Author

I have been so frustrated with these caches that I am thinking about writing my own caching system using MYSQL memory tables ... easier to understand and no need for third-party installations on servers, etc...

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Dec 8, 2016

I think I rediscovered another bug in Silverstripe. I am not sure why this bug has not been looked into in more detail, because, as I read it, the current code clears every cache for every silverstripe installation around the for every cache type ... for every ten requests... #6087.

I tried fixing this by setting the automatic_cleaning_factor to zero and it did not actually solve my problem, but here is what a person wrote on the memcache forum (sorry, not my language):

Maybe you should start harassing the SilverStripe people for using
memcached like it's 2005?

I will have a fresh look at this in the morning, but I reckon this needs urgent attention.

@sunnysideup
Copy link
Contributor Author

@kinglozzer
Copy link
Member

@sunnysideup automatic_cleaning_factor won’t have any effect on memcached or APC, as they don’t support automatic cleaning in that way. The automatic_cleaning_factor of 10 means that there’s a 1 in 10 chance that save() will clean old, expired items from the cache - not a 1 in 10 chance it will wipe everything.

but here is what a person wrote on the memcache forum (sorry, not my language):

Maybe you should start harassing the SilverStripe people for using
memcached like it's 2005?

The copy of Zend Framework we’re using is from ~2010, so they’re not far off 😉. Hence why we’re looking to update it ASAP.

@sunnysideup
Copy link
Contributor Author

sunnysideup commented Dec 8, 2016

Also see:
#6383

@sminnee
Copy link
Member

sminnee commented Jun 28, 2017

The original issue has been resolved in SS4.

@sminnee sminnee closed this as completed Jun 28, 2017
@dhensby
Copy link
Contributor

dhensby commented Jul 18, 2017

We've encountered this issue in a project again - caches in APC on shared hosts cross pollinate.

There's currently no way in SS 3 to provide an option to the cache backend to add a prefix per site - I do feel this should be added to core much like we generate a unique prefix for our cache dirs, we should do the same for our cache backends...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants