[5.7] Cache distinct validation rule data #26509
Conversation
+1 on improving Laravel for large dataset processing. I'll be starting 2019 with a project that will expose an API that can take 1,000 rows at a time and insert them into the database. I haven't worked with distinct yet, but I already had to implement a BatchUnique and a BatchExists rule for performance reasons, even though the project is still just a prototype.
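BatchUnique and BatchExists are this commenter's own rules and their code isn't shown; a sketch of how a batch exists check along these lines could work (the class, column handling, and message are all assumptions) is to replace the built-in rule's per-row query with a single whereIn() query:

```php
use Illuminate\Contracts\Validation\Rule;
use Illuminate\Support\Facades\DB;

class BatchExists implements Rule
{
    protected $table;
    protected $column;

    public function __construct($table, $column)
    {
        $this->table = $table;
        $this->column = $column;
    }

    public function passes($attribute, $value)
    {
        // $value is the whole array of incoming values, so one query
        // covers every row instead of one query per row.
        $found = DB::table($this->table)
            ->whereIn($this->column, $value)
            ->distinct()
            ->count($this->column);

        return count(array_unique($value)) === $found;
    }

    public function message()
    {
        return 'One or more of the given values does not exist.';
    }
}
```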
This sounds like a huge assumption. How would this work with something like this?
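(The snippet referenced here wasn't preserved. Judging from the reply below, it was along these lines: validate, change the underlying state, then validate again, expecting the second run to fail. All names and rules below are illustrative, not the commenter's original code.)

```php
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Validator;

$input = ['users' => [['email' => 'a@example.com'], ['email' => 'b@example.com']]];

$validator = Validator::make($input, ['users.*.email' => 'distinct']);

$validator->passes();                         // first run builds the data set

DB::table('users')->insert($input['users']);  // state changes in between

$validator->passes();                         // would cached data now be stale?
```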
This does not affect the unique rule, only the distinct rule, so your pseudo code will succeed since distinct doesn't check the database. I share your concern, though, and that's why the cached data is reset every time you validate.
And what if the user isn't calling $validator->passes()? What if they are using the trait directly? If the Validator is responsible for the lifetime of the cache (and clearing it), then it would make more sense to put the caching logic in the Validator class, not the ValidatesAttributes trait. There's nothing in the code that makes sure users of the trait properly reset the cache when needed.
I get your point. What if we move the cache reset into the Validator class instead? What do you think about this?
I think it would look cleaner if you split extractDistinctValues out of validateDistinct in the trait, and then override that method in the Validator implementation. That way there are no conditionals in the trait about any possible cache, and the cache is handled entirely by code in the Validator class.
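A minimal sketch of that suggestion (method and property names are assumptions, and the extraction body is stubbed out, so this is not the framework's actual code): the trait exposes an uncached extraction hook, and the Validator overrides it with a memoized version whose lifetime it controls.

```php
trait ValidatesAttributes
{
    public function validateDistinct($attribute, $value, $parameters)
    {
        // Compare against the values returned by an overridable hook.
        // (The real rule also excludes the current attribute from the set.)
        return ! in_array($value, $this->extractDistinctValues($attribute));
    }

    // No caching here: users of the bare trait get the plain behaviour.
    protected function extractDistinctValues($attribute)
    {
        return []; // stand-in for walking $this->data for sibling values
    }
}

class Validator
{
    use ValidatesAttributes {
        extractDistinctValues as traitExtractDistinctValues;
    }

    /** Cached distinct data sets, keyed by attribute. */
    protected $distinctValues = [];

    // Memoizing override; the trait itself stays free of cache conditionals.
    protected function extractDistinctValues($attribute)
    {
        if (! array_key_exists($attribute, $this->distinctValues)) {
            $this->distinctValues[$attribute] = $this->traitExtractDistinctValues($attribute);
        }

        return $this->distinctValues[$attribute];
    }

    public function passes()
    {
        // Reset here so the cache never outlives a single validation run.
        $this->distinctValues = [];

        return true; // stand-in for actually running the rules
    }
}
```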
I'm not sure about that change. I understand that it looks cleaner, but when using the trait directly it means you can't use caching unless you implement it yourself again. With the current solution you would only have to add a property to enable the caching, with no code duplication.
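A sketch of this property-based alternative (names again assumed): the trait checks whether the consuming class has declared the cache property, so bare trait users keep the uncached behaviour and opt in by adding a single property.

```php
trait ValidatesAttributes
{
    protected function getDistinctValues($attribute)
    {
        // No opt-in property declared: plain, uncached behaviour.
        if (! property_exists($this, 'distinctValues')) {
            return $this->extractDistinctValues($attribute);
        }

        // Property present: memoize per attribute for this validation run.
        if (! array_key_exists($attribute, $this->distinctValues)) {
            $this->distinctValues[$attribute] = $this->extractDistinctValues($attribute);
        }

        return $this->distinctValues[$attribute];
    }
}
```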
While validating 1,000 records with the distinct rule I noticed a performance impact from this rule. Each time the rule runs, the same 'distinct data set' is created again: we build an array of 999 nearly identical items 1,000 times.
Because the data is not changed during validation, I think we should be able to cache this data set for the first record and reuse it for all following records.
This will be a huge performance improvement when validating bigger datasets.
Optionally, if you're not feeling comfortable with this being the default behaviour, we could implement it as configurable with a cached parameter?
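To make the cost concrete, a rough illustration (the payload shape and rule are made up for the example):

```php
use Illuminate\Support\Facades\Validator;

$rows = [];
for ($i = 0; $i < 1000; $i++) {
    $rows[] = ['email' => "user{$i}@example.com"];
}

$validator = Validator::make(['users' => $rows], [
    'users.*.email' => 'distinct',
]);

// Without caching, each of the 1,000 distinct checks rebuilds the
// comparison array of the other 999 values; with a per-run cache the
// data set is extracted once and reused for every row.
$validator->passes();
```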