Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.7] Cache distinct validation rule data #26509

Merged

Conversation

arjanwestdorp
Copy link
Contributor

@arjanwestdorp arjanwestdorp commented Nov 13, 2018

While validating 1000 records with the distinct rule I encountered some impact on performance for this rule. When performing this validation the same 'distinct data set' is created every time. We're creating 1000 times an array of 999 items with almost identical data.

Because data is not being changed during validation I think we should be able to 'cache' this data for the first record and reuse it for all following records.

It will be a huge improvement on performance while validating bigger datasets:

#items Without caching With caching Improvement
10 5ms 5ms same
100 21ms 7ms 3x
1000 7200s 80ms 90x

test results on my local machine

Optionally if you're not feeling comfortable with this being the default behaviour we could maybe implement this as being configurable with a cached parameter?

@deleugpn
Copy link
Contributor

+1 on improving Laravel for large dataset process. I'll be starting 2019 with a project that will be exposing an API that can take 1000 rows at a time and insert into the database. I didn't work with distinct yet, but I already had to implement a BatchUnique and BatchExists for performance reasons even though the project is still just a prototype.

@sisve
Copy link
Contributor

sisve commented Nov 14, 2018

Because data is not being changed during validation [...]

This sounds like a huge assumption. How would this work with something like this?

foreach(var $inputs as $input) {
    if ($this->validate($input, array('unique:table,...')) {
        DB::table('table')->insert($input);
    }
}

@arjanwestdorp
Copy link
Contributor Author

This does not affect the unique rule, but only the distinct rule so your pseudo code will succeed since distinct doesn't check the database.

I share your concern though and that's why the 'cached data' is being reset every time you want to validate.

$v = new Validator($trans, ['foo' => ['foo', 'foo'], ['foo.*' => 'distinct']);
$v->passes(); // false

$v->setData(['foo' => ['foo', 'bar']]);
$v->passes(); // true, because calling passes again will reset the 'cached data'

@sisve
Copy link
Contributor

sisve commented Nov 14, 2018

And what if the user isn't calling $validator->passes()? What if they are using the trait directly?

If Validator is the one responsible for the lifetime of the cache (and clearing it), then it would make more sense to but the caching logic in the Validator class, not the ValidatesAttributes trait. There's nothing in the code that makes sure that users of the trait properly resets the cache when needed.

@arjanwestdorp
Copy link
Contributor Author

I get your point. What if we move the protected $distinctValues to the Validator and check if that property exists before using the cache? Using the trait directly will then bypass the caching unless you specify a $distinctValues property in your class.

What do you think about this?

@sisve
Copy link
Contributor

sisve commented Nov 15, 2018

I think it would look cleaner if you split out extractDistinctValues from validateDistinct in the trait, and then override that method in the Validator implementation. That way there's no conditionals in the trait about any possible cache, and the cache is entirely handled by code in the Validator class.

trait ValidatesAttributes {
    public function validateDistinct($attribute, $value, $parameters) {
        $data = Arr::except($this->getDistinctValues($attribute), $attribute);
        // ...
    }

    protected function getDistinctValues($attribute) {
        // ...
    }
}

class Validator {
    use ValidatesAttributes {
        getDistinctValues as protected getDistinctValuesCore;
    }

    protected $distinctValues = array();

    protected function getDistinctValues($attribute) {
        $attributeName = $this->getPrimaryAttribute($attribute);

        if (! array_key_exists($attributeName, $this->distinctValues)) {
            $this->distinctValues[$attributeName] = $this->getDistinctValuesCore($attributeName);
        }

        return $this->distinctValues[$attributeName];
    }
}

@arjanwestdorp
Copy link
Contributor Author

I'm not sure about that change, I understand that it looks cleaner. But, when using the trait directly it means you can't use caching unless you implement it yourself again. With the current solution you only would have to add a property which would enable the caching and doesn't need code duplication.

@taylorotwell taylorotwell merged commit af114e1 into laravel:5.7 Nov 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants