Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[9.x] Add a DatabaseEngine #564

Merged
merged 18 commits into from
Jan 8, 2022
Merged

[9.x] Add a DatabaseEngine #564

merged 18 commits into from
Jan 8, 2022

Conversation

driesvints
Copy link
Member

@driesvints driesvints commented Dec 23, 2021

This PR depends on laravel/framework#40129

This PR adds a new database engine to Scout that's powered by the upcoming whereFulltext addition to Laravel's query builder. Using this, we'll be able to offer natural language search in Scout on fulltext indexes for Scout.

I managed to quickly set this up thanks to the existing Collection engine. We'll probably need to abstract those so functionality is shared.

The way this works is that you define the columns of your index through the toSearchableArray method:

/**
 * Get the indexable data array for the model.
 *
 * @return array
 */
public function toSearchableArray()
{
    return [
        'name' => $this->name,
        'bio' => $this->bio,
    ];
}

Then use it as usual:

$users = User::search('Hello world!')->get();

where statements to limit results are also supported:

$invoices = Invoice::search('Forge')->where('status', 'open')->get();

Later on when whereFulltext gets supported by more database engines in Laravel, the upgrade will be seamless.

Todo

  • Figure out a better way to configure columns when models have multiple fulltext indexes.
  • Figure out a way to pass options to the whereFulltext method. Maybe through the scoutMetadata method?
  • Abstract similar functionality from the collection driver.
  • Add tests

@driesvints driesvints changed the title Add a DatabaseEngine [9.x] Add a DatabaseEngine Dec 23, 2021
@taylorotwell
Copy link
Member

We will want a way to also search on normal columns using LIKE or just normal value check like against IDs, etc.

Inspiration for this can be taken from Nova's database search functionality which builds the or where clauses for you, etc.

@tpetry
Copy link

tpetry commented Jan 3, 2022

We will want a way to also search on normal columns using LIKE or just normal value check like against IDs, etc.

How to decide which one to use? Fulltext search and checking against IDs can be index supported, wildcard search with a leading wildcard can not use an index (except on PostgreSQL with pg_trgm). But wildcard search with a trailing wildcard can use an index, but again only if it is indexed and only for an index which will support the operation (hash indexes won't work).

And even for fulltext search scout has to know which columns are fulltext indexed or the query will be stopped with an error on missing index (MySQL) or will be slow (PostgreSQL).

@driesvints
Copy link
Member Author

We will want a way to also search on normal columns using LIKE or just normal value check like against IDs, etc.

@taylorotwell normal value checks are already supported:

$invoices = Invoice::search('Forge')->where('id', 5)->get();

For LIKE they can already use the callback with the search method:

use Laravel\Scout\Builder;

$invoices = Invoice::search('Forge', function ($query, Builder $builder string $search) {
    $query->where('title', 'LIKE', $search);
})->get();

That'll invoke the fulltext search and an additional where like clause.

We could also add a new whereLike method to the Scout builder that only works on the DatabaseEngine:

$invoices = Invoice::search('Forge')->whereLike('title', 'Forge')->get();

But that might be a bit too much and confusion for the other engines?

@taylorotwell I actually want to know more why you want to support LIKE and additional column clauses. It seems to me we need to focus this all towards fulltext indexes? Like @tpetry says, I don't think it's ideal to mix things up with LIKE, etc.

@taylorotwell
Copy link
Member

taylorotwell commented Jan 4, 2022

@driesvints it's extremely common to search by primary ID in business applications (for example, order ID) in addition to fields that may be full text indexed like a shipping address or order notes field using a single search input field.

So, for example, if I type "1" in the search box I may want to find the user with an ID of 1, while I can also type "Arkansas" in the search box to find orders with "Arkansas" in the full-text indexed notes. Without that functionality I frankly would not find a database driver for Scout useful at all.

The type of search functionality I'm proposing is exactly the kind of search functionality offered by the search field in Laravel Nova, which I think I noted that code may provide some good inspiration here as it works really well across multiple database drivers.

I will look into this PR more tomorrow and will likely work on the implementation if I can.

@driesvints
Copy link
Member Author

So, for example, if I type "1" in the search box I may want to find the user with an ID of 1, while I can also type "Arkansas" in the search box to find orders with "Arkansas" in the full-text indexed notes. Without that functionality I frankly would not find a database driver for Scout useful at all.

How will you know which field(s) to search? Sounds more like the ID is part of a separate filter than the search box?

The type of search functionality I'm proposing is exactly the kind of search functionality offered by the search field in Laravel Nova, which I think I noted that code may provide some good inspiration here as it works really well across multiple database drivers.

I'll investigate that.

@tpetry
Copy link

tpetry commented Jan 6, 2022

So, for example, if I type "1" in the search box I may want to find the user with an ID of 1, while I can also type "Arkansas" in the search box to find orders with "Arkansas" in the full-text indexed notes. Without that functionality I frankly would not find a database driver for Scout useful at all.

I like that approach but had no idea until now how to do it.

A possible solution for the database driver could be a specification for every column. I like the approach of pmatseykanets/laravel-scout-postgres. Additionally to the normal scout configuration, another array is defined with options for the driver. The search method would be defined for every column (wildcard/fulltext) and be used by the driver to build the query.

Either way a configuration method on the model needs to be supported to pass the $options configuration to whereFulltext.

@driesvints
Copy link
Member Author

A possible solution for the database driver could be a specification for every column. I like the approach of pmatseykanets/laravel-scout-postgres. Additionally to the normal scout configuration, another array is defined with options for the driver. The search method would be defined for every column (wildcard/fulltext) and be used by the driver to build the query.

Does that mean that we add "or" clauses for every mapped column?

@tpetry
Copy link

tpetry commented Jan 6, 2022

Does that mean that we add "or" clauses for every mapped column?

That was my first idea. But it does not work as defining multiple columns for fulltext searching would try to do a fulltext search on each by it's own and then or the results, which will not work.

Maybe something like this (i don't like the structure, just sharing the idea):

public function searchableOptions()
{
    return [
        'columns' => [
            'like_search' => ['id', 'email'], // these can be "or"ed
            'fulltext_search' => [ // each group needs to use one whereFullText and then can be "or"ed
                ['body', 'title],
                ['subject', 'description],
            ],
        ],
    ];
}

@taylorotwell
Copy link
Member

taylorotwell commented Jan 6, 2022

Hey all,

I've pushed my work for today which is a pretty big rewrite of this. It now behaves more like Laravel Nova search with the added ability for real full-text searching with the new whereFullText methods as well as a combination of LIKE and also LIKE with no leading %.

I built a small search interface for myself to test it with:

CleanShot 2022-01-06 at 15 18 58@2x

Overview

So, given the following toSearchableArray method:

CleanShot 2022-01-06 at 15 18 21@2x

The following naive, LIKE based query would be executed by default, allowing very basic searching across all columns:

CleanShot 2022-01-06 at 15 19 59@2x

This is exactly like Nova. Note - this portion of the query is grouped within parenthesis, again, like Nova, for proper logical grouping if the developer adds additional where clauses to the query.

If you try to search by an integer and the primary key type is an integer for the model, it will attempt an exact search against that value instead of a LIKE - again, just like Nova.

Full Text Support

You may easily specify which columns actually have full text indexes using Attributes:

image

This now generates a query like so:

CleanShot 2022-01-06 at 15 21 30@2x

LIKE Without Wildcard Prefix

Some columns you may want to be searched using where like 'foo%' - with no leading % on the LIKE for better performance. This is also supported:

CleanShot 2022-01-06 at 15 22 18@2x

Other Improvements

I have made quite a few other improvements. Most notably, the original implementation of this driver executed 2 database queries... one to perform the search and then an extra, extraneous query on all the matching models by ID using a where in statement. That has been removed and now the models returned from the initial search query are used as the actual result set so only 1 query is executed to perform the search.

I've also updated the pagination logic to handle the LIMIT and OFFSET at the database level instead of potentially returning thousands of models to Laravel and doing it via the collections.

Caveats

The Scout shouldBeSearchable method is sort of inherently not compatible with a database engine that searches your tables directly. The purpose of that method was to prevent models from being added to search indexes at all. This makes sense for things where the index is separate from the underlying table like Algolia, but for a database engine, the data is always in the table.

Therefore, when using the database engine users would be encouraged / required to add the proper where constraints to the query when searching:

Post::search('foo')->where('published', true)->get();

Otherwise, we would be required to retrieve every matching model into memory (even when paginating) and manually invoke shouldBeSearchable on that model, which isn't practical for thousands of models.

@taylorotwell taylorotwell marked this pull request as ready for review January 6, 2022 21:40
@robsontenorio
Copy link

Does fulltext approach , instead MeiliSeach or ElasticSearch, implies in too much heavy load on database? Locks ? Performance?

Does anybody has any concerns about that ?

I really would love to get rid off that extra piece of infrastructure (Melisearch/Elastic)

@fhferreira
Copy link

fhferreira commented Jan 7, 2022

Does fulltext approach , instead MeiliSeach or ElasticSearch, implies in too much heavy load on database? Locks ? Performance?

Does anybody has any concerns about that ?

I really would love to get rid off that extra piece of infrastructure (Melisearch/Elastic)

+1

@driesvints
Copy link
Member Author

driesvints commented Jan 7, 2022

@taylorotwell this looks amazing, thanks! I do believe the SearchUsingFullText attribute needs an options array for extra options to pass to the whereFullText method for engines underneath to use.

Most likely we'll need to provide a way to map those options separate per column.

/cc @tpetry ☝️

}

$query->orWhere(
$builder->model->qualifyColumn($column),
Copy link
Member

@crynobone crynobone Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$builder->model->qualifyColumn($column),
$query->qualifyColumn($column),

foreach ($columns as $column) {
if (in_array($column, $fullTextColumns)) {
$query->orWhereFullText(
$builder->model->qualifyColumn($column),
Copy link
Member

@crynobone crynobone Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$builder->model->qualifyColumn($column),
$query->qualifyColumn($column),

Available since 5.5 laravel/framework#22577

@taylorotwell
Copy link
Member

taylorotwell commented Jan 7, 2022

Does fulltext approach , instead MeiliSeach or ElasticSearch, implies in too much heavy load on database? Locks ? Performance?

Does anybody has any concerns about that ?

I really would love to get rid off that extra piece of infrastructure (Melisearch/Elastic)

I have no idea on the performance. Depends on the size of your database I'm sure. This driver is intended for usage on smaller data sets and workloads where such performance isn't a concern.

@tpetry
Copy link

tpetry commented Jan 7, 2022

Does fulltext approach , instead MeiliSeach or ElasticSearch, implies in too much heavy load on database? Locks ? Performance?

Does anybody has any concerns about that ?

I really would love to get rid off that extra piece of infrastructure (Melisearch/Elastic)

It's highly dependent on your database, your data model and your queries. In the past MySQL was pretty bad when you did a fulltext search with a condition to filter rows because MySQL was not able to use multiple indexes for a single query. But that has changed with 8.0, whether it will use multiple indexes or not is still depending on a lot of things and some magic of MySQL's query planner. And it's still not choosing multiple indexes today very often, but it's getting better and better.

I only have recent experience with PostgreSQL's fulltext search which is working really nice with a few million entries. And as PostgreSQL will use multiple indexes since 8.1 (2005) the query planner is really good at it and you can blindly throw queries at it and it will execute the fulltext queries very fast. Combined with tight control of word stemming, relevances for columns and ranking for sorting it's a good simpler alternative to Melisearch/Elastic for low/medium workloads. But sure, a single database server can't beat a 6 node Elastic cluster.

As said, there is no general statement to performance, it depends on a lot of things. It's best you try your workload and if it's working you can be happy to remove Melisearch/Elastic from your stack.

--

I just wanted to add that everyone is concerned about "performance". Everyone is concerned about slow databases queries. Databases can do a lot more things efficiently than most people believe. Instead of researching performance problems of solution x just implement it and test it. Nobody is able to predict special behaviour accurately (except if you do stupid things). And everything is always completely dependent on your data model, data size and querying behaviour. Something working perfectly for me may work horrible for you because these 3 properties are different for you - but most often only the first and last one make a difference.

@taylorotwell
Copy link
Member

Instead of researching performance problems of solution x just implement it and test it.

✔️

@taylorotwell taylorotwell merged commit 0ff8f08 into 9.x Jan 8, 2022
@taylorotwell taylorotwell deleted the database-driver branch January 8, 2022 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants