Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve search performance #1339

Merged
merged 12 commits into from
Feb 7, 2018
Merged

Improve search performance #1339

merged 12 commits into from
Feb 7, 2018

Conversation

tobyzerner
Copy link
Contributor

@tobyzerner tobyzerner commented Jan 7, 2018

  • Remove fulltext drivers so that fulltext search can take place directly in the main search query (important for performance)
  • Improve the fulltext query to search discussion titles, sort by relevance
  • Simplify the DiscussionSearcher, extract unnecessary eager loading responsibility
  • Only return a single "most relevant post" in the API

Todo:

  • Add migration to add fulltext index to discussions.title (we can leave the table as InnoDB - we should probably convert posts to InnoDB at some point too)
  • Don't search posts that the user doesn't have permission to see! Same problem as filter out private posts before limiting the number of results #1334 - requires larger scale fix
  • Update appearance of search results on frontend

Overhauled search extensibility API will come later (0.2?) but for now some level of search extension can still be achieved by replacing the default fulltext gambit with a custom one.

@tobyzerner tobyzerner added this to the 0.1.0-beta.8 milestone Jan 7, 2018
@tobyzerner tobyzerner self-assigned this Jan 7, 2018
*/
public function mostRelevantPost()
{
return $this->belongsTo('Flarum\Post\Post', 'most_relevant_post_id');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will only be used when searching, right? Because most_relevant_post_id will just be the name of a dynamically calculated column? If so, that should be made clear in the docblock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point.

@franzliedke
Copy link
Contributor

Do you already want a more detailed review here? If not, just let me know when this is no longer WIP.

@tobyzerner
Copy link
Contributor Author

tobyzerner commented Jan 9, 2018

@franzliedke I will request one when I push some further changes soon, thanks :)

This change relies on the `visibility-scoping` branch to be merged.
Doing a JOIN between an InnoDB table (discussions) and a MyISAM table
(posts) is very very (very) bad for performance. FULLTEXT indexes are
fully supported in InnoDB now, and it is a superior engine in every
other way, so there is no longer any reason to be using MyISAM.
@tobyzerner tobyzerner mentioned this pull request Jan 31, 2018
@luceos
Copy link
Member

luceos commented Jan 31, 2018

I was thinking related to my pr #1344, again we're adding mysql only raw statements in a migration. We should try to investigate whether we can encapsulate these statements in a conditional whether we're running the correct driver. Otherwise we'll never be able to support more db drivers 😞

@tobyzerner
Copy link
Contributor Author

For the moment, the intention is for the gambit class itself is acting as the encapsulation, so it should probably be renamed to MySqlFulltextGambit.

Different search drivers (eg. ElasticSearch, Algolia) can be implemented by replacing this gambit via GambitManager::setFulltextGambit, the same would go for different database drivers I guess.

For 0.2 (or later) we have plans to overhaul the search API: https://github.com/flarum/core/issues/1355

luceos pushed a commit that referenced this pull request Jan 31, 2018
@tobyzerner
Copy link
Contributor Author

This is ready for review!

Here's the new look for search results:

screen shot 2018-01-31 at 9 53 34 pm

@luceos
Copy link
Member

luceos commented Jan 31, 2018

@tobscure what I was referencing was the migration and the actual ALTER TABLE statements. These are going to kill installation on any other driver than mysql/mariadb.

One way would be to check the getDriverName() from the Connection class.

@tobyzerner
Copy link
Contributor Author

@luceos Ah I see. True.

But let's defer until we actually implement support for other db drivers before changing the migrations. Even if we wrap in a conditional now, other db drivers still aren't going to work.

Copy link
Member

@luceos luceos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave it a spin and I'm very much impressed by the looks! Well done, picking part of the post really creates a rich search experience.

@luceos luceos mentioned this pull request Feb 7, 2018
11 tasks
Copy link
Contributor

@franzliedke franzliedke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. 👍

@franzliedke
Copy link
Contributor

Is this still WIP?

@tobyzerner tobyzerner changed the title [WIP] Improve search performance Improve search performance Feb 7, 2018
@tobyzerner
Copy link
Contributor Author

Nope, it's done :D

@tobyzerner tobyzerner merged commit 322a84f into master Feb 7, 2018
@tobyzerner tobyzerner deleted the search-improvements branch February 7, 2018 20:08
@franzliedke
Copy link
Contributor

@tobscure I am wondering... should we move the mostRelevantPost stuff into its own pseudo-model, or at least API resource. That way, we don't stuff the discussion model with things that are only relevant for search results. Could be something like searchResult or discussionMatchingSearch or something along those lines... duplicating the few discussion attributes that are needed to display the results?

@tobyzerner
Copy link
Contributor Author

Good thought. Could be discussionSearchResult, and could have a relationship with a discussion and a mostRelevantPost?

@franzliedke
Copy link
Contributor

Yeah, sounds good. Will you try it out?

@tobyzerner
Copy link
Contributor Author

Actually, after more thinking, I don't think it makes sense to do this from a JSON-API perspective. GET /api/discussions?filter[q]=whatever must return a list of discussions, not discussionSearchResults.

@franzliedke
Copy link
Contributor

True. We could do a separate endpoint just for searching, but at that point, it probably gets too crazy in another way. 😉

Thanks for considering!

@skywalker512
Copy link

@tobscure Mysql 5.5 can't use fulltext in InnoDB. It only support above 5.6. Will Flarum require mysql 5.6 at least.

@tobyzerner
Copy link
Contributor Author

Have made a note in the relevant issue flarum/docs#14

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants