sql: Full Text Search #7821

Open
1 of 2 tasks
Linicks opened this issue Jul 13, 2016 · 17 comments
Labels
  • A-sql-pgcompat: Semantic compatibility with PostgreSQL
  • C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
  • O-community: Originated from the community
  • X-anchored-telemetry: The issue number is anchored by telemetry references.

Comments

@Linicks
Contributor

Linicks commented Jul 13, 2016

All,
I'm sure most of you know about Bleve (https://github.com/blevesearch/bleve), a Go-based full-text indexer. I was wondering if you've considered integrating it with CockroachDB? It seems like it may be a good fit, and it is being used in other distributed databases.

  • It has an Apache licence.
  • Written in Go.
  • Already provides a lot of value-add functionality.

Thanks!
-- Nick

Maintainer note from @jordanlewis: see the following issues for our current progress on search

gz#6861

Jira issue: CRDB-6169

@petermattis
Collaborator

@Linicks Full-text search is something we'd like to support and Bleve is on my radar, though there are no concrete plans to integrate it.

@alexander-manley

One approach for integrating Bleve with Cockroach, and thus providing CockroachDB with text search, would be to modify hugoidx (https://github.com/blevesearch/hugoidx) so that it can Bleve-index the contents of a Cockroach BLOB store (...#243) pre-populated with corpus text (web page scrapes, text document dumps, etc.).

In addition to hugoidx, the associated Go utility "bleve-hosted" could be wrapped into the embedded UI (https://github.com/cockroachdb/cockroach/tree/master/ui) in order to pull out and/or highlight text search results from the BLOB store and display them as an additional panel under the left-side "DATABASES" UI tab.

Bleve is based on file indexes, which by default are stored in BoltDB, so that part would need to be ported over to RocksDB for full integration. For the curious, a Bleve benchmark graph with RocksDB was posted to the Bleve Twitter stream a while back.
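
To make the porting point concrete, here is a minimal Python sketch of why the storage backend can be swapped: an inverted index that touches storage only through `get` and `put` on byte keys. The `KVStore` protocol, the `MemoryStore` class, the `term/` key layout, and the tokenizer are all hypothetical illustrations, not Bleve's actual interfaces or on-disk format.

```python
import json
import re
from typing import Dict, List, Optional, Protocol


class KVStore(Protocol):
    """Hypothetical storage interface; a BoltDB- or RocksDB-backed class would implement it."""

    def get(self, key: bytes) -> Optional[bytes]: ...
    def put(self, key: bytes, value: bytes) -> None: ...


class MemoryStore:
    """In-memory stand-in for a real key-value engine."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters (a deliberately naive analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to a sorted list of document IDs, persisted through a KVStore."""

    def __init__(self, store: KVStore) -> None:
        self.store = store

    def _postings(self, term: str) -> List[str]:
        raw = self.store.get(b"term/" + term.encode())
        return json.loads(raw) if raw else []

    def index(self, doc_id: str, text: str) -> None:
        # Read-modify-write each term's posting list; no batching or transactions here.
        for term in set(tokenize(text)):
            postings = set(self._postings(term))
            postings.add(doc_id)
            self.store.put(b"term/" + term.encode(), json.dumps(sorted(postings)).encode())

    def search(self, query: str) -> List[str]:
        """Return IDs of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self._postings(terms[0]))
        for term in terms[1:]:
            result &= set(self._postings(term))
        return sorted(result)
```

Because `InvertedIndex` only depends on the two-method store interface, replacing `MemoryStore` with a class backed by a different engine would not require touching the indexing or search logic, which is the kind of port described above.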

Reference:
http://www.blevesearch.com/news/Site-Search/
http://www.blevesearch.com/videos/

@knz changed the title from "Bleve - Full Text Search Integration?" to "sql: Bleve - Full Text Search Integration?" Jul 17, 2016
@knz added the C-enhancement label Jul 17, 2016
@knz added this to the Later milestone Jul 17, 2016
@petermattis
Collaborator

@alexander-manley Thanks for the notes. We'll definitely take a closer look at Bleve when considering full-text indexing.

@randyyaj

Any updates on this?

@petermattis
Collaborator

@randyyaj Full-text indexing is something we'd like to do, but still a ways off and not currently scheduled.

@bdarnell changed the title from "sql: Bleve - Full Text Search Integration?" to "sql: Full Text Search" Oct 20, 2017
@knz added the A-sql-pgcompat and O-community labels and removed the O-community-questions label Apr 24, 2018
@knz added the C-wishlist and C-enhancement labels and removed the C-enhancement and C-wishlist labels May 2, 2018
@SantoshSah

@petermattis, any update?

@nstewart
Contributor

nstewart commented Sep 8, 2018

Full text search is something we want to support, but it is not on the roadmap for CockroachDB 2.1 or 2.2. While we are adding some new functionality, for the next couple of releases we are focusing on improving the performance and stability of our current offering before we add major new features.

@petermattis removed this from the Later milestone Oct 5, 2018
@knz added the X-anchored-telemetry label Nov 22, 2018
@RoachietheSupportRoach
Collaborator

Zendesk ticket #3521 has been linked to this issue.

@OldhamMade

Does this Zendesk ticket mean that full-text indexing is being actively worked on?

@jordanlewis
Member

jordanlewis commented Oct 1, 2019 via email

craig bot pushed a commit that referenced this issue Sep 18, 2020
54565: parser: parse REINDEX SCHEMA r=arulajmani a=otan

Refs: #51424

Release note: None

54568: sql: add unimplemented errors for jsonpath types and builtins r=arulajmani a=otan

Refs #22513 , #51424

Release note: None

54573: builtins: add unimplemented errors for full text search builtins r=arulajmani a=otan

Refs: #7821, #51424 

Release note: None

54575: parser: add unimplemented errors for CREATE/DROP ACCESS METHOD r=arulajmani a=otan

No issue number for this one (same as aggregate) as there's no way I think we can
realistically support this. Telemetry still there though.

Refs: #51424

Release note: None

Co-authored-by: Oliver Tan <[email protected]>
@aranwe

aranwe commented Oct 25, 2021

No, full text search isn't on the near term roadmap for the time being.

2 years later, any plans? :)

@alexander-manley

alexander-manley commented Oct 25, 2021 via email

@Bessonov

@alexander-manley

In the meantime... https://opensearch.org/

You mean these AWS and other cloud-provider guys who stole technology to make huge money with it? Yeah, great effort on piracy. Full disclosure: I'm a free, on-premise Elastic user. I'm not affiliated with Elastic in any way, and I'm sorry to see how people steal just because it's software and not hardware.

Back to the issue.

Although it would be nice to have full text search (FTS), I don't think that it's the right way. I have never seen a good built-in search, because search is very complex and very specialized, and there are great products like Elasticsearch, Solr, Sphinx and so on, which have been in development for more than 15 years. It is a huge effort.

Instead of developing a very limited FTS, I would propose developing an interface to popular products, something like ZomboDB (which I haven't used yet). That way you can interact with the search through SQL, and your data is (automagically) synced with the index.

The first post suggests an integration with Bleve. At first glance it would be OK, but I'm not sure how big the gap to other products is. One showstopper is synonyms.

@jezell

jezell commented Apr 21, 2022

I think the best way to get some sort of support for full text search "out of the box" would be for CREATE CHANGEFEED to support destinations like Elasticsearch, Vespa, Algolia, etc. Modern full text search is a completely different domain than relational data. While I'm sure the team could eventually crack it, it would likely be a long road to get it up to par with something like Vespa. I'd personally rather see out-of-the-box integration, as we wouldn't want to give up search result quality to switch to something built in.
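
As a rough illustration of that integration shape, the Python sketch below applies a stream of row-change events to an external search index. The event layout (a `key` list plus an `after` object that is null for deletes) is an assumption modeled loosely on JSON changefeed messages, and `SearchIndex` is a hypothetical in-memory stand-in for a client of a real engine such as Elasticsearch or Vespa.

```python
import json
from typing import Dict, Iterable, List


class SearchIndex:
    """Hypothetical stand-in for an external search engine client."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}

    def upsert(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)

    def search(self, word: str) -> List[str]:
        """Naive case-insensitive substring match over all field values."""
        needle = word.lower()
        return sorted(
            doc_id
            for doc_id, doc in self.docs.items()
            if any(needle in str(value).lower() for value in doc.values())
        )


def apply_changes(lines: Iterable[str], index: SearchIndex) -> int:
    """Apply newline-delimited JSON change events; return how many were applied."""
    applied = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines between events.
        event = json.loads(line)
        # Assumed layout: "key" holds the primary key columns of the changed row.
        doc_id = "/".join(str(part) for part in event["key"])
        if event.get("after") is None:
            index.delete(doc_id)  # A null "after" is treated as a row deletion.
        else:
            index.upsert(doc_id, event["after"])
        applied += 1
    return applied
```

Upserts and deletes are idempotent in this sketch, which matters if a feed redelivers an event. Relevance ranking, analyzers, and synonyms would all remain the external engine's job.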

@jordanlewis
Member

CockroachDB 22.2 will support trigram indexes, a simple form of text search that may help some of your use cases. See #79705 for details on what has been added.
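
For readers wondering how trigrams relate to text search, here is a minimal Python sketch of the general technique: break each string into three-character substrings, index them, and use the index to narrow the candidates for a substring query before verifying each one. The `TrigramIndex` class and the unpadded trigram extraction are illustrative assumptions, not CockroachDB's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    """Return the set of 3-character substrings of the lowercased text."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets (0.0 when both are empty)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class TrigramIndex:
    """Inverted index from trigram to the IDs of documents containing it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self.docs: Dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def substring_search(self, needle: str) -> List[int]:
        grams = trigrams(needle)
        if not grams:
            # Needle shorter than 3 characters: the index cannot help, so scan everything.
            candidates = set(self.docs)
        else:
            # A document containing the needle must contain every one of its trigrams.
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # Verify, because sharing all trigrams does not guarantee a contiguous match.
        n = needle.lower()
        return sorted(d for d in candidates if n in self.docs[d].lower())
```

The same trigram sets also support fuzzy matching via `similarity`, which tolerates small typos. Note what is missing compared with full-text search: no stemming, stop words, or relevance ranking.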

@amirouche

Since I have only used trigrams for spell checking against small dictionaries, I am not sure how they help to implement full-text search.

@jordanlewis
Member

Please feel free to follow and upvote #41288, which is an issue that tracks Postgres-compatible tsvector and tsquery implementations.
