sql: Full Text Search #7821
Comments
@Linicks Full-text search is something we'd like to support and Bleve is on my radar, though there are no concrete plans to integrate it. |
One approach to integrating Bleve with Cockroach, and thus providing CockroachDB with text search, would be to modify hugoidx (https://github.com/blevesearch/hugoidx) so that it can Bleve-index the contents of a Cockroach BLOB store (...#243) pre-populated with corpus text (web page scrapes, text document dumps, etc.). In addition to hugoidx, the associated Go utility "bleve-hosted" could be wrapped into the embedded UI (https://github.com/cockroachdb/cockroach/tree/master/ui) to pull out and/or highlight text search results from the BLOB store and display them as an additional panel under the left-side "DATABASES" UI tab. Bleve is based on file indexes, which by default are stored in BoltDB, so that part would need to be ported over to RocksDB for full integration. For the curious, a Bleve benchmark graph with RocksDB was posted to the Bleve Twitter stream a while back. Reference: |
@alexander-manley Thanks for the notes. We'll definitely take a closer look at Bleve when considering full-text indexing. |
Any updates on this? |
@randyyaj Full-text indexing is something we'd like to do, but still a ways off and not currently scheduled. |
@petermattis, any update? |
Full-text search is something we want to support, but it is not on the roadmap for CockroachDB 2.1 or 2.2. While we are adding some new functionality, for the next couple of releases we are focusing on improving the performance and stability of our current offering before we add major new features. |
Zendesk ticket #3521 has been linked to this issue. |
Does this Zendesk ticket mean that full-text indexing is being actively worked on? |
No, full text search isn't on the near term roadmap for the time being.
|
54565: parser: parse REINDEX SCHEMA r=arulajmani a=otan
Refs: #51424
Release note: None

54568: sql: add unimplemented errors for jsonpath types and builtins r=arulajmani a=otan
Refs: #22513, #51424
Release note: None

54573: builtins: add unimplemented errors for full text search builtins r=arulajmani a=otan
Refs: #7821, #51424
Release note: None

54575: parser: add unimplemented errors for CREATE/DROP ACCESS METHOD r=arulajmani a=otan
No issue number for this one (same as aggregate) as there's no way I think we can realistically support this. Telemetry still there though.
Refs: #51424
Release note: None

Co-authored-by: Oliver Tan
2 years later, any plans? :) |
In the meantime... https://opensearch.org/
|
You mean those AWS and other cloud provider guys who stole technology to make huge money with it? Yeah, great effort on piracy. Full disclosure: I'm a free, on-premise Elastic user. I'm not affiliated in any way with Elastic, and I'm sorry to see how people steal just because it's software and not hardware. Back to the issue. Although it would be nice to have full-text search (FTS), I don't think that it's the right way. I have never seen a good built-in search, because it's very complex and very specialized, and there are great products like Elasticsearch, Solr, Sphinx Search and so on, which have been in development for more than 15 years. It is a huge effort. Instead of developing a very limited FTS, I would propose developing an interface to popular products, something like ZomboDB (which I have not used yet), so that you can interact with the search through SQL and your data is (automagically) synced with the index. The first post suggests an integration with Bleve. At first glance it would be OK, but I'm not sure how big the gap to other products is. One show stopper is synonyms. |
I think the best way to get some sort of support for full-text search "out of the box" would be for CREATE CHANGEFEED to support destinations like Elasticsearch, Vespa, Algolia, etc. Modern full-text search is a completely different domain from relational data. While I'm sure the team could eventually crack it, it would likely be a long road to get it up to par with something like Vespa. I'd personally rather see out-of-the-box integration, as we wouldn't want to give up search result quality to switch to something built in. |
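For readers wondering what the changefeed approach above might look like on the consuming side, here is a minimal sketch in Python. It is not an existing CockroachDB sink. It assumes each changefeed message arrives as a JSON key (an array of primary key values) plus a JSON value with an "after" field that is null for deletes. The `SearchIndex` class, the `apply_change` function, and the `body` column name are all hypothetical stand-ins; a real deployment would call the client of Elasticsearch, Vespa, or similar instead of the in-memory index.

```python
import json
from collections import defaultdict


class SearchIndex:
    """Hypothetical in-memory stand-in for an external search engine."""

    def __init__(self):
        self.docs = {}                    # doc_id -> indexed text
        self.postings = defaultdict(set)  # term -> set of doc_ids

    def upsert(self, doc_id, text):
        # Drop any previous version first so stale terms do not linger.
        self.delete(doc_id)
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is None:
            return
        for term in old.lower().split():
            self.postings[term].discard(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


def apply_change(index, key_json, value_json, text_column="body"):
    """Apply one changefeed-style message to the index.

    Assumed message shape: key is a JSON array of primary key values,
    value is {"after": {...row...}} for upserts and {"after": null}
    for deletes. Messages without "after" (for example resolved
    timestamps) are ignored.
    """
    value = json.loads(value_json)
    if "after" not in value:
        return
    doc_id = tuple(json.loads(key_json))
    after = value["after"]
    if after is None:
        index.delete(doc_id)
    else:
        index.upsert(doc_id, after.get(text_column) or "")
```

Feeding each message through `apply_change` keeps the index in step with the table: an update replaces the old terms and a delete removes the document. Ordering, retries, and duplicate delivery are left out of this sketch and would need handling in practice.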
CockroachDB 22.2 will support trigram indexes, a simple form of text search that may help some of your use cases. See #79705 for details on what has been added. |
Since I have only used trigrams for spell checking with small dictionaries, I am not sure how they help to implement full-text search. |
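As a rough illustration of how trigrams can serve lookup as well as spell checking, the Python sketch below builds a small inverted index from trigram to document and ranks candidates by trigram overlap. It is a toy model, not CockroachDB's implementation: the padding (two leading spaces, one trailing) follows the convention used by PostgreSQL's pg_trgm, words are split on whitespace only, and the `TrigramIndex` class and the 0.3 threshold are assumptions made for the example.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of 3-character windows over each padded, lowercased word."""
    grams = set()
    for word in text.lower().split():
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by total distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class TrigramIndex:
    """Hypothetical inverted index: trigram -> ids of documents containing it."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, query, threshold=0.3):
        # Only documents sharing at least one trigram with the query are scored,
        # which is what lets an index avoid scanning every row.
        candidates = set()
        for gram in trigrams(query):
            candidates |= self.postings.get(gram, set())
        scored = [(similarity(query, self.docs[d]), d) for d in candidates]
        return [d for score, d in sorted(scored, reverse=True) if score >= threshold]
```

With documents "cockroach database" and "full text search" indexed, a misspelled query such as "cokroach databse" still shares most of its trigrams with the first document, so fuzzy and substring-style matching work without a linguistic analyzer. What this does not provide is stemming, stop words, synonyms, or relevance ranking, which is the gap between trigram matching and full-text search.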
Please feel free to follow and upvote #41288, which is an issue that tracks Postgres-compatible tsvector and tsquery implementations. |
All,
I'm sure most of you know about Bleve (https://github.com/blevesearch/bleve), a Go-based full-text indexer. I was wondering if you've considered integrating it with CockroachDB? It seems like it may be a good fit, and it is being used in other distributed databases.
Thanks!
-- Nick
Maintainer note from @jordanlewis: see the following issues for our current progress on search
gz#6861
Jira issue: CRDB-6169