Added support for exists query, as defined in Elasticsearch
Field exists does not consider types, only field names. Unfortunately, field capability will have to be handled differently.

This works by introducing an internal (but normal) "u64" field that stores a postings list for field existence. For performance/RAM reasons, the field's full path is not stored as a string; instead we compute a u64 FNV hash of the path from root to leaf. If the hash behaves ideally, collisions are very unlikely, even taking the birthday attack into account.

When dealing with complex JSON with the raw tokenizer, this feature can double the number of tokens we deal with, and has an impact on performance. For this reason, it is not added as an option in the DocMapper.

Like Elasticsearch, we only store field existence of indexed fields. Also, in order to handle refinements like expand_dots, we work over the built tantivy Document and reuse the existing resolution logic.

On 1.4GB of gharchive (which is close to a worst-case scenario), we see the following performance/index size change:

With field_exists enabled
- Indexing Throughput: 41 MB/s
- Index size: 701M

With field_exists disabled
- Indexing Throughput: 46 MB/s
- Index size: 698M
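The collision claim above can be sanity-checked with the usual birthday approximation. The sketch below is not part of the commit: the function name `collision_probability` is hypothetical, and it assumes the 64-bit hash behaves like a uniformly random function, which is exactly the "ideal" condition the message mentions.

```rust
/// Approximate probability that at least two of `num_paths` distinct field
/// paths share the same 64-bit hash, ASSUMING an ideal (uniform) hash.
/// Uses the birthday approximation p ~ 1 - exp(-n(n-1) / (2 * 2^64)).
fn collision_probability(num_paths: u64) -> f64 {
    let n = num_paths as f64;
    let hash_space = 2f64.powi(64);
    let exponent = n * (n - 1.0) / (2.0 * hash_space);
    // 1 - exp(-x), written with exp_m1 so that tiny values of x
    // do not get rounded away.
    -(-exponent).exp_m1()
}

fn main() {
    // Number of distinct field paths seen across an index (illustrative values).
    for num_paths in [1_000u64, 1_000_000, 1_000_000_000] {
        println!(
            "{:>13} distinct paths -> collision probability ~ {:.3e}",
            num_paths,
            collision_probability(num_paths)
        );
    }
}
```

Under this assumption, even a million distinct field paths give a collision probability on the order of a few in a hundred million. Whether FNV is close enough to ideal on real field names is a separate question that this estimate does not answer.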
1 parent 8c2caf5, commit 8a59e05
Showing 29 changed files with 519 additions and 36 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
// Copyright (C) 2023 Quickwit, Inc. | ||
// | ||
// Quickwit is offered under the AGPL v3.0 and as commercial software. | ||
// For commercial licensing, contact us at [email protected]. | ||
// | ||
// AGPL: | ||
// This program is free software: you can redistribute it and/or modify | ||
// it under the terms of the GNU Affero General Public License as | ||
// published by the Free Software Foundation, either version 3 of the | ||
// License, or (at your option) any later version. | ||
// | ||
// This program is distributed in the hope that it will be useful, | ||
// but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
// GNU Affero General Public License for more details. | ||
// | ||
// You should have received a copy of the GNU Affero General Public License | ||
// along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
|
||
use std::hash::Hasher; | ||
|
||
/// Mini wrapper over the FnvHasher to incrementally hash nodes | ||
/// in a tree. | ||
/// | ||
/// The wrapper does not do much. Its main purpose is to | ||
/// work around the lack of Clone in the fnv Hasher | ||
/// and enforce a 0xFF byte separator between segments. | ||
#[derive(Default)] | ||
pub struct PathHasher { | ||
hasher: fnv::FnvHasher, | ||
} | ||
|
||
impl Clone for PathHasher { | ||
#[inline(always)] | ||
fn clone(&self) -> PathHasher { | ||
PathHasher { | ||
hasher: fnv::FnvHasher::with_key(self.hasher.finish()), | ||
} | ||
} | ||
} | ||
|
||
impl PathHasher { | ||
/// Helper function, mostly for tests. | ||
pub fn hash_path(segments: &[&[u8]]) -> u64 { | ||
let mut hasher = Self::default(); | ||
for segment in segments { | ||
hasher.append(segment); | ||
} | ||
hasher.finish() | ||
} | ||
|
||
/// Appends a new segment to our path. | ||
/// | ||
/// In order to avoid natural collisions (e.g. &["ab", "c"] and &["a", "bc"]), | ||
/// we write a 0xFF byte after each segment as a separator. | ||
#[inline] | ||
pub fn append(&mut self, payload: &[u8]) { | ||
self.hasher.write(payload); | ||
// We use 255 (0xFF) as a separator: every byte of valid UTF-8 | ||
// has at least one 0 bit, so 0xFF never occurs inside a segment. | ||
self.hasher.write(&[255u8]); | ||
} | ||
|
||
#[inline] | ||
pub fn finish(&self) -> u64 { | ||
self.hasher.finish() | ||
} | ||
} |
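To illustrate why the separator and the `Clone` impl matter, here is a minimal std-only sketch. It is an assumption-laden stand-in, not the committed code: `SketchPathHasher` and `hash_path_without_separator` are hypothetical names, and FNV-1a is re-implemented inline (with the standard 64-bit constants) so the example does not depend on the `fnv` crate.

```rust
// Standard FNV-1a 64-bit constants.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;
// Separator written after every segment, as in the diff above.
const SEGMENT_SEPARATOR: u8 = 0xFF;

/// Hypothetical stand-in for `PathHasher`. The whole hasher state is a
/// single u64, which is why rebuilding it from `finish()` works as a clone.
#[derive(Clone)]
struct SketchPathHasher {
    state: u64,
}

impl SketchPathHasher {
    fn new() -> Self {
        SketchPathHasher { state: FNV_OFFSET_BASIS }
    }

    /// FNV-1a step: xor the byte in, then multiply by the prime.
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.state ^= u64::from(byte);
            self.state = self.state.wrapping_mul(FNV_PRIME);
        }
    }

    /// Appends one path segment followed by the separator byte.
    fn append(&mut self, segment: &[u8]) {
        self.write(segment);
        self.write(&[SEGMENT_SEPARATOR]);
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Hashes a full path, one separator after each segment.
fn hash_path(segments: &[&str]) -> u64 {
    let mut hasher = SketchPathHasher::new();
    for segment in segments {
        hasher.append(segment.as_bytes());
    }
    hasher.finish()
}

/// Same as `hash_path` but WITHOUT separators, to show the problem.
fn hash_path_without_separator(segments: &[&str]) -> u64 {
    let mut hasher = SketchPathHasher::new();
    for segment in segments {
        hasher.write(segment.as_bytes());
    }
    hasher.finish()
}

fn main() {
    // Without a separator both paths feed the bytes "abc" and collide.
    assert_eq!(
        hash_path_without_separator(&["ab", "c"]),
        hash_path_without_separator(&["a", "bc"])
    );
    // With the separator the byte streams differ, and so do the hashes.
    assert_ne!(hash_path(&["ab", "c"]), hash_path(&["a", "bc"]));

    // Cloning at a tree node lets each child extend the shared prefix.
    let mut parent = SketchPathHasher::new();
    parent.append(b"user");
    let mut name = parent.clone();
    name.append(b"name");
    let mut email = parent.clone();
    email.append(b"email");
    assert_eq!(name.finish(), hash_path(&["user", "name"]));
    assert_eq!(email.finish(), hash_path(&["user", "email"]));
    println!("user.name  -> {:016x}", name.finish());
    println!("user.email -> {:016x}", email.finish());
}
```

The clone-per-node pattern means a nested document can be walked once, with every leaf's hash derived from its parent's state instead of re-hashing the whole path from the root.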
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
// Copyright (C) 2023 Quickwit, Inc. | ||
// | ||
// Quickwit is offered under the AGPL v3.0 and as commercial software. | ||
// For commercial licensing, contact us at [email protected]. | ||
// | ||
// AGPL: | ||
// This program is free software: you can redistribute it and/or modify | ||
// it under the terms of the GNU Affero General Public License as | ||
// published by the Free Software Foundation, either version 3 of the | ||
// License, or (at your option) any later version. | ||
// | ||
// This program is distributed in the hope that it will be useful, | ||
// but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
// GNU Affero General Public License for more details. | ||
// | ||
// You should have received a copy of the GNU Affero General Public License | ||
// along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
|
||
/// Field name reserved for storing the field presence hashes. | ||
pub const FIELD_PRESENCE_FIELD_NAME: &str = "_field_presence"; |