Vespa schema changes for query control & general quality of life #163
This looks good to me! Left some comments below. I'm looking forward to seeing how we get on with nativerank!
@@ -5,6 +5,15 @@ schema document_passage {
stemming: none
}

field language type string {
    indexing: "en" | set_language
Looking at the docs for this:
The recommended use is to have one field in the document containing the language code, and that field should be the first field in the document, as it will only affect the fields defined after it in the schema.
https://docs.vespa.ai/en/reference/indexing-language-reference.html#set_language
It feels odd to set a document-level config at the field level, but it is what it is! I'm wondering if we need to move the language field above text_block_not_stemmed?
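For illustration, a minimal sketch of the ordering the docs describe. Only the language field is taken from this PR; the body of text_block_not_stemmed shown here is an assumption, not the actual schema:

```
# Sketch only: field bodies other than `language` are assumptions.
field language type string {
    # Declared first so set_language applies to the fields defined below it
    indexing: "en" | set_language
}

field text_block_not_stemmed type string {
    # Hypothetical indexing statement, for illustration
    indexing: summary | index
    stemming: none
}
```

With this ordering, every field defined after language would be processed with the language already set, which is what the quoted docs recommend.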
summary text_block_page {}
summary text_block_coords {}
summary concepts {}
document-summary search_summary_with_tokens inherits search_summary {
Ohh this is nice!
field language type string {
    indexing: "en" | set_language
}
As with the document_passage schema, I'm wondering if this needs to be at the top of the document.
@@ -173,16 +173,6 @@ schema document_passage {
tokens
}
}

rank-profile exact inherits default {
Nice to clean this up now that we use exact_not_stemmed instead 🎉
}
function name_score() {
    expression: attribute(name_weight) * bm25(family_name_index)
query(description_closeness_weight) double: 0.0
Still getting my head around this: does setting this to 0.0 mean it has no effect?
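Whether a 0.0 default switches the term off depends on how the rank profile uses the input. A minimal sketch, assuming the weight is a plain multiplier on a closeness term; the profile name, the description_closeness_score function and the family_description_embedding field are hypothetical and not taken from this PR:

```
# Sketch only: names marked hypothetical are not from the PR.
rank-profile weighted_sketch inherits default {
    inputs {
        # Default used when the query does not pass a value
        query(description_closeness_weight) double: 0.0
    }
    function name_score() {
        expression: attribute(name_weight) * bm25(family_name_index)
    }
    # Hypothetical function: closeness scaled by the query-supplied weight
    function description_closeness_score() {
        expression: query(description_closeness_weight) * closeness(field, family_description_embedding)
    }
    first-phase {
        expression: name_score() + description_closeness_score()
    }
}
```

Under that assumption, the default of 0.0 multiplies the closeness term down to zero, so it contributes nothing to the score unless a query overrides the input, for example with the request parameter input.query(description_closeness_weight)=0.5.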
bolding: true
}

field family_description_bolding type string {
indexing: input family_description_index | index
indexing: input family_description_index | summary | index
Any reason to hang onto the index attribute for these when they come from an index field?
Description
A batch of changes to the schema making use of inheritance, input parameters and summaries. The aim is to give us more control over search without requiring schema changes in future. The easiest way to go through this PR is commit by commit.
Also: fields with _bolding suffixes, which give the bolded version of each when a search is done.
Sidenote: I tried to test this on the backend using the test PyPI package published from this PR's CI, but the Vespa dependency seemed to be broken on that. Not sure whether this is just me or it's actually broken 🤷
Proposed version
Please select the most relevant option from the list below. This will be used to generate the next tag version name during auto-tagging.
Visit the Semver website to understand the difference between MAJOR, MINOR, and PATCH versions.
Notes:
used -- e.g. Major > Minor > Patch
sure your selected option is marked [x] with no spaces in between the brackets and the x
Type of change
Please select the option(s) below that are most relevant:
How Has This Been Tested?
Please describe the tests that you added to verify your changes.
Before submitting
section of the CONTRIBUTING docs.
Writing docstrings section of the CONTRIBUTING docs.