Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17022: Support for glob patterns for fields in Export handler, Stream handler and with SelectStream streaming expression #1996
SOLR-17022: Support for glob patterns for fields in Export handler, Stream handler and with SelectStream streaming expression #1996
Changes from 4 commits
b210b1e
8a04a11
fe66c1f
23626fb
ac25ebb
bfe31fa
fad2f18
0d862cf
f1c62f9
0382445
19e97ae
7f7b49d
91133ae
c1ebac1
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be many fieldInfo, and you're looping over a "matches" call that is going to internally build a regex each time. Maybe you should first do the hasDocValues etc. checks so we do this matches check last?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line might add to fieldsProcessed, yet exclude the field because doesn't "hasDocValues". This looks suspicious to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wanted to highlight this because @magibney and I were discussing this previously. The way this is implemented for the Export handler here actually deviates from how glob patterns work for the Select handler.
The Select handler will not match any fields where
useDocValuesAsStored=false
for glob patterns, those fields must be explicitly provided in the field list.For the purposes of Export I did not follow that pattern and instead will match any fields that have docvalues enabled with the exception of SortableTextField which require
useDocValuesAsStored=true
to be used in Export in other places in the code.My reasoning here is that there is a performance hit for getting non-stored, docvalues enabled fields in the Select handler that doesn't seem to be to be as impactful in the Export handler.
Open to other opinions on this topic, should we:
useDocValuesAsStored=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote for consistency if possible, since I think of both /select and /export as the same, just one is faster than the other ;-). If that isn't possible, then maybe if the ref guide makes it really clear "This is why you use /select and ramifications" and "THis is why you use /export and it's ramifications", then that might cover these differences. I can see the bug reports if we aren't clear ;-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @justinrsweeney foreshadowed, I am in favor of parity in how this is handled between select and export.
The purpose of udvas as I understand it is not simply to mitigate performance issues in select (and in fact, performance should be comparable between select and export as of #1938), but also as a config option to allow the configuration of docValues for a field without requiring that those values be returned when a field matches a glob pattern.
There are plenty of use cases where, e.g., a derivative string field is configured solely for the purpose of sort, faceting, [or other purpose that would make even less sense to return on export]. It should be possible to configure a field that's intended for one of the many other purposes docValues serve, without forcing it to be returned when the field name matches a glob pattern. Note also, if both
stored=true
anduseDocValuesAsStored=true
, stored fields will be used for select and docValues will be used for export.I think what I'm missing is: how does this help usability? Are there cases where one would want docValues to be used for field value retrieval, but could not simply set that field to
useDocValuesAsStored=true
? And udvas defaults to true anyway (for all fields except SortableTextField) iirc.I'm curious to hear others' thoughts on this as well!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this PR to mimic the same behavior as the select handler, where
useDocValuesAsStored=false
fields will not be returned unless specifically requested. I'm in agreement that consistency here is better.