-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix mango json index selection #816
Fix mango json index selection #816
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least I think the naming is misleading on this. The fields in the index must exist in the selector, not the other way around which is how it reads.
Also I'm hesitant to modify indexable like this for this check rather than writing the similar yet subtly different check. This definitely isn't an obvious rule but it seems odd to overload the meaning of "indexable" with "covers index field". For instance, the $exists:false call out seems out of place.
Also the maybe return value for unknown operators I'm also not 100% on. It seems like having a new covered index thing which explicitly lists all operators that can operate on the index would be a better approach.
@davisp thanks for the review! I'll have a think about the naming and overloaded function issues. To your last point, my thought there is that all predicates except, by definition, |
@davisp I went with your second suggestion and implemented a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really good work on this. Requested changes here are purely stylistic, once those are addressed then +1 to merging this.
src/mango/src/mango_idx_view.erl
Outdated
% restricted by the selector such that they are required to exist | ||
% and the selector is not a text search (so requires a text index) | ||
RequiredFields = columns(Idx), | ||
mango_selector:has_required_fields(Selector, RequiredFields) andalso not is_text_search(Selector). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line too long. Break before andalso and indent it twice.
src/mango/src/mango_selector.erl
Outdated
@@ -566,3 +567,111 @@ match({[{Field, Cond}]}, Value, Cmp) -> | |||
|
|||
match({[_, _ | _] = _Props} = Sel, _Value, _Cmp) -> | |||
erlang:error({unnormalized_selector, Sel}). | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One too many blank lines.
src/mango/src/mango_selector.erl
Outdated
% until we match then all or run out of selector to | ||
% match against. | ||
|
||
% empty selector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize empty
src/mango/src/mango_selector.erl
Outdated
has_required_fields(Selector, RequiredFields) when not is_list(Selector) -> | ||
has_required_fields([Selector], RequiredFields); | ||
|
||
% "see" through $and operator. We ignore other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first sentence isn't a sentence. "We can" as a prefix would be good enough.
src/mango/src/mango_selector.erl
Outdated
% "see" through $and operator. We ignore other | ||
% combination operators because they can't be used to restrict | ||
% an index. | ||
has_required_fields([{[{<<"$and">>, Args}]}], RequiredFields) when is_list(Args) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line too long. Break before when and indent it twice.
src/mango/src/mango_selector.erl
Outdated
end; | ||
|
||
% no more required fields | ||
has_required_fields(_, []) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its general practice to place your recursion terminating conditions as the first function clauses.
src/mango/src/mango_selector.erl
Outdated
has_required_fields(_, []) -> | ||
true; | ||
|
||
% no more selector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize No and move to before the recursive call.
%%%%%%%% module tests below %%%%%%%% | ||
|
||
-ifdef(TEST). | ||
-include_lib("eunit/include/eunit.hrl"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good test cases!
@@ -14,7 +14,6 @@ | |||
import user_docs | |||
import unittest | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't include white space only changes in PRs that aren't stylistic changes.
resp_explain = self.db.find(selector, explain=True) | ||
self.assertEqual(resp_explain["index"]["type"], "special") | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many blank lines between class methods here and in a few places below.
JSON index selection in Mango previously deemed an index to be usable if a range scan on the first component of its compound key could be generated from the query selector. For instance, if I had an index: [A, B] is_usable would return true for the selector: {"A": "foo"} This is incorrect because JSON indexes only index documents that contain all the fields in the index; null values are ok, but the field must be present. That means that for the above selector, the index would implicitly include only documents where B exists, missing documents where {"A":5} matched but field B was not present. This commit changes is_usable so that it only returns true if all the keys in the index are required to exist by the selector. This means that in the worst case e.g. none of the predicates can be used to generate a range query, we should end up performing a full index scan, but this is still more efficient than a full database scan. We leave the generation of the optimal range for a given index as a separate exercise - currently this happens after index selection. Potentially we'd want to score indexes during index selection based on their ability to restrict the result set, etc.
It seems safe to assume that if a user specifies that results should be sorted by a field, that field needs to exist (but could be null) in the results returned. In #816 we can only use an index if all its columns are required to exist in the selector, so this improves compatibility with the old behaviour by allowing sort fields to be included in the index coverage check for JSON indexes.
* Split out text index selection tests * Skip operator tests that do not apply to text indexes * Only run array length tests against text indexes * Fix index crud tests when text indexes available * Use environment variable to switch on text tests * Fix incorrect text sort assertion in test * Always use -test in fixture filename * Fix index selection test compatibility with #816. * Improve test README
JSON index selection in Mango previously deemed an index to be usable if a range scan on the first component of its compound key could be generated from the query selector. For instance, if I had an index: [A, B] is_usable would return true for the selector: {"A": "foo"} This is incorrect because JSON indexes only index documents that contain all the fields in the index; null values are ok, but the field must be present. That means that for the above selector, the index would implicitly include only documents where B exists, missing documents where {"A":5} matched but field B was not present. This commit changes is_usable so that it only returns true if all the keys in the index are required to exist by the selector. This means that in the worst case e.g. none of the predicates can be used to generate a range query, we should end up performing a full index scan, but this is still more efficient than a full database scan. We leave the generation of the optimal range for a given index as a separate exercise - currently this happens after index selection. Potentially we'd want to score indexes during index selection based on their ability to restrict the result set, etc.
It seems safe to assume that if a user specifies that results should be sorted by a field, that field needs to exist (but could be null) in the results returned. In #816 we can only use an index if all its columns are required to exist in the selector, so this improves compatibility with the old behaviour by allowing sort fields to be included in the index coverage check for JSON indexes.
* Split out text index selection tests * Skip operator tests that do not apply to text indexes * Only run array length tests against text indexes * Fix index crud tests when text indexes available * Use environment variable to switch on text tests * Fix incorrect text sort assertion in test * Always use -test in fixture filename * Fix index selection test compatibility with #816. * Improve test README
This reverts commit dae81be. Conflicts: src/mango/test/05-index-selection-test.py
This reverts commit dae81be. Conflicts: src/mango/test/05-index-selection-test.py
Since apache#816, mango JSON index on compound fields can be selected only if the selector make sure that all the fields listed in the index are always present. This commit add a special case where all clauses of an `$or` can ensure that a field is present. For instance, if I had an index: [A, B] is_usable would now return true for the selector: { "A": "foo", "$or": { "B": "bar", "B": "baz" } }
If a user specifies a value for use_index that is not valid for the selector - i.e. it does not meet the coverage requirements of the selector or sort fields - attempt to fall back to a valid index (or database scan) rather than returning a 400 error. When a fallback occurs, populate the "warning" field in the response (as we already do when a full database scan takes place) with details of the fallback. This change is partially as mitigation for #816, which may lead to some previously valid indexes being deemed invalid, and also to make use_index less brittle in general. If an index that is used explicitly by active queries is removed, Couch will now generate warnings and there may be a performance impact, but the client will still get correct results.
Since #816, mango JSON index on compound fields can be selected only if the selector make sure that all the fields listed in the index are always present. This adds a special case where all clauses of an `$or` can ensure that a field is present. For instance, if I had an index: [A, B] is_usable would now return true for the selector: { "A": "foo", "$or": { "B": "bar", "B": "baz" } } but false for: { "A": "foo", "$or": { "B": "bar", "C": "bar" } }
Overview
JSON index selection in Mango previously deemed an index to be usable if a range scan on the first component of its compound key could be generated from the query selector.
For instance, if I had an index:
[A, B]
is_usable would return true for the selector:
{"A": "foo"}
This is incorrect because JSON indexes only index documents that contain all the fields in the index; null values are ok, but the field must be present. That means that for the above selector, the index would implicitly include only documents where B exists, missing documents where {"A":5} matched but field B was not present.
This commit changes is_usable so that it only returns true if all the keys in the index are required to exist by the selector. This means that in the worst case e.g. none of the predicates can be used to generate a range query, we should end up performing a full index scan, but this is still more efficient
than a full database scan.
We leave the generation of the optimal range for a given index as a separate exercise - currently this happens after index selection.
Potentially we'd want to score indexes during index selection based on their ability to restrict the result set, etc.
Testing recommendations
Run the mango test suite. Manually run some mango queries with/without JSON indexes and check that the results are as expected.
Checklist