Mango: generate a warning instead of an error when an user-specified index cannot be used #962

willholley · 2017-11-03T15:27:11Z

Overview

If a user specifies a value for use_index that is not valid for the current query - i.e. it does not meet the coverage requirements of the selector or sort fields - attempt to fall back to a valid index (or database scan) rather than returning a 400 error to the client.

When a fallback occurs, populate the "warning" field in the response (as we already do when a full database scan takes place) to feed back to the user that use_index was ignored.

This change is partially as mitigation for #816, which may lead to some previously valid indexes being deemed invalid, and also to make use_index less brittle for production applications in general.

Testing recommendations

Run the test suite. Run some mango queries with and without use_index. Observe that a warning is returned in the JSON and in Fauxton.

Checklist

Code is written and works correctly;
Changes are covered by tests;
Documentation reflects the changes;

tonysun83 · 2017-11-08T00:52:25Z

src/mango/src/mango_cursor.erl

-maybe_add_warning(UserFun, #idx{type = IndexType}, UserAcc) ->
-    case IndexType of
+maybe_add_warning(UserFun, #cursor{index = Index, opts = Opts}, UserAcc) ->
+


Is this whitespace intended? I don't think I've seen it right after a function declaration before. Might want to check styling guidelines. It's slightly hard to read.

tonysun83 · 2017-11-08T00:53:56Z

src/mango/src/mango_cursor.erl

+        {use_index, [DesignId]} ->
+            case filter_indexes([Index], DesignId) of
+                [] ->
+                    [fmt("_design/~s was not used because it does not contain a valid index for this query.", [ddoc_name(DesignId)])];


this line is too long

tonysun83 · 2017-11-08T00:54:10Z

src/mango/src/mango_cursor.erl

+        {use_index, [DesignId, ViewName]} ->
+            case filter_indexes([Index], DesignId, ViewName) of
+                [] ->
+                    [fmt("_design/~s, ~s was not used because it is not a valid index for this query.", [ddoc_name(DesignId), ViewName])];


super long line

tonysun83 · 2017-11-08T00:55:11Z

src/mango/src/mango_cursor.erl

+
+ddoc_name(Name) ->
+    Name.


missing endline

tonysun83 · 2017-11-08T01:00:18Z

src/mango/src/mango_error.erl

-        400,
-        <<"no_usable_index">>,
-        <<"There are no indexes defined in this database.">>
-    };
 info(mango_idx, {no_usable_index, no_index_matching_name}) ->


do we need to remove this error as well?

tonysun83 · 2017-11-08T01:04:29Z

src/mango/src/mango_idx.erl

-    end,
+    GlobalIndexes = mango_cursor:remove_indexes_with_partial_filter_selector(ExistingIndexes),
+    UserSpecifiedIndex = mango_cursor:maybe_filter_indexes_by_ddoc(ExistingIndexes, Opts),
+    UsableIndexes0 = lists:usort(GlobalIndexes ++ UserSpecifiedIndex),


why are we sorting the indexes here?

this is to remove duplicates - an index might belong to GlobalIndexes and be specified by the user explicitly.

tonysun83 · 2017-11-08T01:05:38Z

src/mango/test/05-index-selection-test.py

@@ -77,34 +73,78 @@ def test_uses_index_when_no_range_or_equals(self):
        resp_explain = self.db.find(selector, explain=True)
        self.assertEqual(resp_explain["index"]["type"], "json")

-


tonysun83 · 2017-11-08T01:25:38Z

I need to think about this some more. Falling back on _all_docs is one thing, but I'm a little concerned about switching between indexes.

Some initial thoughts:

Can we guarantee the same results between switched indexes (i.e json vs text?)
If a user doesn't read the warning message, and decides to delete the substitute index then we're forced to fall back on _all_docs. If this leads to a performance degradation, would the user want to rebuild the substitute index?

willholley · 2017-11-08T12:50:09Z

@tonysun83 Regarding your concerns about the fallback behaviour:

This is a concern in general and doesn't seem specific to this PR. Assuming users don't specify use_index, we depend on the query planner to select an appropriate index and it should be the case that the only indexes which can change the results are partial indexes. Indexes that are not usable for the query are excluded from the set of candidate indexes, so the fallback is safe so long as the text indexes correctly exclude themselves.
I think this points to a general weakness around index management for Query. Ideally it would be safe to add and remove indexes in production databases whilst only impacting performance. Unfortunately, the query planner doesn't take the index state into account, so if an index is added to a production database and it takes some time to build, the query planner would use it and time out. Again, I'm not really sure that's a concern specific to this PR, but one that should be addressed independently.

tonysun83 · 2017-11-14T02:35:52Z

Wanted to clarify a bit:

For scenario 1): I agree that fixing the difference between text and json indexes is outside the scope of the PR, but I was thinking about the specific backwards-compatibility situation where a user specifies use_index for a json index, and it fails, and the fallback is a text index. Would we rather have them possibly get different results and complain later down the road versus failing them right away with an error?

For scenario 2): Similarly to scenario 1), the issue is whether or not the fallback leads to inadvertent changes for the user. In this case, he is modifying his indexes based on perceived notion of what is being used. The performance degradation for _all_docs is noticed and he complains about the issue. We notify him saying that an index was deleted, but he says the index wasn't being used. Now he says ok he wants to rebuild an index...and that may take a long time and negatively affect his cluster.

I wonder if it's better to explicitly send back index names as part an failure error message. But could that be a security risk?

tonysun83 · 2017-11-29T06:16:25Z

src/mango/test/05-index-selection-test.py

@@ -46,7 +46,7 @@ def test_use_most_columns(self):
                "name.last": "Something or other",
                "age": {"$gt": 1}
            }, explain=True)
-        self.assertNotEqual(resp["index"]["ddoc"], "_design/" + ddocid)
+        self.assertNotEqual(resp["index"]["ddoc"], ddocid)


this looks a random testcase fix not related to the PR?

tonysun83 · 2017-11-29T06:20:14Z

src/mango/test/05-index-selection-test.py

-            raise AssertionError("did not reject bad use_index")
+
+        resp = self.db.find(selector, use_index=[ddocid,name], return_raw=True)
+        self.assertEqual(resp["warning"], "{0}, {1} was not used because it is not a valid index for this query.".format(ddocid, name))


in the test case above, you're checking that a value is returned

# should still return a correct result for d in r["docs"]: self.assertEqual(d["company"], "Pharmex")

We should be consistent?

tonysun83 · 2017-11-29T06:30:48Z

src/mango/test/05-index-selection-test.py

+        self.assertEqual(resp_explain["index"]["ddoc"], ddocid_valid)    
+
+        resp = self.db.find(selector, sort=["foo", "bar"], use_index=ddocid_invalid, return_raw=True)
+        self.assertEqual(resp["warning"], '{0} was not used because it does not contain a valid index for this query.'.format(ddocid_invalid))    


same as above, we should get a value back here right? It's going to fall back on the ddoc_valid and return a result, so just confirm a result?

agree that we should check the result in the cases above. In this test, however, I don't expect any matches (and can assert that).

tonysun83 · 2017-11-29T07:16:17Z

src/mango/src/mango_cursor.erl

+            case filter_indexes([Index], DesignId) of
+                [] ->
+                    [fmt("_design/~s was not used because it does not contain a valid index for this query.", 
+                        [ddoc_name(DesignId)])];


Should make the warning message a little bit more explicit here and add "we used index " instead?

I figured the user can use _explain to see which index was used

tonysun83 · 2017-11-29T07:16:31Z

src/mango/src/mango_cursor.erl

+            case filter_indexes([Index], DesignId, ViewName) of
+                [] ->
+                    [fmt("_design/~s, ~s was not used because it is not a valid index for this query.", 
+                        [ddoc_name(DesignId), ViewName])];


same as the above comment

tonysun83 · 2017-11-29T07:19:56Z

src/mango/src/mango_cursor.erl

+    end,
+
+    maybe_add_warning_int(UseIndexInvalid ++ NoIndex, UserFun, UserAcc).


Do we really need to append two messages and then take the head of that list later? I feel like it's either going to be a

"we used a diff index" warning or

"no indexes found, we used all docs" warning.

Instead of returning empty lists, perhaps you could just move the NoIndex value as the return value for the case clauses. That way no need to add and then take the head.

Ex:

{use_index, []} -> NoIndex;

tonysun83 · 2017-11-29T07:20:55Z

src/mango/src/mango_cursor.erl

+
+maybe_add_warning_int(Warnings, UserFun, UserAcc) ->
+    % only include the first warning


see comment about about the warning list

tonysun83 · 2017-11-30T18:46:34Z

+1

Mango will always fall back to the primary index so this check / error message is redundant.

If a user specifies a value for use_index that is not valid for the selector - i.e. it does not meet the coverage requirements of the selector or sort fields - attempt to fall back to a valid index (or database scan) rather than returning a 400 error. When a fallback occurs, populate the "warning" field in the response (as we already do when a full database scan takes place) with details of the fallback. This change is partially as mitigation for #816, which may lead to some previously valid indexes being deemed invalid, and also to make use_index less brittle in general. If an index that is used explicitly by active queries is removed, Couch will now generate warnings and there may be a performance impact, but the client will still get correct results.

tonysun83 reviewed Nov 8, 2017

View reviewed changes

src/mango/src/mango_cursor.erl

ddoc_name(Name) ->

Name.

Copy link

Contributor

tonysun83 Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing endline

tonysun83 reviewed Nov 8, 2017

View reviewed changes

tonysun83 reviewed Nov 29, 2017

View reviewed changes

willholley added 4 commits November 30, 2017 19:13

Remove "no indexes defined" mango error

ce27375

Mango will always fall back to the primary index so this check / error message is redundant.

address test comments

de33e94

address mango_cursor comments

bda54bd

willholley merged commit 743bd88 into apache:master Nov 30, 2017

willholley deleted the use_index_fallback branch November 30, 2017 19:53

wohali mentioned this pull request May 7, 2018

Different behavior "sorting" in the couchdb in version 2.1.0 and 2.1.1 #1315

Closed

pgj mentioned this pull request Apr 6, 2023

Mango queries should not use other index when use_index is specified #4511

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mango: generate a warning instead of an error when an user-specified index cannot be used #962

Mango: generate a warning instead of an error when an user-specified index cannot be used #962

willholley commented Nov 3, 2017

tonysun83 Nov 8, 2017 •

edited

Loading

tonysun83 Nov 8, 2017

tonysun83 Nov 8, 2017

tonysun83 Nov 8, 2017

tonysun83 Nov 8, 2017

tonysun83 Nov 8, 2017

willholley Nov 8, 2017

tonysun83 Nov 8, 2017

tonysun83 commented Nov 8, 2017 •

edited

Loading

willholley commented Nov 8, 2017

tonysun83 commented Nov 14, 2017

tonysun83 Nov 29, 2017

tonysun83 Nov 29, 2017 •

edited

Loading

tonysun83 Nov 29, 2017

willholley Nov 29, 2017

tonysun83 Nov 29, 2017

willholley Nov 29, 2017

tonysun83 Nov 29, 2017

tonysun83 Nov 29, 2017

tonysun83 Nov 29, 2017

tonysun83 commented Nov 30, 2017

		@@ -77,34 +73,78 @@ def test_uses_index_when_no_range_or_equals(self):
		resp_explain = self.db.find(selector, explain=True)
		self.assertEqual(resp_explain["index"]["type"], "json")

		end,

		maybe_add_warning_int(UseIndexInvalid ++ NoIndex, UserFun, UserAcc).


		maybe_add_warning_int(Warnings, UserFun, UserAcc) ->
		% only include the first warning

Mango: generate a warning instead of an error when an user-specified index cannot be used #962

Mango: generate a warning instead of an error when an user-specified index cannot be used #962

Conversation

willholley commented Nov 3, 2017

Overview

Testing recommendations

Checklist

tonysun83 Nov 8, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tonysun83 commented Nov 8, 2017 • edited Loading

willholley commented Nov 8, 2017

tonysun83 commented Nov 14, 2017

Choose a reason for hiding this comment

tonysun83 Nov 29, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tonysun83 commented Nov 30, 2017

tonysun83 Nov 8, 2017 •

edited

Loading

tonysun83 commented Nov 8, 2017 •

edited

Loading

tonysun83 Nov 29, 2017 •

edited

Loading