-
Notifications
You must be signed in to change notification settings - Fork 670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand target entries with merged array/jsonb subscripting ops into multiple ones #5692
base: main
Are you sure you want to change the base?
Conversation
2dc56db
to
12aa54d
Compare
Don't we use |
Yes, we use the original query tree but the rewrite transformations happen way before, meaning that the query tree that we later pass down to the executor is the query tree that postgres already applied such transformations, i.e.: |
f410eab
to
9914df2
Compare
…ultiple ones When re-writing query tree, postgres combines multiple subscripting operators referencing to the same attribute into a single `TargetEntry` by linking `SubscriptingRef` objects to each other via their `refexpr` fields. (See `rewriteTargetListIU` & `process_matched_tle` functions.) However, ruleutils function `get_update_query_targetlist_def` doesn't know how to properly deparse such a `TargetEntry`. As a result, we were only taking the last set-by-subscript operation into account when generating the shard query for such an `UPDATE` command. In postgres, this doesn't cause any problems (e.g.: when generating definition of a rule based object) since the query-rewrite transformations aren't performed on the query tree that `get_update_query_targetlist_def` is expected to process. For this reason; with this commit, before processing the target entry list in our ruleutils based deparser, we first expand such target entries into multiple ones. To detect such `SubscriptingRef` objects, we also need to investigate `FieldStore` and `CoerceToDomain` objects as postgres functions processing `SubscriptingRef` objects do --although they do so for different purposes. However, given that Citus already doesn't allow `INSERT/UPDATE` via `FieldStore`, we only do that for `CoerceToDomain` objects.
a2c4c06
to
dc29396
Compare
ereport(ERROR, (errmsg("unexpectedly got FieldStore object when " | ||
"generating shard query"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed we have a test for that now, but seems that codecov did't get updated yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ummmm, somehow we might be telling codecov to skip files under '/test/
Codecov Report
@@ Coverage Diff @@
## master #5692 +/- ##
==========================================
+ Coverage 92.60% 92.79% +0.18%
==========================================
Files 227 223 -4
Lines 47376 47382 +6
==========================================
+ Hits 43874 43968 +94
+ Misses 3502 3414 -88 |
Looks like after the changes from #5728 has been merged, most of the tests added in this pr started failing since we don't yet support DOMAIN objects. |
Dropped DOMAIN tests, but we should re-land those tests into https://gist.github.com/onurctirtir/d6c350a78e03d9044a2bf3c72f0f39f8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm mostly on-board with the changes, still want to have one more look to the approach. And, couldn't dive into the details of the code yet, but seems fine on the high-level
(1 row) | ||
|
||
INSERT INTO nested_obj_update VALUES | ||
(1, '{"a": [1,2,3], "b": [4,5,6], "c": [7,8,9], "d": [1,2,1,2]}', '4'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add RETURNING
as well:
RETURNING (data['a'], data['b'][0]);
@@ -1272,6 +1272,83 @@ SELECT count(*) FROM | |||
1 | |||
(1 row) | |||
|
|||
CREATE TABLE jsonb_subscript_update (id INT, data JSONB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like a great candidate for adding to arbitrary configs so that we ensure the results are the same with PG/?
data['a']::NUMERIC + 3 AS d | ||
FROM jsonb_subscript_update | ||
) updated_vals | ||
WHERE jsonb_subscript_update.id = updated_vals.id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I add AND jsonb_subscript_update.id = 1
we seem to fail to update properly.
UPDATE jsonb_subscript_update SET data['b'] = updated_vals.b::TEXT::jsonb, data['c'] = updated_vals.c::TEXT::jsonb, data['d'] = updated_vals.d::TEXT::jsonb FROM ( SELECT id, data['a'] AS a, data['a']::NUMERIC + 1 AS b, data['a']::NUMERIC + 2 AS c, data['a']::NUMERIC + 3 AS d FROM jsonb_subscript_update ) updated_vals WHERE jsonb_subscript_update.id = updated_vals.id
AND jsonb_subscript_update.id = 1;
┌────┬──────────────────┐
│ id │ data │
├────┼──────────────────┤
│ 1 │ {"a": 1, "d": 4} │
│ 2 │ {"a": 2} │
└────┴──────────────────┘
(2 rows)
Router queries might use a different code-path for deparsing the queries.
|
||
INSERT INTO jsonb_subscript_update VALUES (1, '{"a": 1}'), (2, '{"a": 2}'); | ||
UPDATE jsonb_subscript_update | ||
SET data['b'] = updated_vals.b::TEXT::jsonb, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe also insert one more row with 2 levels of subscription:
INSERT INTO jsonb_subscript_update VALUES (1, '{"a": {"c":20, "d" : 200}}'), (2, '{"a": {"d":10, "c" : 100}}');
And use
UPDATE jsonb_subscript_update
SET DATA['a']['c'] = (updated_vals.d + updated_vals.a::NUMERIC)::TEXT::JSONB
FROM
(SELECT id,
DATA['a']['c'] AS a,
DATA['a']['c']::NUMERIC + 1 AS b,
DATA['a']['c']::NUMERIC + 2 AS c,
DATA['a']['d']::NUMERIC + 3 AS d
FROM jsonb_subscript_update) updated_vals
WHERE jsonb_subscript_update.id = updated_vals.id;
|
||
|
||
/* | ||
* ExpandMergedSubscriptingRefEntries takes a list of target entries and expands |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets document the PG functions that merge these together
@@ -3324,6 +3324,8 @@ get_update_query_targetlist_def(Query *query, List *targetList, | |||
SubLink *cur_ma_sublink; | |||
List *ma_sublinks; | |||
|
|||
targetList = ExpandMergedSubscriptingRefEntries(targetList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we call this for PG 12/13 as well. Why? Do we have the same bug for arrays on the earlier versions? Do we have tests for that?
@@ -3439,6 +3439,8 @@ get_update_query_targetlist_def(Query *query, List *targetList, | |||
SubLink *cur_ma_sublink; | |||
List *ma_sublinks; | |||
|
|||
targetList = ExpandMergedSubscriptingRefEntries(targetList); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe don't change the input:
targetList = ExpandMergedSubscriptingRefEntries(list_copy(targetList));
List * | ||
list_reverse(const List *list) | ||
{ | ||
#if PG_VERSION_NUM >= PG_VERSION_13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use something simpler for all versions?
List *reversedSubXactStates = NIL;
/*
* activeSubXactContexts is in reversed temporal order, so we reverse it to get it
* in temporal order.
*/
SubXactContext *state = NULL;
foreach_ptr(state, activeSubXactContexts)
{
reversedSubXactStates = lcons(state, reversedSubXactStates);
}
return reversedSubXactStates;
Fixes #5621.
DESCRIPTION: Fixes a bug that causes incorrect evaluation of UPDATE/UPSERT commands over array/jsonb columns
When re-writing query tree, postgres combines multiple subscripting
operators referencing to the same attribute into a single
TargetEntry
by linking
SubscriptingRef
objects to each other via theirrefexpr
fields. (See
rewriteTargetListIU
&process_matched_tle
functions.)However, ruleutils function
get_update_query_targetlist_def
doesn'tknow how to properly deparse such a
TargetEntry
. As a result, we wereonly taking the last set-by-subscript operation into account when
generating the shard query for such an
UPDATE
command.In postgres, this doesn't cause any problems (e.g.: when generating
definition of a rule based object) since the query-rewrite
transformations aren't performed on the query tree that
get_update_query_targetlist_def
is expected to process.For this reason; with this commit, before processing the target entry
list in our ruleutils based deparser, we first expand such target
entries into multiple ones.
To detect such
SubscriptingRef
objects, we also need to investigateFieldStore
andCoerceToDomain
objects as postgres functionsprocessing
SubscriptingRef
objects do --although they do so fordifferent purposes. However, given that Citus already doesn't allow
INSERT/UPDATE
viaFieldStore
, we only do that forCoerceToDomain
objects.
Reproducible (at least) for all versions >= 9.5.