Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand target entries with merged array/jsonb subscripting ops into multiple ones #5692

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

onurctirtir
Copy link
Member

@onurctirtir onurctirtir commented Feb 7, 2022

Fixes #5621.

DESCRIPTION: Fixes a bug that causes incorrect evaluation of UPDATE/UPSERT commands over array/jsonb columns

When re-writing query tree, postgres combines multiple subscripting
operators referencing to the same attribute into a single TargetEntry
by linking SubscriptingRef objects to each other via their refexpr
fields. (See rewriteTargetListIU & process_matched_tle functions.)

However, ruleutils function get_update_query_targetlist_def doesn't
know how to properly deparse such a TargetEntry. As a result, we were
only taking the last set-by-subscript operation into account when
generating the shard query for such an UPDATE command.

In postgres, this doesn't cause any problems (e.g.: when generating
definition of a rule based object) since the query-rewrite
transformations aren't performed on the query tree that
get_update_query_targetlist_def is expected to process.

For this reason; with this commit, before processing the target entry
list in our ruleutils based deparser, we first expand such target
entries into multiple ones.

To detect such SubscriptingRef objects, we also need to investigate
FieldStore and CoerceToDomain objects as postgres functions
processing SubscriptingRef objects do --although they do so for
different purposes. However, given that Citus already doesn't allow
INSERT/UPDATE via FieldStore, we only do that for CoerceToDomain
objects.


Reproducible (at least) for all versions >= 9.5.

@onurctirtir onurctirtir force-pushed the fix/subscript-jsonb branch 3 times, most recently from 2dc56db to 12aa54d Compare February 8, 2022 10:08
@onderkalaci
Copy link
Member

When re-writing query tree, postgres combines multiple subscripting
operators referencing to the same attribute into a single TargetEntry
by linking SubscriptingRef objects to each other via their refexpr
fields. (See rewriteTargetListIU & process_matched_tle functions.)

Don't we use originalQuery while deparsing? Why is that impacted by re-writing?

@onurctirtir
Copy link
Member Author

When re-writing query tree, postgres combines multiple subscripting
operators referencing to the same attribute into a single TargetEntry
by linking SubscriptingRef objects to each other via their refexpr
fields. (See rewriteTargetListIU & process_matched_tle functions.)

Don't we use originalQuery while deparsing? Why is that impacted by re-writing?

Yes, we use the original query tree but the rewrite transformations happen way before, meaning that the query tree that we later pass down to the executor is the query tree that postgres already applied such transformations, i.e.: parse argument passed to distributed_planner.

@onurctirtir onurctirtir force-pushed the fix/subscript-jsonb branch 2 times, most recently from f410eab to 9914df2 Compare February 8, 2022 12:36
@onurctirtir onurctirtir marked this pull request as ready for review February 8, 2022 14:10
…ultiple ones

When re-writing query tree, postgres combines multiple subscripting
operators referencing to the same attribute into a single `TargetEntry`
by linking `SubscriptingRef` objects to each other via their `refexpr`
fields. (See `rewriteTargetListIU` & `process_matched_tle` functions.)

However, ruleutils function `get_update_query_targetlist_def` doesn't
know how to properly deparse such a `TargetEntry`. As a result, we were
only taking the last set-by-subscript operation into account when
generating the shard query for such an `UPDATE` command.

In postgres, this doesn't cause any problems (e.g.: when generating
definition of a rule based object) since the query-rewrite
transformations aren't performed on the query tree that
`get_update_query_targetlist_def` is expected to process.

For this reason; with this commit, before processing the target entry
list in our ruleutils based deparser, we first expand such target
entries into multiple ones.

To detect such `SubscriptingRef` objects, we also need to investigate
`FieldStore` and `CoerceToDomain` objects as postgres functions
processing `SubscriptingRef` objects do --although they do so for
different purposes. However, given that Citus already doesn't allow
`INSERT/UPDATE` via `FieldStore`, we only do that for `CoerceToDomain`
objects.
Comment on lines +1456 to +1457
ereport(ERROR, (errmsg("unexpectedly got FieldStore object when "
"generating shard query")));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed we have a test for that now, but seems that codecov did't get updated yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ummmm, somehow we might be telling codecov to skip files under '/test/

@codecov
Copy link

codecov bot commented Feb 14, 2022

Codecov Report

Merging #5692 (108f537) into master (333bcc7) will increase coverage by 0.18%.
The diff coverage is 97.82%.

❗ Current head 108f537 differs from pull request most recent head 6fa965c. Consider uploading reports for the commit 6fa965c to get more accurate results

@@            Coverage Diff             @@
##           master    #5692      +/-   ##
==========================================
+ Coverage   92.60%   92.79%   +0.18%     
==========================================
  Files         227      223       -4     
  Lines       47376    47382       +6     
==========================================
+ Hits        43874    43968      +94     
+ Misses       3502     3414      -88     

@onurctirtir
Copy link
Member Author

Looks like after the changes from #5728 has been merged, most of the tests added in this pr started failing since we don't yet support DOMAIN objects.

@onderkalaci
Copy link
Member

Looks like after the changes from #5728 has been merged, most of the tests added in this pr started failing since we don't yet support DOMAIN objects.

that could land into Citus 11 via #5764

@onurctirtir
Copy link
Member Author

Looks like after the changes from #5728 has been merged, most of the tests added in this pr started failing since we don't yet support DOMAIN objects.

that could land into Citus 11 via #5764

Dropped DOMAIN tests, but we should re-land those tests into subscripting_op.sql after merging #5764.

https://gist.github.com/onurctirtir/d6c350a78e03d9044a2bf3c72f0f39f8

Copy link
Member

@onderkalaci onderkalaci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm mostly on-board with the changes, still want to have one more look to the approach. And, couldn't dive into the details of the code yet, but seems fine on the high-level

(1 row)

INSERT INTO nested_obj_update VALUES
(1, '{"a": [1,2,3], "b": [4,5,6], "c": [7,8,9], "d": [1,2,1,2]}', '4'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add RETURNING as well:

 RETURNING (data['a'], data['b'][0]);

@@ -1272,6 +1272,83 @@ SELECT count(*) FROM
1
(1 row)

CREATE TABLE jsonb_subscript_update (id INT, data JSONB);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a great candidate for adding to arbitrary configs so that we ensure the results are the same with PG/?

data['a']::NUMERIC + 3 AS d
FROM jsonb_subscript_update
) updated_vals
WHERE jsonb_subscript_update.id = updated_vals.id;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I add AND jsonb_subscript_update.id = 1 we seem to fail to update properly.

UPDATE jsonb_subscript_update                                                         SET data['b'] = updated_vals.b::TEXT::jsonb,                                                                                       data['c'] = updated_vals.c::TEXT::jsonb,                                                                                       data['d'] = updated_vals.d::TEXT::jsonb                                                                                    FROM (                                                                                                                           SELECT id,                                                                                                                            data['a'] AS a,                                                                                                                data['a']::NUMERIC + 1 AS b,                                                                                                   data['a']::NUMERIC + 2 AS c,                                                                                                   data['a']::NUMERIC + 3 AS d                                                                                             FROM jsonb_subscript_update                                                                                                  ) updated_vals                                                                                                                 WHERE jsonb_subscript_update.id = updated_vals.id 
AND jsonb_subscript_update.id = 1;

┌────┬──────────────────┐
│ id │       data       │
├────┼──────────────────┤
│  1 │ {"a": 1, "d": 4} │
│  2 │ {"a": 2}         │
└────┴──────────────────┘
(2 rows)

Router queries might use a different code-path for deparsing the queries.


INSERT INTO jsonb_subscript_update VALUES (1, '{"a": 1}'), (2, '{"a": 2}');
UPDATE jsonb_subscript_update
SET data['b'] = updated_vals.b::TEXT::jsonb,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also insert one more row with 2 levels of subscription:

INSERT INTO jsonb_subscript_update VALUES (1, '{"a": {"c":20, "d" : 200}}'), (2, '{"a": {"d":10, "c" : 100}}');               

And use

UPDATE jsonb_subscript_update
SET DATA['a']['c'] = (updated_vals.d + updated_vals.a::NUMERIC)::TEXT::JSONB
FROM
  (SELECT id,
          DATA['a']['c'] AS a,
                   DATA['a']['c']::NUMERIC + 1 AS b,
                            DATA['a']['c']::NUMERIC + 2 AS c,
                                     DATA['a']['d']::NUMERIC + 3 AS d
   FROM jsonb_subscript_update) updated_vals
WHERE jsonb_subscript_update.id = updated_vals.id;



/*
* ExpandMergedSubscriptingRefEntries takes a list of target entries and expands
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets document the PG functions that merge these together

@@ -3324,6 +3324,8 @@ get_update_query_targetlist_def(Query *query, List *targetList,
SubLink *cur_ma_sublink;
List *ma_sublinks;

targetList = ExpandMergedSubscriptingRefEntries(targetList);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we call this for PG 12/13 as well. Why? Do we have the same bug for arrays on the earlier versions? Do we have tests for that?

@@ -3439,6 +3439,8 @@ get_update_query_targetlist_def(Query *query, List *targetList,
SubLink *cur_ma_sublink;
List *ma_sublinks;

targetList = ExpandMergedSubscriptingRefEntries(targetList);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe don't change the input:

targetList = ExpandMergedSubscriptingRefEntries(list_copy(targetList));

List *
list_reverse(const List *list)
{
#if PG_VERSION_NUM >= PG_VERSION_13
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use something simpler for all versions?

	List *reversedSubXactStates = NIL;

	/*
	 * activeSubXactContexts is in reversed temporal order, so we reverse it to get it
	 * in temporal order.
	 */
	SubXactContext *state = NULL;
	foreach_ptr(state, activeSubXactContexts)
	{
		reversedSubXactStates = lcons(state, reversedSubXactStates);
	}

	return reversedSubXactStates;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UPDATE statement with array/jsonb subscripting behaves incorrectly with more than one field
2 participants