Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISTINCT doesn't work with jsonb_build_object #46709

Closed
Bessonov opened this issue Mar 28, 2020 · 2 comments · Fixed by #46711
Closed

DISTINCT doesn't work with jsonb_build_object #46709

Bessonov opened this issue Mar 28, 2020 · 2 comments · Fixed by #46711
Assignees

Comments

@Bessonov
Copy link

Bessonov commented Mar 28, 2020

Describe the problem

Goal:
We have a relation from an user to postal addresses and phones. The goal is to get whole net for the user (1 or <=100) as json. We distinct between projection and filtering. If we filter by phone (filtering) or want to show phones (projection), left join clause is "automatically" added.

Because of join and projection we have a problem, if user has for example 2 phones and 1 postal address. We get the same postal two times. It's possible to get rid of this problem with postgres, but I don't found a solution with cockroachdb. See reproduction.

To Reproduce

Data snippet for postgres:

CREATE TABLE profile (id UUID NOT NULL, name TEXT NOT NULL, CONSTRAINT "primary1" PRIMARY KEY (id));
CREATE TABLE phone (id UUID NOT NULL, number TEXT NOT NULL, userid UUID NOT NULL, CONSTRAINT "primary2" PRIMARY KEY (id), CONSTRAINT phone_userid_foreign FOREIGN KEY (userid) REFERENCES profile(id));
CREATE TABLE postal (id UUID NOT NULL, streetname TEXT NOT NULL, userid UUID NOT NULL, CONSTRAINT "primary3" PRIMARY KEY (id), CONSTRAINT postal_userid_foreign FOREIGN KEY (userid) REFERENCES profile(id));

INSERT INTO profile VALUES ('00000000-0000-0000-0000-000000000000', 'Tester');
INSERT INTO phone VALUES ('11111111-1111-1111-1111-111111111111', '11111', '00000000-0000-0000-0000-000000000000');
INSERT INTO phone VALUES ('22222222-2222-2222-2222-222222222222', '22222', '00000000-0000-0000-0000-000000000000');
INSERT INTO postal VALUES ('33333333-3333-3333-3333-333333333333', '33333', '00000000-0000-0000-0000-000000000000');

Data snippet for crdb:

CREATE TABLE profile (id UUID NOT NULL, name STRING NOT NULL, CONSTRAINT "primary" PRIMARY KEY (id ASC));
CREATE TABLE phone (id UUID NOT NULL, number STRING NOT NULL, userid UUID NOT NULL, CONSTRAINT "primary" PRIMARY KEY (id ASC), CONSTRAINT phone_userid_foreign FOREIGN KEY (userid) REFERENCES profile(id));
CREATE TABLE postal (id UUID NOT NULL, streetname STRING NOT NULL, userid UUID NOT NULL, CONSTRAINT "primary" PRIMARY KEY (id ASC), CONSTRAINT postal_userid_foreign FOREIGN KEY (userid) REFERENCES profile(id));

INSERT INTO profile VALUES ('00000000-0000-0000-0000-000000000000', 'Tester');
INSERT INTO phone VALUES ('11111111-1111-1111-1111-111111111111', '11111', '00000000-0000-0000-0000-000000000000');
INSERT INTO phone VALUES ('22222222-2222-2222-2222-222222222222', '22222', '00000000-0000-0000-0000-000000000000');
INSERT INTO postal VALUES ('33333333-3333-3333-3333-333333333333', '33333', '00000000-0000-0000-0000-000000000000');

This query shows the problem on both dbs (filtering omitted, because not relevant for the output):

select "profile"."id", "profile"."name", COUNT(*) OVER() as "count", 
CASE count("postal".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(jsonb_build_object('id', "postal"."id",'userid', "postal"."userid",'streetname', "postal"."streetname"))
END as "contactpostal", 
CASE count("phone".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(jsonb_build_object('id', "phone"."id",'userid', "phone"."userid",'number', "phone"."number"))
END as "contactphone"
from "profile"
left join "postal" on "profile"."id" = "postal"."userid" -- and clause for postal's filtering/permissions
left join "phone" on "profile"."id" = "phone"."userid" -- and clause for phone's filtering/permissions
-- clause for profile's filtering/permissions
group by "profile"."id", "profile"."name";

Result:

id            | 00000000-0000-0000-0000-000000000000
name          | Tester
count         | 1
contactpostal | [{"id": "33333333-3333-3333-3333-333333333333", "streetname": "33333", "userid": "00000000-0000-0000-0000-000000000000"}, {"id": "33333333-3333-3333-3333-333333333333", "streetname": "33333", "userid": "00000000-0000-0000-0000-000000000000"}]
contactphone  | [{"id": "11111111-1111-1111-1111-111111111111", "number": "11111", "userid": "00000000-0000-0000-0000-000000000000"}, {"id": "22222222-2222-2222-2222-222222222222", "number": "22222", "userid": "00000000-0000-0000-0000-000000000000"}]

The closest workaround/solution is to use DISTINCT before jsonb_build_object, and it works on postgres (see sqlfiddle to play with it):

select "profile"."id", "profile"."name", COUNT(*) OVER() as "count", 
CASE count("postal".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(DISTINCT jsonb_build_object('id', "postal"."id",'userid', "postal"."userid",'streetname', "postal"."streetname"))
END as "contactpostal", 
CASE count("phone".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(DISTINCT jsonb_build_object('id', "phone"."id",'userid', "phone"."userid",'number', "phone"."number"))
END as "contactphone"
from "profile"
left join "postal" on "profile"."id" = "postal"."userid" -- and clause for postal's filtering/permissions
left join "phone" on "profile"."id" = "phone"."userid" -- and clause for phone's filtering/permissions
-- clause for profile's filtering/permissions
group by "profile"."id", "profile"."name";

but not on crdb:

ERROR: unable to encode table key: *tree.DJSON

I've seen that something similar was fixed by @rohany , but even after upgrade from 19.2.0 to unstable:

version | CockroachDB CCL v20.1.0-beta.3 (x86_64-unknown-linux-gnu, built 2020/03/23 17:54:39, go1.13.5)

nothing changes.

Expected behavior

I'm not sure if there is an efficient way to workaround it. It can be achieved if instead of usage of postal/phone in the select, a subselect is fired:

select distinct "profile"."id", "profile"."name", COUNT(*) OVER() as "count",
(
    select CASE count("postal".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(json_build_object('id', "postal"."id",'userid', "postal"."userid",'streetname', "postal"."streetname"))
        END as "contactpostal" FROM "postal" WHERE "postal"."userid" = "profile"."id" -- and clause for postal's filtering/permissions
),
(
    select CASE count("phone".id) WHEN 0
        THEN NULL
        ELSE jsonb_agg(json_build_object('id', "phone"."id",'userid', "phone"."userid",'number', "phone"."number"))
        END as "contactphone" FROM "phone" WHERE "phone"."userid" = "profile"."id"  -- and clause for phone's filtering/permissions
)
from "profile"
left join "postal" on "profile"."id" = "postal"."userid"  -- and clause for postals's filtering/permissions
left join "phone" on "profile"."id" = "phone"."userid"  -- and clause for phone's filtering/permissions
group by "profile"."id", "profile"."name", "phone"."id";

It doesn't feel right and breaks count:

id            | 00000000-0000-0000-0000-000000000000
name          | Tester
count         | 2
contactpostal | [{"id": "33333333-3333-3333-3333-333333333333", "streetname": "33333", "userid": "00000000-0000-0000-0000-000000000000"}]
contactphone  | [{"id": "11111111-1111-1111-1111-111111111111", "number": "11111", "userid": "00000000-0000-0000-0000-000000000000"}, {"id": "22222222-2222-2222-2222-222222222222", "number": "22222", "userid": "00000000-0000-0000-0000-000000000000"}]

May be there is another solution. But it would be nice to have a little bit more alignment to postgres.

Additional data / screenshots

Environment:

  • CockroachDB version 19.2.0, v20.1.0-beta.3
  • Server OS: official and unstable docker image
  • Client app: cockroach sql

Thanks!

@rohany rohany self-assigned this Mar 28, 2020
rohany added a commit to rohany/cockroach that referenced this issue Mar 28, 2020
Fixes cockroachdb#46709.

A recent change was made to fix encodings to perform aggregations
on JSONB and ARRAY columns. That change missed a case in for
DISTINCT aggregations, causing a runtime error when executing
these queries.

Release justification: fixes a bug

Release note (bug fix): This PR fixes a bug with distinct aggregations
on JSONB columns.
@rohany
Copy link
Contributor

rohany commented Mar 28, 2020

Hi @Bessonov, thanks for pointing this out! This is actually related to the change that you linked here, and it seems like I missed a case in the original fix. I've put up a fix, and it should be in v20.1.

craig bot pushed a commit that referenced this issue Mar 29, 2020
46711: rowexec: fix bug with distinct aggregations on JSONB columns r=yuzefovich a=rohany

Fixes #46709.

A recent change (#45229) was made to fix encodings to perform aggregations
on JSONB and ARRAY columns. That change missed a case in for
DISTINCT aggregations, causing a runtime error when executing
these queries.

Release justification: fixes a bug

Release note (bug fix): This PR fixes a bug with distinct aggregations
on JSONB columns.

Co-authored-by: Rohan Yadav <[email protected]>
@craig craig bot closed this as completed in 5849049 Mar 29, 2020
@Bessonov
Copy link
Author

@rohany thank you very much for very quick resolution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants