[DISCUSS] How do we define operation of unique in nGQL? #548

CPWstatic · 2019-06-26T09:40:34Z

If we search the graph with a GO statement, the result may contain the same record more than once. Sometimes we need the results to be unique.
How do we define the difference between unique and GROUP BY? Can we just treat them as the same?

Subtask of #492


ayyt · 2019-06-26T09:47:14Z

From my understanding of SQL, there are the following points:

The unique here is DISTINCT in SQL.
In SQL, ORDER BY, GROUP BY, and DISTINCT are operators.
ORDER BY sorts, GROUP BY groups, and DISTINCT removes duplicate values.

sherman-the-tank · 2019-06-30T03:15:32Z

Please DO NOT use Chinese punctuation

sherman-the-tank · 2019-06-30T03:18:13Z

Just as @steppenwolfyuetong mentioned, unique is the same as DISTINCT in SQL (so we might want to call it DISTINCT as well)

DISTINCT is different from GROUP BY
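
To make the difference concrete, here is a minimal Python sketch, not nGQL. The sample rows and the helper names `distinct` and `group_by` are assumptions made for illustration only. DISTINCT drops repeated rows, while GROUP BY keeps every row and partitions them by a key, usually so that an aggregate can be computed per group.

```python
from collections import defaultdict

# Hypothetical result rows, e.g. (src, dst) pairs returned by a traversal.
rows = [("a", "b"), ("a", "b"), ("a", "c"), ("d", "b")]


def distinct(rows):
    """Drop repeated rows, keeping the first occurrence of each."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def group_by(rows, key_index):
    """Partition rows by one column; every row survives inside its group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)


print(distinct(rows))
# A typical use of grouping: one aggregate (here a count) per key.
print({key: len(group) for key, group in group_by(rows, 0).items()})
```

DISTINCT returns three rows here, while grouping by the first column returns two groups that still contain all four input rows.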

sherman-the-tank · 2019-06-30T03:30:59Z

In SQL, SELECT... is the only data fetching statement, so it makes sense to make "ORDER BY" and "GROUP BY" part of the SELECT statement

But in nGQL, we have multiple data fetching statements (GO, MATCH, FIND...), so it makes sense to make ORDER BY and GROUP BY standalone statements, so that they can be used in the statement pipeline. Such statements could be

ORDER BY <col1> <col2> ...
GROUP BY <col1> <col2> ...
DEDUP BY <col1> <col2> ... # Same as distinct

Since uniqueness is a very common feature in data fetching, we could also add it to the data fetching statements, such as the GO... statement. Here is the syntax

GO ... YIELDS <col1>, DISTINCT <col2>, ...

Note that only ONE column can be decorated with DISTINCT in a data fetching statement. If you want distinct results on a combination of multiple columns, you need to use the DEDUP BY statement in the pipeline
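
As a rough illustration of why standalone statements compose well in a pipeline, here is a Python sketch. The stage functions `order_by` and `dedup_by` and the `pipeline` helper are hypothetical stand-ins for the proposed statements, not an actual nGQL implementation. Each stage takes rows and returns rows, so any data fetching statement could feed it.

```python
from functools import reduce


def order_by(*cols):
    """Build a stage that sorts rows by the given columns."""
    def stage(rows):
        return sorted(rows, key=lambda row: tuple(row[c] for c in cols))
    return stage


def dedup_by(*cols):
    """Build a stage that keeps the first row for each combination of cols."""
    def stage(rows):
        seen = set()
        out = []
        for row in rows:
            key = tuple(row[c] for c in cols)
            if key not in seen:
                seen.add(key)
                out.append(row)
        return out
    return stage


def pipeline(rows, *stages):
    """Feed the output of each stage into the next, like `a | b | c`."""
    return reduce(lambda acc, stage: stage(acc), stages, rows)


# Hypothetical rows produced by some data fetching statement.
rows = [
    {"col1": 2, "col2": "x"},
    {"col1": 1, "col2": "y"},
    {"col1": 2, "col2": "x"},
    {"col1": 1, "col2": "z"},
]

result = pipeline(rows, dedup_by("col1", "col2"), order_by("col1", "col2"))
print(result)
```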

sherman-the-tank · 2019-06-30T03:32:24Z

Since the simple DISTINCT decoration on one column could probably cover more than 80% of the requirements, let's implement this first

CPWstatic · 2019-07-01T05:43:21Z

GO ... YIELDS <col1>, DISTINCT <col2>, ...

As far as I know from SQL, DISTINCT only takes effect when it is placed in front of all returned fields; anywhere else it is a syntax error. That means DISTINCT works on the row, not on a single column. If you intend to make DISTINCT a decoration of a single column, you have to define what value should be returned.
Here I suggest the syntax should always be like:

GO ... YIELDS DISTINCT <col1>, <col2>, ...

As for DEDUP, if we want to make it a standalone statement, it should work like:

GO ...  | DEDUP | GO ...
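
The ambiguity can be shown with a small Python sketch. The sample rows and function names are assumptions for illustration. When DISTINCT decorates only col2, a col2 value that appears in several rows has several candidate col1 values, and nothing says which one to return. Row-level DISTINCT has no such problem.

```python
# Hypothetical (col1, col2) rows returned by a query.
rows = [(1, "x"), (2, "x"), (3, "y")]


def candidates_for_distinct_col2(rows):
    """Map each distinct col2 value to every col1 value it appeared with."""
    candidates = {}
    for col1, col2 in rows:
        candidates.setdefault(col2, []).append(col1)
    return candidates


def distinct_rows(rows):
    """Row-level DISTINCT: drop exact duplicate rows, keep first-seen order."""
    return list(dict.fromkeys(rows))


# "x" maps to two col1 values, so `col1, DISTINCT col2` is underspecified.
print(candidates_for_distinct_col2(rows))
# All three rows differ as whole rows, so row-level DISTINCT keeps them all.
print(distinct_rows(rows))
```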

dutor · 2019-07-01T06:40:59Z

Since the simple DISTINCT decoration on one column could probably cover more than 80% of the requirements, let's implement this first

On second thought, I somewhat agree with @CPWstatic. It's hard to define the behaviour of DISTINCT when it's applied to only part of a row. At the same time, it hardly makes sense to fetch columns that DISTINCT does not use, since they will be discarded for some of the rows.

If one wants to fetch results and dedup by a subset of the columns, I think they should resort to variables, like

$full_results = GO ... YIELD col1, col2, col3;   # fetch results into a variable
$deduped_by_col1 = DEDUP $full_results BY col1; # dedup by col1, with col2, col3 discarded
GO FROM $deduped_by_col1.col1 ...; # use the deduped results
GO FROM $full_results.col3 ...; # use the undeduped results
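
The same idea in a Python sketch, with made-up rows and a hypothetical helper `dedup_project`. Deduplicating by col1 yields rows that contain only col1, while the full results stay available in their own variable.

```python
# Hypothetical rows standing in for:
#   $full_results = GO ... YIELD col1, col2, col3
full_results = [
    {"col1": 1, "col2": "a", "col3": 10},
    {"col1": 1, "col2": "b", "col3": 20},
    {"col1": 2, "col2": "c", "col3": 30},
]


def dedup_project(rows, col):
    """Keep one output row per distinct value of `col`; drop other columns."""
    values = dict.fromkeys(row[col] for row in rows)  # keeps first-seen order
    return [{col: value} for value in values]


# Deduped by col1; col2 and col3 are discarded.
deduped_by_col1 = dedup_project(full_results, "col1")
print(deduped_by_col1)

# The undeduped results remain usable, e.g. as input to another traversal.
print([row["col3"] for row in full_results])
```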

sherman-the-tank · 2019-07-11T07:40:07Z

@CPWstatic I buy into your idea. Let's make DISTINCT apply to the entire row

GO ... YIELDS DISTINCT col1, col2, col3

Here we return all distinct combinations of [col1, col2, col3]

sherman-the-tank · 2019-07-11T07:40:55Z

The following should return a syntax error

GO ... YIELDS col1, DISTINCT col2
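
A minimal sketch of that rule in Python, assuming a simplified clause made of comma-separated column names. The function `parse_yield` is hypothetical and is not the nGQL parser. DISTINCT is accepted only as the first token after YIELDS, and rejected anywhere else.

```python
def parse_yield(clause):
    """Parse the text after YIELDS into (is_distinct, columns).

    Raises SyntaxError when DISTINCT appears anywhere but the front.
    """
    tokens = clause.replace(",", " ").split()
    is_distinct = bool(tokens) and tokens[0].upper() == "DISTINCT"
    columns = tokens[1:] if is_distinct else tokens
    if any(token.upper() == "DISTINCT" for token in columns):
        raise SyntaxError("DISTINCT must precede the whole column list")
    if not columns:
        raise SyntaxError("YIELDS needs at least one column")
    return is_distinct, columns


for clause in ("DISTINCT col1, col2, col3", "col1, DISTINCT col2"):
    try:
        print(clause, "->", parse_yield(clause))
    except SyntaxError as err:
        print(clause, "-> syntax error:", err)
```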


sherman-the-tank changed the title ~~【DISCUSS】How do we define the difference between unique and group by?~~ [DISCUSS] How do we define the difference between unique and group by? Jun 30, 2019

CPWstatic changed the title ~~[DISCUSS] How do we define the difference between unique and group by?~~ [DISCUSS] How do we define operation of unique in nGQL? Jul 1, 2019

CPWstatic self-assigned this Jul 4, 2019

CPWstatic closed this as completed Jul 11, 2019

yixinglu pushed a commit to yixinglu/nebula that referenced this issue Mar 21, 2022

Add gflag validator for client_idle_timeout_secs (vesoft-inc#548)

00fb53f

Co-authored-by: Sophie <[email protected]> Co-authored-by: Yichen Wang <[email protected]> Co-authored-by: Sophie <[email protected]>
