[DISCUSS] How do we define operation of unique in nGQL? #548

Closed
CPWstatic opened this issue Jun 26, 2019 · 9 comments

@CPWstatic
Contributor

CPWstatic commented Jun 26, 2019

If we search the graph using a GO statement, the same record may appear more than once in the result. Sometimes we need the results to be unique.
How do we define the difference between unique and group by? Can we just treat them as the same?

Subtask of #492

@ayyt
Contributor

ayyt commented Jun 26, 2019

As I understand SQL, the following points apply:

  1. The unique here is DISTINCT in SQL.
  2. In SQL, ORDER BY, GROUP BY, and DISTINCT are operators.
  3. ORDER BY sorts, GROUP BY groups, and DISTINCT removes duplicate values.
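
To make the distinction concrete, here is a minimal Python sketch (not nGQL) of the three operators applied to a small list of rows. The row data and the helper names `order_by`, `group_by`, and `distinct` are made up for illustration and say nothing about how nGQL would implement them.

```python
from collections import defaultdict

# Hypothetical result rows: (source vertex, destination vertex)
rows = [("a", "x"), ("b", "y"), ("a", "x"), ("a", "z")]

def order_by(rows, col):
    """Sort rows by one column; the number of rows is unchanged."""
    return sorted(rows, key=lambda r: r[col])

def group_by(rows, col):
    """Bucket rows by one column; every row survives inside some bucket."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[col]].append(r)
    return dict(groups)

def distinct(rows):
    """Drop rows that repeat an earlier row, keeping first-seen order."""
    return list(dict.fromkeys(rows))

print(order_by(rows, 0))
print(group_by(rows, 0))
print(distinct(rows))
```

Only `distinct` shrinks the result by removing duplicates; `group_by` keeps the duplicate `("a", "x")` row inside its bucket, which is why the two are not interchangeable.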

@sherman-the-tank
Member

Please DO NOT use Chinese punctuation

@sherman-the-tank sherman-the-tank changed the title 【DISCUSS】How do we define the difference between unique and group by? [DISCUSS] How do we define the difference between unique and group by? Jun 30, 2019
@sherman-the-tank
Member

Just as @steppenwolfyuetong mentioned, unique is the same as DISTINCT in SQL (so we might want to call it DISTINCT as well).

DISTINCT is different from GROUP BY

@sherman-the-tank
Member

In SQL, SELECT... is the only data fetching statement, so it makes sense for "ORDER BY" and "GROUP BY" to be part of the SELECT statement.

But in nGQL, we have multiple data fetching statements (GO, MATCH, FIND...), so it makes sense to make ORDER BY and GROUP BY standalone statements that can be used in the statement pipeline. Such statements could be:

ORDER BY <col1> <col2> ...
GROUP BY <col1> <col2> ...
DEDUP BY <col1> <col2> ... # Same as distinct
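
The idea of standalone statements chained in a pipeline can be sketched in Python as small stage functions, each taking rows and returning rows. This is only an illustration under assumed names (`order_by`, `dedup_by`, `pipeline`) and assumed row data; it is not the nGQL executor. Here `dedup_by` is given every column, so it behaves like a whole-row distinct.

```python
from functools import reduce

def order_by(*cols):
    """Build a stage that sorts rows by the given columns."""
    return lambda rows: sorted(rows, key=lambda r: tuple(r[c] for c in cols))

def dedup_by(*cols):
    """Build a stage that keeps the first row for each distinct key."""
    def stage(rows):
        seen = set()
        out = []
        for r in rows:
            key = tuple(r[c] for c in cols)
            if key not in seen:
                seen.add(key)
                out.append(r)
        return out
    return stage

def pipeline(source, *stages):
    """Feed the output of each stage into the next, like `a | b | c`."""
    return reduce(lambda rows, stage: stage(rows), stages, source)

# Hypothetical output of some data fetching statement
rows = [
    {"col1": 2, "col2": "b"},
    {"col1": 1, "col2": "a"},
    {"col1": 2, "col2": "b"},
]
result = pipeline(rows, dedup_by("col1", "col2"), order_by("col1"))
print(result)
```

Because every stage has the same rows-in, rows-out shape, the same stages could follow any data fetching statement, which is the motivation given above for making them standalone.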

Since uniqueness is a very common requirement in data fetching, we could also add it to the data fetching statements themselves, such as the GO... statement. Here is the syntax:

GO ... YIELDS <col1>, DISTINCT <col2>, ...

Note that only ONE column can be decorated with DISTINCT in a data fetching statement. If you want to deduplicate on a combination of multiple columns, you need to use the DEDUP BY statement in the pipeline.

@sherman-the-tank
Member

Since the simple DISTINCT decoration on one column could probably cover more than 80% of the requirements, let's implement this first

@CPWstatic
Contributor Author

GO ... YIELDS <col1>, DISTINCT <col2>, ...

As far as I know, in SQL DISTINCT only takes effect when it is placed in front of all returned fields; anywhere else it is a kind of syntax error. That means DISTINCT works on the whole row, not on a single column. If you make DISTINCT a decoration of a single column, you have to define which values the other columns should return.
So I suggest the syntax should always look like:

GO ... YIELDS DISTINCT <col1>, <col2>, ...
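
The ambiguity of a single-column DISTINCT can be shown with a small Python sketch. The rows and the helper names `distinct_on` and `distinct_rows` are hypothetical; the point is only that deduplicating on one column forces an arbitrary choice for the others, while whole-row distinct does not.

```python
# Hypothetical rows of (col1, col2); col2 has a repeated value "t1"
rows = [("v1", "t1"), ("v2", "t1"), ("v3", "t2")]

def distinct_on(rows, col, keep):
    """Keep one row per distinct value of `col`; `keep` is 'first' or 'last'."""
    chosen = {}
    for r in rows:
        if keep == "last" or r[col] not in chosen:
            chosen[r[col]] = r
    return list(chosen.values())

def distinct_rows(rows):
    """Whole-row distinct: drop only rows that are identical in every column."""
    return list(dict.fromkeys(rows))

# Two equally plausible answers for "col1, DISTINCT col2"
print(distinct_on(rows, 1, keep="first"))
print(distinct_on(rows, 1, keep="last"))
# Whole-row distinct has exactly one answer
print(distinct_rows(rows))
```

The two `distinct_on` calls disagree on which col1 value accompanies "t1", so a per-column DISTINCT would need an extra rule to pick one.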

As for DEDUP, if we want to make it a standalone statement, it should work like:

GO ... | DEDUP | GO ...

@dutor
Contributor

dutor commented Jul 1, 2019

Since the simple DISTINCT decoration on one column could probably cover more than 80% of the requirements, let's implement this first

On second thought, I somewhat agree with @CPWstatic. It's hard to define the behaviour of DISTINCT when it's applied to only part of a row. At the same time, it hardly makes sense to fetch columns that DISTINCT does not use, since their values will be discarded for some of the rows.

If one wants to fetch results and deduplicate by a subset of the columns, I think they should resort to variables, like:

$full_results = GO ... YIELD col1, col2, col3;   # fetch results into a variable
$deduped_by_col1 = DEDUP $full_results BY col1;  # dedup by col1, with col2, col3 discarded
GO FROM $deduped_by_col1.col1 ...;               # use the deduped results
GO FROM $full_results.col3 ...;                  # use the undeduped results
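
The variable-based approach above can be mimicked in Python as a rough sketch. The data and the helper `dedup_by` are assumptions for illustration; in particular, discarding the non-key columns follows the comment in the snippet above, not any confirmed nGQL behaviour.

```python
# Hypothetical stand-in for: $full_results = GO ... YIELD col1, col2, col3
full_results = [
    {"col1": 10, "col2": "a", "col3": 100},
    {"col1": 10, "col2": "b", "col3": 200},
    {"col1": 20, "col2": "c", "col3": 300},
]

def dedup_by(rows, *cols):
    """Project rows onto `cols` and drop duplicate keys; other columns are discarded."""
    keys = (tuple(r[c] for c in cols) for r in rows)
    return [dict(zip(cols, k)) for k in dict.fromkeys(keys)]

# Stand-in for: $deduped_by_col1 = DEDUP $full_results BY col1
deduped_by_col1 = dedup_by(full_results, "col1")

# Start points for the two follow-up GO statements
from_deduped = [r["col1"] for r in deduped_by_col1]
from_full = [r["col3"] for r in full_results]
print(from_deduped, from_full)
```

Since the deduplicated variable carries only the key column, there is no question of which col2 or col3 value to keep, and the full result remains available under its own name.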

@CPWstatic CPWstatic changed the title [DISCUSS] How do we define the difference between unique and group by? [DISCUSS] How do we define operation of unique in nGQL? Jul 1, 2019
@CPWstatic CPWstatic self-assigned this Jul 4, 2019
@sherman-the-tank
Member

@CPWstatic I buy into your idea. Let's make DISTINCT apply to the entire row:

GO ... YIELDS DISTINCT col1, col2, col3

Here we return all distinct combinations of [col1, col2, col3]

@sherman-the-tank
Member

The following should return a syntax error:

GO ... YIELDS col1, DISTINCT col2
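
As a rough illustration of the rule agreed on here, the following Python sketch checks the column list that follows YIELDS. The function `parse_yield_columns` is hypothetical and handles only plain comma-separated column names; it is not the nGQL parser.

```python
def parse_yield_columns(clause):
    """Parse the text after YIELDS into (is_distinct, columns).

    DISTINCT is accepted only as the very first word; anywhere else
    it raises SyntaxError, mirroring the rule discussed in this thread.
    """
    words = clause.split()
    is_distinct = bool(words) and words[0].upper() == "DISTINCT"
    if is_distinct:
        words = words[1:]
    columns = [c.strip() for c in " ".join(words).split(",")]
    for c in columns:
        if "DISTINCT" in c.upper().split():
            raise SyntaxError("DISTINCT must precede all yielded columns: " + c)
    return is_distinct, columns

print(parse_yield_columns("DISTINCT col1, col2, col3"))
try:
    parse_yield_columns("col1, DISTINCT col2")
except SyntaxError as err:
    print("rejected:", err)
```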

yixinglu pushed a commit to yixinglu/nebula that referenced this issue Mar 21, 2022