Add delete syntax #13674
@@ -0,0 +1,69 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
// Adapted from the syntax of the SqlInsert statement in the Calcite parser, edited for DELETE syntax.
SqlNode DruidSqlDeleteEof() :
{
    SqlNode table;
    SqlNode source = null;
    final Span s;
    // Use the fully qualified name for Pair, since Calcite's Parser.jj also uses a class with the same name.
    org.apache.druid.java.util.common.Pair<Granularity, String> partitionedBy = new org.apache.druid.java.util.common.Pair(null, null);
    SqlNodeList clusteredBy = null;
}
{
    <DELETE> { s = span(); }
    <FROM>
    table = CompoundIdentifier()
    [
        <WHERE> source = Expression(ExprContext.ACCEPT_SUB_QUERY)
    ]
    [
        <PARTITIONED> <BY>
        partitionedBy = PartitionGranularity()
    ]
    [
        <CLUSTERED> <BY>
        clusteredBy = ClusterItems()

[Review comment] Do we need clustering for delete? What does it mean to cluster deletions?

[Author reply] Since, in the backend, DELETE triggers a reingest of the datasource, the clause above specifies the clustering for that reingest. I think this would be improved with the catalog.

[Review comment] If our code that does delete needs partitioning, then that code is, I would assert, broken and needs to be fixed. Let's not add SQL syntax that exposes partitioning which isn't actually needed. Let's imagine some cases. I create segments with 1-hour partitioning. Then I create more segments, in the same time period, with 1-day partitioning. Now I realize I messed up and want to delete this entire group of segments. Quick: which partitioning do I use? Hour? Day? Or do I need two delete tasks, one for the hour segments and one for the day segments? Is this the user experience we want? Often, just writing down the proposed syntax and thinking through the user experience can save us lots of back-and-forth at the level of individual lines of code. So I propose we either fix the need for partitioning in the reingest, or give it a fake partitioning. Remember: there is no new data to partition.

    ]
    {
        if (clusteredBy != null && partitionedBy.lhs == null) {
            throw new ParseException("CLUSTERED BY found before PARTITIONED BY. In Druid, the CLUSTERED BY clause must follow the PARTITIONED BY clause");
        }
    }
    // EOF is also present in SqlStmtEof, but EOF is a special case: a single EOF can be consumed multiple times.
    // The reason for adding EOF here is to ensure that we create the DruidSqlDelete node only after the syntax has
    // been validated, and throw SQL syntax errors before performing validations in DruidSqlDelete that could
    // overshadow the actual error message.
    <EOF>
    {
        if (source == null) {
            throw new ParseException("WHERE clause must be present in DELETE statements. (To delete all rows, WHERE TRUE must be explicitly specified.)");
        }
        return DruidSqlDelete.create(
            s.end(source),
            table,
            source,
            partitionedBy.lhs,
            partitionedBy.rhs,
            clusteredBy
        );
    }
}
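For reference, here are statements this production accepts and rejects. These are illustrative examples based on the grammar above, not part of the diff; PartitionGranularity() and ClusterItems() take the same forms as in Druid's INSERT/REPLACE syntax:

    -- Accepted: delete matching rows; the reingest is repartitioned by day.
    DELETE FROM foo WHERE dim1 = 'New' PARTITIONED BY DAY

    -- Accepted: deleting every row requires an explicit WHERE TRUE.
    DELETE FROM foo WHERE TRUE PARTITIONED BY ALL TIME

    -- Rejected at <EOF>: the WHERE clause is mandatory.
    DELETE FROM foo PARTITIONED BY ALL TIME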
@@ -0,0 +1,111 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.druid.sql.calcite.parser;
import org.apache.calcite.sql.SqlBasicCall;
import org.apache.calcite.sql.SqlIdentifier;
import org.apache.calcite.sql.SqlKind;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.SqlNodeList;
import org.apache.calcite.sql.SqlOperator;
import org.apache.calcite.sql.SqlSelect;
import org.apache.calcite.sql.SqlSpecialOperator;
import org.apache.calcite.sql.SqlWriter;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.parser.SqlParserPos;
import org.apache.druid.java.util.common.granularity.Granularity;

import javax.annotation.Nonnull;
import javax.annotation.Nullable;
import java.util.Collections;
public class DruidSqlDelete extends DruidSqlIngest
{
  public static final SqlOperator OPERATOR = new SqlSpecialOperator("DELETE", SqlKind.OTHER);

  public static DruidSqlDelete create(
      SqlParserPos pos,
      SqlNode targetTable,
      SqlNode source,
      @Nullable Granularity partitionedBy,
      @Nullable String partitionedByStringForUnparse,
      @Nullable SqlNodeList clusteredBy
  )
  {
    // A DELETE keeps the complement of the matched rows: wrap the WHERE condition in NOT
    // and select everything from the target table that survives the filter.
    SqlBasicCall invertedSource = new SqlBasicCall(SqlStdOperatorTable.NOT, new SqlNode[]{source}, pos);
    SqlSelect sqlSelect = new SqlSelect(
        pos,
        SqlNodeList.EMPTY,
        new SqlNodeList(Collections.singleton(SqlIdentifier.star(pos)), pos),
        targetTable,
        invertedSource,
        null,
        null,
        null,
        null,
        null,
        null
    );
    return new DruidSqlDelete(pos, targetTable, sqlSelect, partitionedBy, partitionedByStringForUnparse, clusteredBy);
  }
  public DruidSqlDelete(
      SqlParserPos pos,
      SqlNode targetTable,
      SqlNode source,
      @Nullable Granularity partitionedBy,
      @Nullable String partitionedByStringForUnparse,
      @Nullable SqlNodeList clusteredBy
  )
  {
    super(
        pos,
        SqlNodeList.EMPTY,
        targetTable,
        source,
        null,
        partitionedBy,
        partitionedByStringForUnparse,
        clusteredBy
    );
  }

  @Nonnull
  @Override
  public SqlOperator getOperator()
  {
    return OPERATOR;
  }
  @Override
  public void unparse(SqlWriter writer, int leftPrec, int rightPrec)
  {
    super.unparse(writer, leftPrec, rightPrec);
    // Append the Druid-specific clauses after the base statement.
    writer.keyword("PARTITIONED BY");
    writer.keyword(partitionedByStringForUnparse);
    if (getClusteredBy() != null) {
      writer.keyword("CLUSTERED BY");
      SqlWriter.Frame frame = writer.startList("", "");
      for (SqlNode clusterByOpts : getClusteredBy().getList()) {
        clusterByOpts.unparse(writer, leftPrec, rightPrec);
      }
      writer.endList(frame);
    }
  }
}
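To make the rewrite in create() concrete: the filter is wrapped in NOT, so a DELETE keeps the complement of the matched rows and writes the survivors back over the target table. Conceptually (a sketch of the equivalence, not literal planner output):

    -- The statement
    DELETE FROM foo WHERE dim1 = 'New' PARTITIONED BY ALL TIME

    -- is planned as an ingest whose source is
    SELECT * FROM foo WHERE NOT (dim1 = 'New')

    -- with the result replacing the existing data, which is why the tests
    -- below expect REPLACE-style all-time-chunk semantics.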
@@ -0,0 +1,83 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.druid.sql.calcite;
import org.apache.druid.query.InlineDataSource;
import org.apache.druid.query.filter.NotDimFilter;
import org.apache.druid.query.filter.SelectorDimFilter;
import org.apache.druid.sql.SqlPlanningException;
import org.apache.druid.sql.calcite.filtration.Filtration;
import org.junit.Test;

import java.util.Collections;
public class CalciteDeleteDmlTest extends CalciteIngestionDmlTest
{
  @Test
  public void testDeleteFromTableWithCondition()
  {
    testIngestionQuery()
        .sql("DELETE FROM foo WHERE dim1 = 'New' PARTITIONED BY ALL TIME")
        .expectTarget("foo", FOO_TABLE_SIGNATURE)
        .expectResources(dataSourceRead("foo"), dataSourceWrite("foo"))
        .expectQuery(
            newScanQueryBuilder()
                .dataSource("foo")
                .intervals(querySegmentSpec(Filtration.eternity()))
                .filters(new NotDimFilter(new SelectorDimFilter("dim1", "New", null)))
                .columns("__time", "cnt", "dim1", "dim2", "dim3", "m1", "m2", "unique_dim1")
                .context(REPLACE_ALL_TIME_CHUNKS)
                .build()
        )
        .verify();
  }
  @Test
  public void testDeleteFromTableWithoutWhereClause()
  {
    testIngestionQuery()
        .sql("DELETE FROM foo PARTITIONED BY ALL TIME")
        .expectValidationError(
            SqlPlanningException.class,
            "WHERE clause must be present in DELETE statements. (To delete all rows, WHERE TRUE must be explicitly specified.)"
        )
        .verify();
  }
  @Test
  public void testDeleteAllFromTable()
  {
    testIngestionQuery()
        .sql("DELETE FROM foo WHERE TRUE PARTITIONED BY ALL TIME")
        .expectTarget("foo", FOO_TABLE_SIGNATURE)
        .expectResources(dataSourceRead("foo"), dataSourceWrite("foo"))
        .expectQuery(
            newScanQueryBuilder()
                .dataSource(InlineDataSource.fromIterable(Collections.emptyList(), FOO_TABLE_SIGNATURE))
                .intervals(querySegmentSpec(Filtration.eternity()))
                .filters(null)
                .columns("__time", "cnt", "dim1", "dim2", "dim3", "m1", "m2", "unique_dim1")
                .context(REPLACE_ALL_TIME_CHUNKS)
                .build()
        )
        .verify();
  }
}
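One parse-time check not exercised by the tests above is the grammar's guard that rejects CLUSTERED BY without PARTITIONED BY; a statement like this hypothetical example would hit that ParseException:

    DELETE FROM foo WHERE dim1 = 'New' CLUSTERED BY dim1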
[Review comment] Note that this syntax is changing a bit in this PR: #13686

[Author reply] Thanks for the heads up. I'm looking forward to the catalog changes; I think they would make the delete syntax (and all the others, actually) a lot more "SQL-like", since we could drop PARTITIONED BY as a required part of the query.