INSERT INTO does not allow inserting only specified columns #8091
Comments
I plan to support this. I think we can begin by supporting NULL as the default value and later allow specifying default values for columns.
I am currently considering several workarounds:
One simple approach would be to make the check here more sophisticated: https://github.com/apache/arrow-datafusion/blob/4512805c2087d1a5538afdaba9d2e2ca5347c90c/datafusion/core/src/datasource/listing/table.rs#L827-L835. Currently it fails immediately if the schemas have different lengths: https://github.com/apache/arrow-datafusion/blob/4512805c2087d1a5538afdaba9d2e2ca5347c90c/datafusion/common/src/dfschema.rs#L398-L403. That check could be updated to fail only if the input schema is missing a table field that is not nullable. That way, the insert plan will continue and write out files that are missing those nullable columns. A longer-term solution would provide a more explicit way to fill in missing nullable columns or columns with default values. This could be a dedicated execution plan that runs right before the insertion (filling in nulls and defaults), or the behavior could be added directly to the existing FileSinkExec plan.
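A minimal sketch of the relaxed check described above, assuming a simplified schema model: `Field`, `check_insert_schema`, and `InsertSchemaError` are hypothetical names, not DataFusion's `DFSchema` API, and a real check would also compare data types. Every input column must exist in the table, and every table column the input omits must be nullable.

```rust
/// Hypothetical, simplified column metadata (not DataFusion's `Field` type).
#[derive(Debug, Clone)]
struct Field {
    name: String,
    nullable: bool,
}

fn field(name: &str, nullable: bool) -> Field {
    Field { name: name.to_string(), nullable }
}

#[derive(Debug, PartialEq)]
enum InsertSchemaError {
    /// The input provides a column that the table does not have.
    UnknownColumn(String),
    /// The input omits a column that the table declares NOT NULL.
    MissingNonNullable(String),
}

/// Accepts an input schema that is a subset of the table schema,
/// as long as every omitted table column is nullable.
fn check_insert_schema(table: &[Field], input: &[Field]) -> Result<(), InsertSchemaError> {
    // Every input column must exist in the table.
    for f in input {
        if !table.iter().any(|t| t.name == f.name) {
            return Err(InsertSchemaError::UnknownColumn(f.name.clone()));
        }
    }
    // Every table column missing from the input must be nullable.
    for t in table {
        let provided = input.iter().any(|f| f.name == t.name);
        if !provided && !t.nullable {
            return Err(InsertSchemaError::MissingNonNullable(t.name.clone()));
        }
    }
    Ok(())
}

fn main() {
    // Table t(a NOT NULL, b NULL)
    let table = vec![field("a", false), field("b", true)];
    // INSERT INTO t (a) ...: `b` is omitted but nullable, so this passes.
    println!("{:?}", check_insert_schema(&table, &[field("a", false)]));
    // INSERT INTO t (b) ...: `a` is NOT NULL, so this is rejected.
    println!("{:?}", check_insert_schema(&table, &[field("b", true)]));
}
```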
There is one more check performed in FileSinkExec, so fixing the check in the TableProvider alone is not enough.
I think the first version of the implementation could adopt an approach similar to the first workaround. We can fill missing values with NULLs inside …. For scenarios with a large number of columns, we may need to do some profiling later.
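The NULL-filling step could look roughly like the following. This is a hypothetical, row-oriented illustration using only the standard library: `fill_missing_with_nulls` is not a DataFusion function, values are simplified to `Option<i64>` with `None` standing for SQL NULL, and a real implementation would more likely work on Arrow arrays (for example, by adding all-null columns in a projection) rather than row by row.

```rust
/// Hypothetical sketch: expand a row given for a subset of columns into a
/// full table row, using `None` (SQL NULL) for every omitted column.
fn fill_missing_with_nulls(
    table_columns: &[&str],
    insert_columns: &[&str],
    values: &[Option<i64>],
) -> Result<Vec<Option<i64>>, String> {
    if insert_columns.len() != values.len() {
        return Err(format!(
            "expected {} values, got {}",
            insert_columns.len(),
            values.len()
        ));
    }
    // Start with an all-NULL row in table column order.
    let mut row = vec![None; table_columns.len()];
    // Place each provided value at its column's position in the table.
    for (col, value) in insert_columns.iter().zip(values) {
        let idx = table_columns
            .iter()
            .position(|c| c == col)
            .ok_or_else(|| format!("unknown column {col}"))?;
        row[idx] = *value;
    }
    Ok(row)
}

fn main() {
    // Table t(a, b); INSERT INTO t (a) VALUES (10)
    let row = fill_missing_with_nulls(&["a", "b"], &["a"], &[Some(10)]).unwrap();
    println!("{:?}", row); // [Some(10), None]
}
```

Starting from an all-NULL row keeps the logic simple; supporting column defaults later would mean initializing each slot from the column's default instead of `None`.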
Describe the bug
Most SQL databases (PostgreSQL, MSSQL, ClickHouse, ...) allow INSERTs without providing values for all columns.
This is allowed by the
INSERT INTO table (columns) VALUES (values)
syntax, in which nullable columns can be omitted. This is important for upgrading table schemas with zero downtime: in the first step we add the new column, and in the second we deploy new application code that works with the new schema.
Currently it is impossible to add a column to the schema without downtime, because the logical planner fails with:
Error during planning: Inserting query must have the same schema with the table.
To Reproduce
Expected:
(10, NULL) is inserted
Actual:
Expected behavior
Row (10, NULL) is inserted.
Additional context
Minimal repro without CLI: