Support BigQuery custom schema's for external data using CSV / NDJSON #3717
external_data_configuration.schema for CSV and NDJSON formats
Hello! I am a robot who works on Magic Modules PRs. I have detected that you are a community contributor, so your PR will be assigned to someone with a commit-bit on this repo for initial review. Thanks for your contribution! A human will be with you soon. @danawillow, please review this PR or find an appropriate assignee.
// So just assume the configured schema has been applied after successful
// creation, by copying the configured value back into the resource schema.
// This avoids that reading back this field will be identified as a change.
// The `ForceNew=true` on `external_data_configuration.schema` will ensure
I don't totally follow why not reading back the external schema requires us to do ForceNew. Couldn't we still allow updating that field even while not detecting drift on it?
The ForceNew is not required for reading back the data; I just consider it better UX with respect to user expectations. For example, if you change the schema, the most probable course, I think, is to recreate the table with that schema. As far as I know, you cannot change the schema of an existing table in place.
The `external_data_configuration.schema` is only used as an input parameter for creating the table; when we read back the table, this field is always empty. There is a computed `schema` returned at the top level, which reflects the effective schema of the created table. However, this value is calculated by combining the schema provided here with any other field/type mappings BigQuery can infer by autodetection and/or from the `source_uri_prefix`. I wanted to avoid having to determine whether `external_data_configuration.schema` is accurately reflected in the computed `schema`, and reimplementing BQ's logic in doing so. So I just assume that after creation it is successfully reflected, and I ignore this field by making sure there are no changes relative to what is configured.
Perhaps there's a smarter way to do this?
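To make the approach described above concrete, here is a minimal Go sketch of the idea, not the provider's actual code. It uses plain maps instead of the Terraform SDK types, and `preserveConfiguredSchema` is a hypothetical helper name chosen for illustration. It assumes the read-back external data configuration never contains `schema`, so the configured value is copied over to keep state and configuration equal.

```go
package main

import "fmt"

// preserveConfiguredSchema is a hypothetical helper, not the provider's real
// function. The API is assumed to return no external_data_configuration.schema
// on read, so the configured value is copied into the read-back result to
// avoid a spurious diff on the next plan.
func preserveConfiguredSchema(configured, readBack map[string]interface{}) map[string]interface{} {
	// Copy the read-back values so the caller's map is not mutated.
	merged := make(map[string]interface{}, len(readBack)+1)
	for k, v := range readBack {
		merged[k] = v
	}
	// Only carry the schema over when the user actually configured one.
	if s, ok := configured["schema"].(string); ok && s != "" {
		merged["schema"] = s
	}
	return merged
}

func main() {
	configured := map[string]interface{}{
		"source_format": "CSV",
		"schema":        `[{"name":"id","type":"INTEGER"}]`,
	}
	// Simulated API read: the schema field comes back empty.
	readBack := map[string]interface{}{
		"source_format": "CSV",
	}
	merged := preserveConfiguredSchema(configured, readBack)
	fmt.Println(merged["schema"] == configured["schema"])
}
```

The trade-off is the one raised in this thread: because the value is never compared against what BigQuery actually applied, drift on this field goes undetected.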
That makes sense. If it's not clear that it can be updated and users wouldn't necessarily expect it to be, we can leave it as ForceNew. Worst thing that happens is someone files an issue to ask for update support.
* `schema` - (Optional) A JSON schema for the external table. Schema is required
for CSV and JSON formats if autodetect is not on. Schema is disallowed
for Google Cloud Bigtable, Cloud Datastore backups, Avro, ORC and Parquet formats.
A JSON schema for the table. Schema is required
Looks like some information got repeated here.
Can you also add a note that we don't detect drift on this field?
Yes, good suggestion. I expanded a bit on the description.
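For readers unfamiliar with the value the `schema` field takes: it is a JSON string holding an array of field definitions. The Go sketch below shows one way such a string could be sanity-checked before use. The `field` struct and `parseSchema` helper are hypothetical names for illustration and are not part of the provider; the sketch assumes only the common `name`, `type`, and `mode` keys and ignores nested fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// field mirrors the commonly used keys of a BigQuery schema field.
// Nested fields and descriptions are left out of this sketch.
type field struct {
	Name string `json:"name"`
	Type string `json:"type"`
	Mode string `json:"mode,omitempty"`
}

// parseSchema is a hypothetical helper: it decodes a schema JSON string and
// checks that every field has at least a name and a type.
func parseSchema(raw string) ([]field, error) {
	var fields []field
	if err := json.Unmarshal([]byte(raw), &fields); err != nil {
		return nil, fmt.Errorf("schema is not a JSON array of fields: %w", err)
	}
	for i, f := range fields {
		if f.Name == "" || f.Type == "" {
			return nil, fmt.Errorf("field %d is missing name or type", i)
		}
	}
	return fields, nil
}

func main() {
	raw := `[{"name":"id","type":"INTEGER","mode":"REQUIRED"},{"name":"note","type":"STRING"}]`
	fields, err := parseSchema(raw)
	if err != nil {
		fmt.Println("invalid schema:", err)
		return
	}
	for _, f := range fields {
		fmt.Println(f.Name, f.Type)
	}
}
```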
Looks good, thanks for the contribution @ffung!
external_data_configuration.schema for CSV and NDJSON formats
Fixes hashicorp/terraform-provider-google#6693
Release Note Template for Downstream PRs (will be copied)