
Support BigQuery custom schemas for external data using CSV / NDJSON #3717

Conversation

ffung
Contributor

@ffung ffung commented Jun 29, 2020

Adds support for external_data_configuration.schema for the CSV and NDJSON formats.

Fixes hashicorp/terraform-provider-google#6693

Release Note Template for Downstream PRs (will be copied)

bigquery: added support for BigQuery custom schemas for external data using CSV / NDJSON 

external_data_configuration.schema for CSV and NDJSON formats
@ffung ffung changed the title support custom Support BigQuery custom schemas for external data using CSV / NDJSON Jun 29, 2020
@modular-magician
Collaborator

Hello! I am a robot who works on Magic Modules PRs.

I have detected that you are a community contributor, so your PR will be assigned to someone with a commit-bit on this repo for initial review.

Thanks for your contribution! A human will be with you soon.

@danawillow, please review this PR or find an appropriate assignee.

@modular-magician
Collaborator

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 3 files changed, 132 insertions(+), 10 deletions(-))
Terraform Beta: Diff ( 3 files changed, 132 insertions(+), 10 deletions(-))

third_party/terraform/resources/resource_bigquery_table.go
// So just assume the configured schema has been applied after successful
// creation, by copying the configured value back into the resource schema.
// This avoids that reading back this field will be identified as a change.
// The `ForceNew=true` on `external_data_configuration.schema` will ensure
Contributor

I don't totally follow why not reading back the external schema requires us to do ForceNew. Couldn't we still allow updating that field even while not detecting drift on it?

Contributor Author

The ForceNew isn't strictly required for reading back the data; I just consider it better UX with respect to user expectations: if you change the schema, the most likely intent is to recreate the table with that schema. As far as I know, you cannot change the schema of an existing table in place.

The external_data_configuration.schema is only used as an input parameter when creating the table; when we read the table back, this field is always empty. There is a computed schema returned at the top level, which reflects the effective schema of the created table, but that value is calculated by combining the schema provided here with any other field/type mappings BigQuery can infer through autodetection and/or from the source_uri_prefix. I wanted to avoid having to determine whether external_data_configuration.schema is accurately reflected in the computed schema, which would mean reimplementing BigQuery's logic, so I simply assume it is reflected after a successful creation and ignore this field by making sure there is no diff against what is configured.

Perhaps there's a smarter way to do this?
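The copy-back approach described above can be sketched as follows. This is an illustrative toy, not the provider's actual code: `readExternalSchema` and its string arguments are hypothetical stand-ins for read logic that, in the real provider, operates on `*schema.ResourceData`.

```go
package main

import "fmt"

// readExternalSchema models the read step for external_data_configuration.schema.
// "configured" is what the user wrote in their config; "apiValue" is what the
// BigQuery API returned on read (always empty for this field). Because the API
// never echoes the external schema back, the configured value is copied into
// state, which suppresses a spurious diff on every subsequent plan.
func readExternalSchema(configured, apiValue string) string {
	if apiValue == "" {
		return configured
	}
	return apiValue
}

func main() {
	// The API returns "", so the configured schema is written back to state.
	fmt.Println(readExternalSchema(`[{"name":"id","type":"INTEGER"}]`, ""))
}
```

The key point is that the decision happens on read: the configured value is treated as authoritative, so plans stay clean, at the cost of never detecting out-of-band changes to the external schema.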

Contributor

That makes sense. If it's not clear that it can be updated and users wouldn't necessarily expect it to be, we can leave it as ForceNew. Worst thing that happens is someone files an issue to ask for update support.
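For illustration, here is a toy model of what marking the field ForceNew buys (hypothetical names, not Terraform's actual planner): any change to the field turns the plan into a destroy-and-recreate instead of an in-place update.

```go
package main

import "fmt"

// planAction models Terraform's planning rule for a single field: an
// unchanged field is a no-op; a changed ForceNew field forces the resource
// to be replaced; a changed updatable field is updated in place.
func planAction(oldVal, newVal string, forceNew bool) string {
	if oldVal == newVal {
		return "no-op"
	}
	if forceNew {
		return "replace" // destroy the table, then create it with the new schema
	}
	return "update"
}

func main() {
	// Changing a ForceNew schema field recreates the table.
	fmt.Println(planAction(`[{"name":"id"}]`, `[{"name":"id"},{"name":"ts"}]`, true)) // → replace
}
```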

@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 3 files changed, 138 insertions(+), 18 deletions(-))
Terraform Beta: Diff ( 3 files changed, 138 insertions(+), 18 deletions(-))

@ffung ffung requested a review from danawillow July 1, 2020 06:36
* `schema` - (Optional) A JSON schema for the external table. Schema is required
for CSV and JSON formats if autodetect is not on. Schema is disallowed
for Google Cloud Bigtable, Cloud Datastore backups, Avro, ORC and Parquet formats.
A JSON schema for the table. Schema is required
Contributor

Looks like some information got repeated here.

Can you also add a note that we don't detect drift on this field?

Contributor Author

Yes, good suggestion. I expanded the description a bit.

@ffung ffung requested a review from danawillow July 2, 2020 07:23
@modular-magician
Collaborator

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 3 files changed, 136 insertions(+), 18 deletions(-))
Terraform Beta: Diff ( 3 files changed, 136 insertions(+), 18 deletions(-))

Contributor

@danawillow danawillow left a comment

Looks good, thanks for the contribution @ffung!

Successfully merging this pull request may close these issues.

BigQuery Table Hive Partitioning not Working with Explicit Schema (autodetect=false)
4 participants