Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Create a config option to parse timestamps in special formats. #40

Open
jcbmllgn opened this issue May 4, 2023 · 1 comment

Comments

@jcbmllgn
Copy link
Contributor

jcbmllgn commented May 4, 2023

I have a timestamp coming from a source that causes BigQuery's default timestamp parsing to fail with how this package by default parses timestamps, this line of code generates this sql:

select 
    CAST('2023-04-26 14:57:00 -0700' as TIMESTAMP) as timestamp_without_space

Which fails for this reason:

image

The issue is that there's a space between the seconds and timezone offset, here's what comes from my source system2023-04-26 14:57:00 -0700, however, here's what BigQuery's default CAST(.. as TIMESTAMP) function wants 2023-04-26 14:57:00-0700.


I would like to be able to add a config option to my meltano.yml file for specifying how timestamps should be parsed, something like this:

  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_json: ${BIGQUERY_CREDENTIALS_JSON}
      project: some-project
      dataset: some-dataset
      timestamp_format: ''%Y-%m-%d %H:%M:%S %z'

and then the package would parse timestamps like this instead:

select
    PARSE_TIMESTAMP('%Y-%m-%d %H:%M:%S %z', some_special_timestamp_column) AS parsed_timestamp
@z3z1ma
Copy link
Owner

z3z1ma commented May 8, 2023

@jcbmllgn I think this is a neat solution with a low scope. The best practice in many taps is to format the timestamps as ISO but because of the sheer variation, it would be nice to have this capability here. I think something we should think about too is using safe_cast

https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_functions#safe_casting

Using this would let us essentially use coalesce to encapsulate fallback logic. Then we can just statically account for a few more date formats outside of which we say use a mapper / fix the dates from the tap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants