Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug(bigquery): different name of column in sql code and table.columns #9112

Closed
1 task done
jitingxu1 opened this issue May 3, 2024 · 7 comments
Closed
1 task done
Labels
bigquery The BigQuery backend bug Incorrect behavior inside of ibis

Comments

@jitingxu1
Copy link
Contributor

jitingxu1 commented May 3, 2024

What happened?

Here is the code could be used to reproduce the error:

import ibis
ibis.options.interactive = True

con = ibis.bigquery.connect(project_id="xxx", dataset_id="xxx")

t = ibis.memtable({
    "x": ["a", "b"],
    "y": ["0", "1"]
})
con.create_table(
    "t", t.to_pyarrow(), overwrite=True
)

t = con.table("t")
t = t.mutate(
    x1=ibis.literal("c"),
    y=_.y.cast(dt.int32),
)

x = t.drop("y")
y = t.y.cast(dt.int64)

when I run

y.as_table().columns

I got ['Cast(y, int64)']

But the name in sql code is different in ibis.to_sql(y)- name in sql code is Cast_y_ int64

SELECT
  CAST(CAST(`t0`.`y` AS INT64) AS INT64) AS `Cast_y_ int64`
FROM `voltrondata-demo`.`nycflights13`.`t` AS `t0`

This will produce errors in ibisml, becuase ibisml use the table's column name to extract target - https://github.com/ibis-project/ibis-ml/blob/main/ibisml/core.py#L135

import ibisml as ml

step = ml.TargetEncode(["x"])
table, targets, index = ml.core.normalize_table(x, y)
print(targets)
metadata = ml.core.Metadata(targets=targets)
step.fit_table(table,metadata)
step.transform_table(x)

Errror

Invalid field name "Cast(y, int64)_f82688".

Updated the error message*******

400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/voltrondata-demo/jobs?uploadType=multipart: Invalid field name "Cast(y, int64)_ef9a46". Fields must contain the allowed characters, and be at most 300 characters long. For allowed characters, please refer to https://cloud.google.com/bigquery/docs/schemas#column_names

Not very sure, we should solve this in ibis or ibisml.

What version of ibis are you using?

9

What backend(s) are you using, if any?

bigquery

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@jitingxu1 jitingxu1 added the bug Incorrect behavior inside of ibis label May 3, 2024
@jitingxu1 jitingxu1 changed the title bug(bigquery): bug(bigquery): space in column name issue May 3, 2024
@gforsyth gforsyth added the bigquery The BigQuery backend label May 3, 2024
@jitingxu1 jitingxu1 changed the title bug(bigquery): space in column name issue bug(bigquery): different name of column in sql code and table.columns May 4, 2024
@jitingxu1
Copy link
Contributor Author

Rewrite the problem description with an easy example

@jitingxu1
Copy link
Contributor Author

Updated the error message*******

400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/voltrondata-demo/jobs?uploadType=multipart: Invalid field name "Cast(y, int64)_ef9a46". Fields must contain the allowed characters, and be at most 300 characters long. For allowed characters, please refer to https://cloud.google.com/bigquery/docs/schemas#column_names

seems like Cast(y, int64)_ef9a46 is not a valid name for bigquery, because have letters after number.

@gforsyth
Copy link
Member

gforsyth commented May 8, 2024

are you running against main? That looks like the bigquery issues I fixed yesterday in #9141 and #9149

@jitingxu1
Copy link
Contributor Author

oh, no. Let me pull it from main

are you running against main? That looks like the bigquery issues I fixed yesterday in #9141 and #9149

@cpcloud
Copy link
Member

cpcloud commented May 8, 2024

We probably need to see the ibis expression. Column names that look like that should effectively never make it into bigquery.

Can you give a reproducible example that only uses Ibis? We definitely need to see what Ibis APIs you're using to produce this error.

@jitingxu1
Copy link
Contributor Author

we convert an intermediate table into a memtable in bis-ml, see link

The intermediate information in the expression tree got lost, unfortunately, the new name "Cast(y, float64)" is not a valid bigquery column name.

I think it is not an ibis issue, we could fix this in ibisml.

Here is the simplified code to reproduce the error:

import ibis
ibis.options.interactive = True

con = ibis.bigquery.connect(project_id="xx", dataset_id="xx")

t = ibis.memtable({
    "x": ["a", "b"],
    "y": ["0", "1"]
})
con.create_table(
    "t", t.to_pyarrow(), overwrite=True
)

t = con.table("t")

base_table = t.drop("y")
target_table = t.y.cast(dt.float64)


y_m = ibis.memtable(target_table.as_table().to_pyarrow())
y_m = y_m.mutate(x=ibis.literal("a"))

base_table.join(y_m, "x")

And the new name "Cast(y, float64)" is not a valid name in bigquery.

ibis.to_sql(y_m)
sql
SELECT
  "t0"."Cast(y, float64)",
  'a' AS "x"
FROM "ibis_pyarrow_memtable_cimh6vdzlvhyxlddji3g6jvugq" AS "t0"

cpcloud pushed a commit that referenced this issue May 10, 2024
Removes extra whitespace from the generated field name.

xref #9112 but doesn't strictly solve the issue
@gforsyth
Copy link
Member

I think that between the whitespace fixes in #9160 and the improved to_sql for memtables in #9228 that this may not be an issue anymore. I'm going to close it out, but ping if this shows up again!

@github-project-automation github-project-automation bot moved this from backlog to done in Ibis planning and roadmap May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery The BigQuery backend bug Incorrect behavior inside of ibis
Projects
Archived in project
Development

No branches or pull requests

3 participants