ColumnNotFoundError after join #9778
Comments
`tempfile.NamedTemporaryFile` can be used to inline your example:

```python
import polars as pl
from tempfile import NamedTemporaryFile

csv_a = NamedTemporaryFile()
csv_a.write(b"""
A,B
Gr1,A
Gr1,B
""".strip())
csv_a.seek(0)  # seek() also flushes the write buffer, so scan_csv sees the data

df_a = pl.scan_csv(csv_a.name).with_row_count("Idx")

df_a.join(df_a, on="B").collect()
# shape: (2, 5)
# ┌─────┬─────┬─────┬───────────┬─────────┐
# │ Idx ┆ A   ┆ B   ┆ Idx_right ┆ A_right │
# │ --- ┆ --- ┆ --- ┆ ---       ┆ ---     │
# │ u32 ┆ str ┆ str ┆ u32       ┆ str     │
# ╞═════╪═════╪═════╪═══════════╪═════════╡
# │ 0   ┆ Gr1 ┆ A   ┆ 0         ┆ Gr1     │
# │ 1   ┆ Gr1 ┆ B   ┆ 1         ┆ Gr1     │
# └─────┴─────┴─────┴───────────┴─────────┘

df_a.join(df_a, on="B").groupby("A").all().collect()
# ColumnNotFoundError: Idx
```

Oddly enough, selecting just `Idx` works:

```python
df_a.join(df_a, on="B").select("Idx").collect()
# shape: (2, 1)
# ┌─────┐
# │ Idx │
# │ --- │
# │ u32 │
# ╞═════╡
# │ 0   │
# │ 1   │
# └─────┘
```

but selecting `Idx` together with `A` fails:

```python
df_a.join(df_a, on="B").select("Idx", "A").collect()
# ColumnNotFoundError: Idx
```
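To make the expected result of these queries concrete, here is a minimal pure-Python sketch (standard library only, no Polars) of what the self-join on `B` and the two projections should return for the sample CSV. It assumes an inner join where the right-hand non-key columns get the `_right` suffix, as in the output above. The helpers `read_with_row_count` and `self_join_on` are hypothetical names used only to build an oracle to compare `collect()` results against; they are not Polars APIs.

```python
import csv
import io

CSV_TEXT = "A,B\nGr1,A\nGr1,B"


def read_with_row_count(text, name="Idx"):
    """Parse CSV text and prepend a 0-based row index column (mimics with_row_count)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return [{name: i, **row} for i, row in enumerate(rows)]


def self_join_on(rows, key, suffix="_right"):
    """Inner self-join on `key`; every non-key right-hand column gets `suffix`
    (in a self-join all of them clash with a left-hand column)."""
    out = []
    for left in rows:
        for right in rows:
            if left[key] == right[key]:
                joined = dict(left)
                for col, val in right.items():
                    if col != key:
                        joined[col + suffix] = val
                out.append(joined)
    return out


joined = self_join_on(read_with_row_count(CSV_TEXT), "B")

# The two projections from the thread
print([r["Idx"] for r in joined])            # [0, 1]
print([(r["Idx"], r["A"]) for r in joined])  # [(0, 'Gr1'), (1, 'Gr1')]
```

Since the single-column `select("Idx")` succeeds while `select("Idx", "A")` fails on the same join, the data itself is fine; the oracle only pins down what the failing projection should yield.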
The example that @cmdlineluser provided works just fine in Polars 0.18.4 and stopped working in 0.18.5:

```python
>>> import polars as pl
>>> from tempfile import NamedTemporaryFile
>>>
>>> csv_a = NamedTemporaryFile()
>>> csv_a.write(b"""
... A,B
... Gr1,A
... Gr1,B
... """.strip())
15
>>>
>>> csv_a.seek(0)
0
>>>
>>> df_a = pl.scan_csv(csv_a.name).with_row_count("Idx")
>>> df_a.join(df_a, on="B").groupby("A").all().collect()
shape: (1, 5)
┌─────┬───────────┬────────────┬───────────┬────────────────┐
│ A   ┆ Idx       ┆ B          ┆ Idx_right ┆ A_right        │
│ --- ┆ ---       ┆ ---        ┆ ---       ┆ ---            │
│ str ┆ list[u32] ┆ list[str]  ┆ list[u32] ┆ list[str]      │
╞═════╪═══════════╪════════════╪═══════════╪════════════════╡
│ Gr1 ┆ [0, 1]    ┆ ["A", "B"] ┆ [0, 1]    ┆ ["Gr1", "Gr1"] │
└─────┴───────────┴────────────┴───────────┴────────────────┘
>>> pl.show_versions()
--------Version info---------
Polars:              0.18.4
Index type:          UInt32
Platform:            macOS-13.4.1-arm64-arm-64bit
Python:              3.10.9 (main, Jan 11 2023, 09:18:18) [Clang 14.0.6 ]
----Optional dependencies----
numpy:       1.24.3
pandas:      1.5.3
pyarrow:     11.0.0
connectorx:  0.3.1
deltalake:   0.10.0
fsspec:      2023.4.0
matplotlib:  3.7.1
xlsx2csv:    0.8.1
xlsxwriter:  3.0.9
```
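The 0.18.4 result can be checked by hand: `groupby("A").all()` gathers every non-key column into a list per group. The following is a minimal pure-Python sketch of that aggregation, starting from the two joined rows shown earlier in the thread (hard-coded here). `group_all` is a hypothetical helper, not a Polars API. It keeps groups in first-appearance order; Polars does not guarantee group order unless `maintain_order=True` is passed, which does not matter here because there is only one group.

```python
# The two rows produced by df_a.join(df_a, on="B"), copied from the thread
joined = [
    {"Idx": 0, "A": "Gr1", "B": "A", "Idx_right": 0, "A_right": "Gr1"},
    {"Idx": 1, "A": "Gr1", "B": "B", "Idx_right": 1, "A_right": "Gr1"},
]


def group_all(rows, key):
    """Group rows by `key`, collecting every other column into a list
    (mimics groupby(key).all()); groups keep first-appearance order."""
    groups = {}
    for row in rows:
        agg = groups.setdefault(row[key], {key: row[key]})
        for col, val in row.items():
            if col != key:
                agg.setdefault(col, []).append(val)
    return list(groups.values())


result = group_all(joined, "A")
print(result)
# [{'A': 'Gr1', 'Idx': [0, 1], 'B': ['A', 'B'], 'Idx_right': [0, 1], 'A_right': ['Gr1', 'Gr1']}]
```

The single resulting row matches the `shape: (1, 5)` table returned by 0.18.4, including the column order `A, Idx, B, Idx_right, A_right`.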
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
Error when running the sample code:
exceptions.ColumnNotFoundError: Idx
The issue doesn't occur when the LazyFrame is built from a dictionary; it only occurs when reading a CSV file with `scan_csv`. Sample datasets attached:
a.csv
b.csv
Reproducible example
Expected behavior
The code should run without raising an error.
Installed versions