Columnnotfounderror after join #9778

Closed
2 tasks done
arnabanimesh opened this issue Jul 8, 2023 · 3 comments · Fixed by #9797
Labels
bug (Something isn't working), python (Related to Python Polars)

Comments

@arnabanimesh
Contributor

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

Running the sample code raises exceptions.ColumnNotFoundError: Idx

The issue does not occur when the LazyFrame is built from a dictionary. It occurs when the CSV file is read with scan_csv.

Sample datasets attached:
a.csv
b.csv

Reproducible example

import polars as pl

df = pl.scan_csv("a.csv").with_row_count("Idx")
sec_df = pl.scan_csv("b.csv").with_row_count("B Idx")
df = df.join(df,on="B")
print(df.collect())
grouped_df = df.groupby("A").all()
print(grouped_df.collect())

Expected behavior

The code should run without raising an error.

Installed versions

--------Version info---------
Polars:              0.18.6
Index type:          UInt32
Platform:            Windows-10-10.0.22621-SP0
Python:              3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_sqlite:  <not installed>
connectorx:          <not installed>
deltalake:           <not installed>
fsspec:              <not installed>
matplotlib:          <not installed>
numpy:               <not installed>
pandas:              <not installed>
pyarrow:             <not installed>
pydantic:            <not installed>
sqlalchemy:          <not installed>
xlsx2csv:            <not installed>
xlsxwriter:          <not installed>
arnabanimesh added the bug and python labels on Jul 8, 2023
@cmdlineluser
Contributor

cmdlineluser commented Jul 8, 2023

tempfile.NamedTemporaryFile can be used to inline your example:

import polars as pl
from tempfile import NamedTemporaryFile

csv_a = NamedTemporaryFile()
csv_a.write(b"""
A,B
Gr1,A
Gr1,B
""".strip())

csv_a.seek(0)

df_a = pl.scan_csv(csv_a.name).with_row_count("Idx")

df_a.join(df_a, on="B").collect()
# shape: (2, 5)
# ┌─────┬─────┬─────┬───────────┬─────────┐
# │ Idx ┆ A   ┆ B   ┆ Idx_right ┆ A_right │
# │ --- ┆ --- ┆ --- ┆ ---       ┆ ---     │
# │ u32 ┆ str ┆ str ┆ u32       ┆ str     │
# ╞═════╪═════╪═════╪═══════════╪═════════╡
# │ 0   ┆ Gr1 ┆ A   ┆ 0         ┆ Gr1     │
# │ 1   ┆ Gr1 ┆ B   ┆ 1         ┆ Gr1     │
# └─────┴─────┴─────┴───────────┴─────────┘

df_a.join(df_a, on="B").groupby("A").all().collect()
# ColumnNotFoundError: Idx

Oddly enough, if you select just the Idx column on its own, it's there, but selecting it together with A fails:

df_a.join(df_a, on="B").select("Idx").collect()
# shape: (2, 1)
# ┌─────┐
# │ Idx │
# │ --- │
# │ u32 │
# ╞═════╡
# │ 0   │
# │ 1   │
# └─────┘

df_a.join(df_a, on="B").select("Idx", "A").collect()
# ColumnNotFoundError: Idx
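
A side note on the temporary-file approach used above: the written bytes need to reach disk before another reader opens the file by name, and on Windows a NamedTemporaryFile generally cannot be reopened while it is still open. The sketch below is one portable way to handle both, assuming it is acceptable to delete the file manually. It uses only the standard library, with csv.DictReader standing in for scan_csv so it runs without Polars.

```python
import csv
import os
from tempfile import NamedTemporaryFile

CSV_BYTES = b"A,B\nGr1,A\nGr1,B\n"

# delete=False keeps the file on disk after close(), so it can be
# reopened by name afterwards; cleanup is then done by hand.
tmp = NamedTemporaryFile(suffix=".csv", delete=False)
try:
    tmp.write(CSV_BYTES)
    tmp.close()  # closing flushes the buffered bytes to disk

    # Reopen by name, as a CSV reader such as scan_csv would.
    with open(tmp.name, newline="") as fh:
        rows = list(csv.DictReader(fh))
finally:
    os.unlink(tmp.name)  # manual cleanup, since delete=False

print(rows)
```

With Polars, tmp.name could be passed to pl.scan_csv in place of the csv.DictReader step, as long as the query is collected before the file is removed.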

@avimallu
Contributor

avimallu commented Jul 8, 2023

The example that @cmdlineluser provided works just fine in Polars 0.18.4 and stopped working from 0.18.5:

>>> import polars as pl
>>> from tempfile import NamedTemporaryFile
>>> 
>>> csv_a = NamedTemporaryFile()
>>> csv_a.write(b"""
... A,B
... Gr1,A
... Gr1,B
... """.strip())
15
>>> 
>>> csv_a.seek(0)
0
>>> 
>>> df_a = pl.scan_csv(csv_a.name).with_row_count("Idx")
>>> df_a.join(df_a, on="B").groupby("A").all().collect()
shape: (1, 5)
┌─────┬───────────┬────────────┬───────────┬────────────────┐
│ A   ┆ Idx       ┆ B          ┆ Idx_right ┆ A_right        │
│ --- ┆ ---       ┆ ---        ┆ ---       ┆ ---            │
│ str ┆ list[u32] ┆ list[str]  ┆ list[u32] ┆ list[str]      │
╞═════╪═══════════╪════════════╪═══════════╪════════════════╡
│ Gr1 ┆ [0, 1]    ┆ ["A", "B"] ┆ [0, 1]    ┆ ["Gr1", "Gr1"] │
└─────┴───────────┴────────────┴───────────┴────────────────┘
>>> pl.show_versions()
--------Version info---------
Polars:      0.18.4
Index type:  UInt32
Platform:    macOS-13.4.1-arm64-arm-64bit
Python:      3.10.9 (main, Jan 11 2023, 09:18:18) [Clang 14.0.6 ]

----Optional dependencies----
numpy:       1.24.3
pandas:      1.5.3
pyarrow:     11.0.0
connectorx:  0.3.1
deltalake:   0.10.0
fsspec:      2023.4.0
matplotlib:  3.7.1
xlsx2csv:    0.8.1
xlsxwriter:  3.0.9

@avimallu
Contributor

avimallu commented Jul 8, 2023

works just fine in Polars 0.18.4 and stopped working from 0.18.5:

Some git bisect sleuthing says that #9700 is the cause of this error.
