feat(python, rust): clearer message when stringcache-related errors occur #9715

Merged: 6 commits, Jul 5, 2023

MarcoGorelli (Collaborator) commented Jul 4, 2023

Related to #9106 (it's debatable whether this closes it).

Demo:

Latest release:

In [2]: df1 = pl.DataFrame({"a": ["1", "2"]}, schema={"a": pl.Categorical})
   ...: df2 = pl.DataFrame({"a": ["1", "2"]}, schema={"a": pl.Categorical})
   ...: df1.join(df2, on='a')
---------------------------------------------------------------------------
ComputeError: joins/or comparisons on categoricals can only happen if they were created under the same global string cache

In [3]: df1 = pl.DataFrame({"a": ["1", "2"]}, schema={"a": pl.Categorical})
   ...: df2 = pl.DataFrame({"a": ["3", "4"]}, schema={"a": pl.Categorical})
   ...: pl.concat([df1, df2])
---------------------------------------------------------------------------
ComputeError: cannot concat categoricals coming from a different source; consider setting a global StringCache

This PR:

In [1]: df1 = pl.DataFrame({"a": ["1", "2"]}, schema={"a": pl.Categorical})
   ...: df2 = pl.DataFrame({"a": ["1", "2"]}, schema={"a": pl.Categorical})
   ...: df1.join(df2, on='a')
---------------------------------------------------------------------------
ComputeError: joins/or comparisons on categoricals can only happen
if they were created under the same global string cache.

Help: if you're using Python, this may look something like:

    with pl.StringCache():
        df1 = pl.DataFrame({'a': ['1', '2']}, schema={'a': pl.Categorical})
        df2 = pl.DataFrame({'a': ['1', '3']}, schema={'a': pl.Categorical})
        df1.join(df2, on='a')

Alternatively, if the performance cost is acceptable, you could just set:

    pl.enable_string_cache(True)

on startup.

In [2]: df1 = pl.DataFrame({"a": ["1", "2"]}, schema={"a": pl.Categorical})
   ...: df2 = pl.DataFrame({"a": ["3", "4"]}, schema={"a": pl.Categorical})
   ...: pl.concat([df1, df2])
---------------------------------------------------------------------------
ComputeError: cannot concat categoricals coming from a different source, consider setting a global StringCache.

Help: if you're using Python, this may look something like:

    with pl.StringCache():
        df1 = pl.DataFrame({'a': ['1', '2']}, schema={'a': pl.Categorical})
        df2 = pl.DataFrame({'a': ['1', '3']}, schema={'a': pl.Categorical})
        pl.concat([df1, df2])

Alternatively, if the performance cost is acceptable, you could just set:

    pl.enable_string_cache(True)

on startup.
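
For background on why both errors arise: a categorical column stores small integer codes plus a mapping back to the strings, and without a global string cache each column builds its own mapping. The toy Rust sketch below illustrates how two independently built mappings can hand out the same codes for different strings. The `ToyCache` type and its `encode` method are hypothetical names made up for this illustration; they are not Polars internals.

```rust
use std::collections::HashMap;

/// Toy dictionary encoder: each previously unseen string gets the next free code.
/// Illustrative only; this is NOT how Polars implements categoricals.
#[derive(Default)]
struct ToyCache {
    codes: HashMap<String, u32>,
}

impl ToyCache {
    /// Encode a column of strings into integer codes, growing the mapping as needed.
    fn encode(&mut self, values: &[&str]) -> Vec<u32> {
        values
            .iter()
            .map(|v| {
                // The next free code equals the number of strings seen so far.
                let next = self.codes.len() as u32;
                *self.codes.entry(v.to_string()).or_insert(next)
            })
            .collect()
    }
}
```

With two separate `ToyCache` values, the columns `["1", "2"]` and `["3", "4"]` both encode to `[0, 1]`, so comparing codes across them would be meaningless. With a single shared `ToyCache`, the same columns encode to `[0, 1]` and `[2, 3]`, which is the situation a global string cache is meant to provide.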

github-actions bot added the enhancement, python, and rust labels Jul 4, 2023
MarcoGorelli marked this pull request as ready for review July 4, 2023 20:51
ritchie46 (Member) left a comment

Nice! Thanks a lot!

One remark. I think we can store this message in polars-error under a shortcut (there are more).

@@ -95,8 +95,22 @@ pub fn _check_categorical_src(l: &DataType, r: &DataType) -> PolarsResult<()> {
     if let (DataType::Categorical(Some(l)), DataType::Categorical(Some(r))) = (l, r) {
         polars_ensure!(
             l.same_src(r),
-            ComputeError: "joins/or comparisons on categoricals can only happen if they were \
-            created under the same global string cache"
+            ComputeError: r#"

Can we deduplicate this message? Maybe add it in polars-error?
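
One possible shape for the deduplication suggested above, as a minimal sketch: keep the shared help text in a single constant and build each error message from it. The names `STRING_CACHE_HELP` and `string_cache_mismatch_msg` are hypothetical, the help text is shortened, and this is not the code that ended up in polars-error.

```rust
/// Shared help text, defined once. Shortened and illustrative;
/// the real message in the PR also includes a `with pl.StringCache():` example.
const STRING_CACHE_HELP: &str = "\
Help: if you're using Python, consider creating the data under `pl.StringCache()`.

Alternatively, if the performance cost is acceptable, you could just set:

    pl.enable_string_cache(True)

on startup.";

/// Hypothetical helper: prefix the shared help text with the operation-specific problem.
fn string_cache_mismatch_msg(problem: &str) -> String {
    format!("{}\n\n{}", problem, STRING_CACHE_HELP)
}
```

Both the join/comparison check and the concat check could then call the helper with their own first sentence, so the help text only needs to be edited in one place.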

ritchie46 merged commit ee9c589 into pola-rs:main Jul 5, 2023
c-peters pushed a commit to c-peters/polars that referenced this pull request Jul 14, 2023