Errors in @filter() with multiple conditions #95

Closed
christophscheuch opened this issue Apr 19, 2023 · 6 comments

Comments

@christophscheuch

As I'm compiling a basic set of introductory applications, I uncovered an issue with @filter(): I cannot use logical and / or in filtering the data:

using Tidier
df = DataFrame(a = repeat(1:10), b = repeat('a':'e', inner = 2))

@chain df begin
    @filter(a == 1 & b == "a")
end
# ERROR: MethodError: no method matching &(::Int64, ::Char)

@chain df begin
    @filter(a == 1 && b == "a")
end
# ERROR: TypeError: non-boolean (BitVector) used in boolean context

@chain df begin
    @filter(a == 1 | b == "a")
end
# ERROR: MethodError: no method matching |(::Int64, ::Char)

@chain df begin
    @filter(a == 1 || b == "a")
end
# ERROR: TypeError: non-boolean (BitVector) used in boolean context

Interestingly, the following chunk runs without error, but the result is not meaningful: it returns an empty data frame, while the result should be the first row.

@chain df begin
    @filter(a == 1, b == "a")
end

I did not find any hints in the documentation. Am I missing something?

@kdpsingh
Member

This may be a data type issue. I will check when I get some time. In Julia, single quotes denote characters and double quotes denote strings. I would try b == 'a' instead of b == "a".
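
To illustrate the character/string distinction, here is a minimal sketch in plain Julia (no Tidier involved). The variable names are only for illustration; the vector `b` is built the same way as column `b` in the data frame above. It suggests why a comparison against "a" can match no rows without raising an error.

```julia
# Single quotes create a Char, double quotes create a String.
c = 'a'
s = "a"
println(typeof(c))   # Char
println(typeof(s))   # String

# Comparing a Char with a String does not throw; it is simply false.
println(c == s)      # false

# Same construction as column `b` in the data frame above: a vector of Char.
b = repeat('a':'e', inner = 2)
println(eltype(b))          # Char
println(count(b .== "a"))   # 0, no element equals the String "a"
println(count(b .== 'a'))   # 2, two elements equal the Char 'a'
```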

@christophscheuch
Author

Ah, I did not realize the difference - thanks! However, the errors are the same regardless of whether I use single or double quotes as you suggested.

To rule out any data type issues, I created another data frame with only integers and got different errors:

df2 = DataFrame(a = repeat(1:10), b = repeat(1:5, inner = 2))

@chain df2 begin
    @filter(a == 1 & b == 1)
end
# ERROR: UndefVarError: a not defined

@chain df2 begin
    @filter(a == 1 && b == 1)
end
# ERROR: TypeError: non-boolean (BitVector) used in boolean context

@kdpsingh
Member

Thank you for trying it. I will take a look!

@kdpsingh
Member

Ok, I figured out the issue. It is a combination of a bug in Tidier.jl and an operator precedence issue in Julia.

The bug: && should be auto-vectorized by Tidier, but it wasn't, because its abstract syntax tree looks different from that of &. Unlike &, && is not parsed as a function call.
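
The difference in the syntax trees can be inspected in plain Julia with `Meta.parse`. This is only a sketch of what the parser produces for the two operators; it does not show how Tidier itself rewrites the expressions, and the names `x`, `y`, `ex_and`, and `ex_andand` are placeholders.

```julia
# Parse both forms without evaluating them.
ex_and    = Meta.parse("x & y")    # element-wise / bitwise "and"
ex_andand = Meta.parse("x && y")   # short-circuit "and"

# `&` is an ordinary function call: the head is :call and the
# operator itself is the first argument.
println(ex_and.head)      # call
println(ex_and.args[1])   # &

# `&&` is special syntax: the head is the operator itself and the
# arguments are just the two operands.
println(ex_andand.head)          # &&
println(length(ex_andand.args))  # 2
```

Code that walks the tree looking only for `:call` nodes will therefore skip over `&&`.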

The bug is now fixed in a PR. Once it passes tests, I will merge it.

The operator precedence issue: While == has higher precedence than &&, it has lower precedence than &. This means that when using &, you have to parenthesize each comparison: @filter((a == 1) & (b == 1)) works correctly.
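
The precedence point can be checked in plain Julia as well. The sketch below parses the unparenthesized and parenthesized forms, then applies the parenthesized form to two ordinary vectors built like the columns of df2 above, using explicit broadcasting in place of Tidier's auto-vectorization.

```julia
# Without parentheses, `&` binds tighter than `==`, so the expression
# is parsed as the chained comparison a == (1 & b) == 1.
ex = Meta.parse("a == 1 & b == 1")
println(ex.head)      # comparison
println(ex.args[3])   # 1 & b

# With parentheses, `&` is the outer call and each comparison is an operand.
ex2 = Meta.parse("(a == 1) & (b == 1)")
println(ex2.head)     # call
println(ex2.args[2])  # a == 1

# The same columns as df2, as plain vectors.
a = collect(1:10)
b = repeat(1:5, inner = 2)

# Explicitly broadcast version of the parenthesized condition.
mask = (a .== 1) .& (b .== 1)
println(findall(mask))  # [1], only the first row satisfies both conditions
```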

I've added a doctest and have updated the documentation with 3 different ways to handle this situation.

@kdpsingh
Member

(The documentation is still rebuilding, so you may need to check again in a few minutes if you are looking now.)
