Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return booleans from expression comparisons, allow for vectors to be defined in expressions #1548

Merged
merged 7 commits into from
Sep 26, 2021

Conversation

sc1f
Copy link
Contributor

@sc1f sc1f commented Sep 23, 2021

This PR rewrites a good amount of our ExprTk integration so that comparisons such as ==, <, and, or etc. return boolean columns instead of floats. Originally, all comparisons returned floats because ExprTk treated booleans as the number 0 or 1, and passed them into the t_tscalar constructor as ints and not booleans. Because scalar comparison between different types is not possible, functions had to return float values in order for conditionals to work.

In this branch, I've added explicit specializations for more of ExprTk's processing code so that operators and conditional evaluators always return boolean scalars. In combination with the UI tweaks in #1547, expressions now can be easily used as filters on the dataset:

booleans

This also works well for defining ranges using inrange, such as a date range:

inrange

Finally, users can define vectors inside expressions and use them/return scalars from the vector at will:

vectors

Vectors specifically enable a massive amount of features, including functions such as find and split which need to return more than one value. A find(string, regex, output_vector) function, for example, will store its output of start_idx, end_idx in output_vector, and the user can then create a substring from those indices using substr(output[0], output[1]).

The values True and False have been added, replacing the values true and false (without capital letters), which resolved to the numbers 1 and 0. True and False, meanwhile, resolve to true and false boolean scalars, which means they can be used in comparisons against other booleans, whereas the old values will now result in a syntax error.

Finally, a boolean function has been added to cast a scalar or column of any type into a boolean column, returning True if a value is set (including "falsy" values such as 0 and ""), and False for nulls.

Numerous tests have been added, as always.

@sc1f sc1f requested a review from texodus September 23, 2021 22:52
@sc1f sc1f added enhancement Feature requests or improvements C++ labels Sep 23, 2021
@sc1f sc1f added this to the 1.0.0 milestone Sep 23, 2021

// Custom functions from computed_functions.cpp
REGISTER_COMPUTE_FUNCTIONS(vocab)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to be macros? They impair readability here.

At minimum, they should take sym_table as argument and not name-capture as they do now.

@texodus texodus force-pushed the scalar-comparisons branch 2 times, most recently from cea8799 to e9620fb Compare September 26, 2021 19:22
@texodus
Copy link
Member

texodus commented Sep 26, 2021

Looks good! Thanks for the PR!

I've added a step to the pre-processor to convert true and false to True and False symbols, respectively, as I felt the error message especially when using these was unclear ("Syntax Error"). However, like #1550, the regex for these is not correct - it does not escape quotations, e.g.. This is far from the only incorrect behavior in the pre-processor however, so I've elected to just allow this rare corner case in the interest of fixing the larger issue (booleans). We really do need to fix the pre-processor in a future PR however, starting by re-factoring it to wasm so it need not be re-implemented for multiple languages.

@texodus texodus merged commit 5fbe447 into master Sep 26, 2021
@texodus texodus deleted the scalar-comparisons branch September 26, 2021 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ enhancement Feature requests or improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants