Return booleans from expression comparisons, allow for vectors to be defined in expressions #1548
// Custom functions from computed_functions.cpp
REGISTER_COMPUTE_FUNCTIONS(vocab)
Do these need to be macros? They impair readability here. At minimum, they should take `sym_table` as an argument rather than capturing it by name, as they do now.
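To illustrate the difference the comment is pointing at, here is a minimal sketch in plain C++. `SymTable`, `twice`, and all three registration helpers are hypothetical stand-ins invented for this example; they are not ExprTk's `symbol_table` or the macros in this PR.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a symbol table (an assumption for this sketch).
using SymTable = std::map<std::string, double (*)(double)>;

double twice(double x) { return 2.0 * x; }

// Name-capturing form: only compiles where a variable named `sym_table`
// happens to be in scope, and nothing at the call site says so.
#define REGISTER_CAPTURING(name, fn) sym_table[name] = fn

// Explicit form: the table is an argument, so the dependency is visible.
#define REGISTER_EXPLICIT(table, name, fn) (table)[name] = fn

// Plain function alternative: same effect with no macro at all.
inline void register_function(SymTable& table, const std::string& name,
                              double (*fn)(double)) {
    table[name] = fn;
}
```

With the explicit forms, a reader can see which table is being modified without knowing how the macro is defined.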
Looks good! Thanks for the PR! I've added a step to the pre-processor to convert |
This PR rewrites a good amount of our ExprTk integration so that comparisons such as `==`, `<`, `and`, `or`, etc. return boolean columns instead of floats. Originally, all comparisons returned floats because ExprTk treated booleans as the numbers 0 and 1, and passed them into the `t_tscalar` constructor as ints rather than booleans. Because scalar comparison between different types is not possible, functions had to return float values in order for conditionals to work.

In this branch, I've added explicit specializations for more of ExprTk's processing code so that operators and conditional evaluators always return boolean scalars. In combination with the UI tweaks in #1547, expressions can now easily be used as filters on the dataset.
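The type mismatch described here can be sketched with a small tagged scalar. Everything in this block (`DType`, `Scalar`, `equals`, and the two `less_than_*` helpers) is hypothetical and much simpler than `t_tscalar`; throwing on a type mismatch is an assumption made only to make the failure visible.

```cpp
#include <stdexcept>

// Hypothetical type tag and scalar; not Perspective's t_tscalar.
enum class DType { Bool, Float64 };

struct Scalar {
    DType dtype;
    double number;  // meaningful when dtype == Float64
    bool flag;      // meaningful when dtype == Bool
};

Scalar make_float(double v) { return Scalar{DType::Float64, v, false}; }
Scalar make_bool(bool v) { return Scalar{DType::Bool, 0.0, v}; }

// Equality is only defined when both sides carry the same type tag.
bool equals(const Scalar& a, const Scalar& b) {
    if (a.dtype != b.dtype) throw std::invalid_argument("mismatched scalar types");
    return a.dtype == DType::Bool ? a.flag == b.flag : a.number == b.number;
}

// Old behaviour in this sketch: the comparison result is the number 0 or 1.
Scalar less_than_old(double a, double b) { return make_float(a < b ? 1.0 : 0.0); }

// New behaviour in this sketch: the comparison result is a boolean scalar.
Scalar less_than_new(double a, double b) { return make_bool(a < b); }
```

In this model, the result of `less_than_old` can only be compared against other floats, while the result of `less_than_new` can be compared directly against a boolean.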
This also works well for defining ranges using `inrange`, such as a date range.

Users can also define vectors inside expressions and use them, or return scalars from the vector, at will.
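One thing a caller-visible vector makes possible is handing back several values through an output argument and then reading individual elements out of it. A rough sketch of that pattern in plain C++ follows. `find_span` is a hypothetical helper built on `std::regex`, not the function added in this PR, and treating the end index as one past the last matched character is an assumption.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: on a match, overwrites `output` with
// {start_idx, end_idx} of the first match and returns true.
// end_idx is exclusive here (an assumption for this sketch).
// On no match, `output` is left untouched and false is returned.
bool find_span(const std::string& text, const std::regex& pattern,
               std::vector<std::size_t>& output) {
    std::smatch match;
    if (!std::regex_search(text, match, pattern)) return false;
    const std::size_t start = static_cast<std::size_t>(match.position(0));
    const std::size_t length = static_cast<std::size_t>(match.length(0));
    output.assign({start, start + length});
    return true;
}
```

The caller can then index into the vector, for example `text.substr(output[0], output[1] - output[0])`, to recover the matched text.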
Vectors specifically enable a large number of features, including functions such as `find` and `split`, which need to return more than one value. A `find(string, regex, output_vector)` function, for example, will store its output of `start_idx, end_idx` in `output_vector`, and the user can then create a substring from those indices using `substr(output[0], output[1])`.

The values `True` and `False` have been added, replacing the values `true` and `false` (without capital letters), which resolved to the numbers 1 and 0. `True` and `False`, meanwhile, resolve to `true` and `false` boolean scalars, which means they can be used in comparisons against other booleans, whereas the old values will now result in a syntax error.

Finally, a `boolean` function has been added to cast a scalar or column of any type into a boolean column, returning True if a value is set (including "falsy" values such as 0 and ""), and False for nulls.

Numerous tests have been added, as always.
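The `boolean` semantics are easy to confuse with ordinary truthiness, so here is a small sketch of the distinction. `Cell`, `boolean_cast`, and `truthy` are hypothetical names for this example only; the real function operates on Perspective scalars and columns.

```cpp
#include <string>

// Hypothetical nullable cell; `valid == false` represents a null.
template <typename T>
struct Cell {
    T value;
    bool valid;
};

// Mirrors the described `boolean` behaviour: true whenever a value is set,
// regardless of what that value is.
template <typename T>
bool boolean_cast(const Cell<T>& cell) {
    return cell.valid;
}

// Ordinary truthiness, shown only for contrast: a set value that equals
// the type's default (0, "") counts as false.
template <typename T>
bool truthy(const Cell<T>& cell) {
    return cell.valid && cell.value != T();
}
```

Under this sketch, a cell holding 0 or "" passes `boolean_cast` but fails `truthy`, while a null cell fails both.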