non equi join for character columns #2308

Open
franknarf1 opened this issue Aug 17, 2017 · 6 comments
Labels
enhancement, idate/itime, non-equi joins (rolling, overlapping, non-equi joins)

Comments

@franknarf1
Contributor

franknarf1 commented Aug 17, 2017

I noticed this error message:

library(data.table)
data.table(id = LETTERS[1:3])[.(id = "B"), on=.(id > id)]
# Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
#   Only '==' operator is supported for columns of type character.

I couldn't find any open issue filed for adding this feature (since #1452 is done already). Is it not planned?

I ran into it looking at an SO question:

# goal: use bmerge for this subset condition: letter != '' & date < as.IDate('2014-10-01')
library(data.table)
dt <- data.table(letter = rep(c('',letters[1:4]), each = 4),
  date = as.IDate(c('2014-09-29','2014-09-30','2014-10-01','2014-10-02')))

# attempt
dt[.(L = '', D = as.IDate('2014-10-01')), on=.(letter > L, date < D), verbose = TRUE]
# Non-equi join operators detected ... 
#   forder took ... 0.02 secs
#   Generating non-equi group ids ... done in 0 secs
#   Recomputing forder with non-equi ids ... done in 0 secs
#   Found 4 non-equi group(s) ...
# Starting bmerge ...Error in bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch,  : 
#   Only '==' operator is supported for columns of type character.

(I figured this would work for letter != '' so long as "" is the minimum character.)

Anyway, not a big deal for me, but figured I'd ask.

@arunsrinivasan
Member

@franknarf1 != could work for char types, but isn't implemented yet. When done, this'll be taken care of. I can't remember if there's an issue for != addition to non-equi joins yet. If not, the title of this issue could be changed to reflect that. I don't think it makes sense for other ops on char types.

Perhaps the error message could state that != when added will be made available for char types as well.

@franknarf1
Contributor Author

Thanks @arunsrinivasan , that makes sense.

Feel free to change the title (support bmerge for char !=?) or close as/when appropriate.

arunsrinivasan added the "non-equi joins" and "enhancement" labels on Jan 16, 2018
@jeroenjanssens

I would like to be able to use <= and > for char types. Would that be as easy to implement as != ?

@jangorecki
Member

jangorecki commented Jun 30, 2019

Are you aware of portability issues related to OS encoding? I would suggest converting the char column to a factor and doing the non-equi join on the integer codes. You will get what you need in a portable and reliable way.
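
A minimal sketch of the factor approach suggested above, reusing the small example from the top of this issue. The helper column `id_code` and the variables `lev`, `x`, `i`, and `res` are made up for illustration and are not part of data.table. It assumes that a byte-wise (C-locale) ordering of the strings is the ordering you want, which is what `sort(..., method = "radix")` gives.

```r
library(data.table)

x <- data.table(id = LETTERS[1:3])
i <- data.table(id = "B")

# One shared, explicitly ordered set of levels, so that both tables map
# each string to the same integer code. method = "radix" sorts in the
# C locale, so the order does not depend on the OS collation settings.
lev <- sort(unique(c(x$id, i$id)), method = "radix")
x[, id_code := as.integer(factor(id, levels = lev))]
i[, id_code := as.integer(factor(id, levels = lev))]

# Non-equi join on the integer codes instead of the character column.
# x.id returns the id from x rather than the joined value from i.
res <- x[i, on = .(id_code > id_code), .(id = x.id)]
print(res)
```

The levels have to be built from both tables together; separate calls to `factor()` on each table could assign different codes to the same string.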

@jeroenjanssens

Thanks @jangorecki. I'm comparing strings of the format [0-9]{4} [A-Z]{2} so the many possible combinations make it infeasible to convert them to a factor.

I can imagine it is quite challenging to keep string comparisons portable. Do you know how base R does this when comparing strings?
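
On the base R question: according to `?Comparison`, strings are compared lexicographically using the collating sequence of the current locale, so the result of `<` or a default `sort()` can differ between machines. The short sketch below shows where to look; the vector `s` and the result name `c_order` are made up for illustration.

```r
s <- c("b", "A", "a", "B")

# The collation rules currently in effect for <, >, and default sort()
Sys.getlocale("LC_COLLATE")

# Locale-dependent order: may differ between operating systems and locales
sort(s)

# Radix sorting of character vectors uses the C locale (byte order),
# so upper-case ASCII letters come before lower-case ones everywhere
c_order <- sort(s, method = "radix")
print(c_order)
```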

jangorecki changed the title from "[Request] non equi join for character columns" to "non equi join for character columns" on Jul 2, 2019
@adamaltmejd

Would be very interested to see "!=" join for character. Any progress on this? Or suggestion for workarounds meanwhile?
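
One possible workaround in the meantime, sketched here under the assumption that the != condition compares against a known set of values: an anti-join (`x[!i, on = ...]`) drops the rows that match with ==, and the remaining non-equi condition is then applied to a non-character column. The example reuses the data from the original post; `res` is just an illustrative name, and the dates are repeated explicitly so the table has 20 rows.

```r
library(data.table)

dt <- data.table(
  letter = rep(c("", letters[1:4]), each = 4),
  date   = rep(as.IDate(c("2014-09-29", "2014-09-30",
                          "2014-10-01", "2014-10-02")), times = 5)
)

# Step 1: anti-join on the character column, i.e. keep rows where letter != ""
# Step 2: non-equi join on the IDate column (integer-based, so supported)
res <- dt[!.(letter = ""), on = "letter"][
  .(D = as.IDate("2014-10-01")),
  on = .(date < D),
  .(letter, date = x.date),   # x.date keeps the original date from dt
  nomatch = 0L
]
print(res)
```

This only covers !=; it does not help with < or > on character columns.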

6 participants