Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tibble name_repair and .. in variable names #3321

Open
philippechataignon opened this issue Jan 27, 2019 · 3 comments
Open

tibble name_repair and .. in variable names #3321

philippechataignon opened this issue Jan 27, 2019 · 3 comments

Comments

@philippechataignon
Copy link
Contributor

philippechataignon commented Jan 27, 2019

I sometimes use readxl package to read Excel file. Since last version (1.2.0) I noticed a problem related to the version 2.0 of the package tibble. Now, Excel columns with no header name are renamed ..1, ..2 because tibble changes his algorithm for repairing variable names. See this article : .name_repair.

I always use data.table for my data.frames and my program broke after readxl upgrade with a data.table error. It seems there is a confusion with variables names ..1, ..2 and the .. data.table syntax used as with=F replacement.

Perhaps data.table can interpret ..1 as a variable name because it can't be a indirection. I never realize before but ..var is a valid variable name in R.

library(data.table)
library(readxl)
tmp <- tempfile()
download.file(url = "https://chataignon.com/files/read_xl.xlsx", destfile = tmp)
# essai de l'URL 'https://chataignon.com/files/read_xl.xlsx'
# Content type 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' length 5316 bytes
# ==================================================
# downloaded 5316 bytes

dt <- setDT(read_xlsx(tmp))
# New names:                                                                                                                                                                                               
# * `` -> `..2`
# * `` -> `..3`
unlink(tmp)
print(dt)
#      id     ..2        ..3      text Nom très long à la Excel
# 1: 1001 1002001  0.9199906       sdf                       45
# 2: 1002 1004004  0.1672665  dfsdfsdf                       65
# 3: 1003 1006009 -0.7392416 sdffdsfsd                       78
# 4: 1004 1008016 -0.9660944 sdfsdfsdf                       98
# 5: 1005 1010025 -0.3047245  dfsdfsdf                       45
# 6: 1006 1012036  0.6368077     fsdfs                        2
# 7: 1007 1014049  0.9928619 sdfsdfsdf                       12
# 8: 1008 1016064  0.4360834  dfsdfsdf                       13
# 9: 1009 1018081 -0.5216282       dff                        4
dt[, ..2]
# Error in `[.data.table`(dt, , ..2) : 
#   Variable '2' is not found in calling scope. Looking in calling scope because you used the .. prefix.
dt[, `..2`]
# Error in `[.data.table`(dt, , ..2) : 
#   Variable '2' is not found in calling scope. Looking in calling scope because you used the .. prefix.
dt[, "..2"]
#        ..2
# 1: 1002001
# 2: 1004004
# 3: 1006009
# 4: 1008016
# 5: 1010025
# 6: 1012036
# 7: 1014049
# 8: 1016064
# 9: 1018081

# Output of sessionInfo()

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.7.0
LAPACK: /usr/lib/lapack/liblapack.so.3.7.0

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8     LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8    LC_PAPER=fr_FR.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.0 readxl_1.2.0     

loaded via a namespace (and not attached):
 [1] compiler_3.5.2   tools_3.5.2      pillar_1.3.1     rstudioapi_0.9.0 tibble_2.0.1     crayon_1.3.4     Rcpp_1.0.0       cellranger_1.1.0 pkgconfig_2.0.2  rlang_0.3.
@MichaelChirico

This comment was marked as off-topic.

@Atrebas
Copy link

Atrebas commented Feb 6, 2019

fwiw, openxlsx works well (and is more tinyverse-friendly).

library(data.table)
library(openxlsx)
DT <- setDT(openxlsx::read.xlsx("https://chataignon.com/files/read_xl.xlsx"))
DT
#      id      X2         X3      text Nom.très.long.à.la.Excel
# 1: 1001 1002001  0.9199906       sdf                       45
# 2: 1002 1004004  0.1672665  dfsdfsdf                       65
# 3: 1003 1006009 -0.7392416 sdffdsfsd                       78

@jangorecki
Copy link
Member

There is another example of ..colname use in https://github.com/mlr-org/mlr3/blob/adc6768e9817364a969797eba9157c4f2844bc45/R/DataBackendDataTable.R#L141 which uses such name for own internal processing, probably not exposed to user

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants