Namibia is missing in some coding schemes #261
Thanks! And, confirmed:
library(countrycode)
is.na(countrycode("NAM", "iso3c", "eurostat"))
#> Warning in countrycode("NAM", "iso3c", "eurostat"): Some values were not matched unambiguously: NAM
#> [1] TRUE
is.na(countrycode("NAM", "iso3c", "ecb"))
#> Warning in countrycode("NAM", "iso3c", "ecb"): Some values were not matched unambiguously: NAM
#> [1] TRUE
is.na(countrycode("NAM", "iso3c", "eu28"))
#> Warning in countrycode("NAM", "iso3c", "eu28"): Some values were not matched unambiguously: NAM
#> [1] TRUE
is.na(countrycode("NAM", "iso3c", "genc2c"))
#> Warning in countrycode("NAM", "iso3c", "genc2c"): Some values were not matched unambiguously: NAM
#> [1] TRUE
is.na(countrycode("NAM", "iso3c", "wb_api2c"))
#> Warning in countrycode("NAM", "iso3c", "wb_api2c"): Some values were not matched unambiguously: NAM
#> [1] TRUE

I would say that this issue is not fully "fixed" until the scrapers for each of these codes have been fixed. Maybe each should be split into its own issue so that they can be addressed separately?

Also of note: now that tidyverse/rvest/issues/107 has finally been resolved, we can probably make dictionary/get_ecb.R work better.

On the other hand, a similar issue in jsonlite is still unresolved, so it still requires workarounds.

I'm not sure the problem is (entirely) related to our scrapers. It seems reader-related to me. For instance, here is how three readers handle the "NA" string in https://github.com/vincentarelbundock/countrycode/blob/main/dictionary/data_genc.csv
setwd("~/repos/countrycode")
library(readr)
library(data.table)
# Base R
x = read.csv("dictionary/data_genc.csv")
"NA" %in% x$genc2c
#> [1] FALSE
# tidyverse
y = read_csv("dictionary/data_genc.csv")
"NA" %in% y$genc2c
#> [1] FALSE
# data.table
z = fread("dictionary/data_genc.csv")
"NA" %in% z$genc2c
#> [1] TRUE

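For comparison, base R's read.csv() can keep Namibia's code when its default na.strings = "NA" is overridden. The sketch below is only illustrative: the inline CSV is a made-up stand-in for dictionary/data_genc.csv, and colClasses = "character" is used so that no column gets type-converted.

```r
# Made-up stand-in for dictionary/data_genc.csv (not the real file contents)
csv_text <- 'country,genc2c
"Namibia","NA"
"Nauru","NR"
"Unknown",'

# Default na.strings = "NA": Namibia's code is read as a missing value
default_read <- read.csv(text = csv_text, colClasses = "character")

# na.strings = "": the string "NA" is kept; only empty fields become NA
safe_read <- read.csv(text = csv_text, colClasses = "character",
                      na.strings = "")

print(default_read$genc2c)
print(safe_read$genc2c)
```
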
I made a minor commit with:
Obviously, if the saved data is not properly double-quoted, we should fix the scraper, but I'd like to get to the bottom of the

An even more minimal example:
library(readr)
library(data.table)
csv <- 'x,y
"1","NA"
"NA","2"'
str(read_csv(csv))
#> spec_tbl_df [2 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
#> $ x: num [1:2] 1 NA
#> $ y: num [1:2] NA 2
#> - attr(*, "spec")=
#> .. cols(
#> .. x = col_double(),
#> .. y = col_double()
#> .. )
str(fread(csv))
#> Classes 'data.table' and 'data.frame': 2 obs. of 2 variables:
#> $ x: chr "1" "NA"
#> $ y: chr "NA" "2"
#> - attr(*, ".internal.selfref")=<externalptr>

Maybe we just set

Sorry for the multiple comments, but I pushed a change to add a bunch of
I think we're good to close, but it would be great if either of you could make sure the GitHub version works locally.

Hi! So now it seems that some real
I was not able to check it locally yet, but as far as my limited knowledge of the package goes, I guess that if the checks pass, it's because those new
I wonder if it would be possible to add an extra sanity check on
Does that make sense?

That would be more explicit, but
library(countrycode)
countrycode("NA", "genc2c", "country.name")
"Namibia"
countrycode(NA, "genc2c", "country.name")
Error in countrycode(NA, "genc2c", "country.name") :
sourcevar must be a character or numeric vector. This error often
arises when users pass a tibble (e.g., from dplyr) instead of a
column vector from a data.frame (i.e., my_tbl[, 2] vs. my_df[, 2]
vs. my_tbl[[2]])
The error is not super informative, I'll admit that ;)

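The extra sanity check suggested above could look something like the following base R sketch: look up one country in a code dictionary and report which scheme columns are missing. The helper schemes_missing(), the toy data frame, and its column layout are all hypothetical; they are not countrycode's internal API.

```r
# Hypothetical helper: which code columns are NA for a given iso3c row?
schemes_missing <- function(dict, iso3c_code = "NAM") {
  code_cols <- setdiff(names(dict), "iso3c")
  row <- dict[dict$iso3c == iso3c_code, code_cols, drop = FALSE]
  if (nrow(row) != 1) stop("expected exactly one row for ", iso3c_code)
  code_cols[is.na(unlist(row))]
}

# Toy dictionary with illustrative values only
dict <- data.frame(
  iso3c  = c("NAM", "DEU"),
  genc2c = c("NA", "DE"),
  ecb    = c("NA", "DE"),
  eu28   = c(NA, "DE"),
  stringsAsFactors = FALSE
)

# Namibia is not an EU28 member, so only eu28 is expected to be missing
schemes_missing(dict)
```

Returning the column names rather than a single TRUE/FALSE means a failing check says which scheme lost the code.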
The "proper" way to deal with this in
readr::read_csv('x,y\n"US","NA"\n"NA","DE"')
#> # A tibble: 2 x 2
#> x y
#> <chr> <chr>
#> 1 US <NA>
#> 2 <NA> DE
readr::read_csv('x,y\n"US","NA"\n"NA","DE"', na = "")
#> # A tibble: 2 x 2
#> x y
#> <chr> <chr>
#> 1 US NA
#> 2 NA DE

@cjyetman this is exactly what I did everywhere in my new commit.

One thing I maybe didn't explain well is that the only scraper that was not working properly was

Also pay attention to where the CSVs are being written
readr::read_csv('x,y\n"NA","NA"\nNA,NA\n,')
#> # A tibble: 3 x 2
#> x y
#> <lgl> <lgl>
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
# all are converted to <NA>s
readr::read_csv('x,y\n"NA","NA"\nNA,NA\n,', na = "")
#> # A tibble: 3 x 2
#> x y
#> <chr> <chr>
#> 1 NA NA
#> 2 NA NA
#> 3 <NA> <NA>
# only the last row is converted to <NA>s
data <- readr::read_csv('x,y\n"NA","NA"\nNA,NA\n,', na = "")
readr::format_csv(data)
#> [1] "x,y\n\"NA\",\"NA\"\n\"NA\",\"NA\"\nNA,NA\n"
readr::format_csv(data, na = "")
#> [1] "x,y\nNA,NA\nNA,NA\n,\n"
Technically, a string should not be quoted unless it's necessary:
string <- 'x,y\n"NA","NA"\nNA,NA\n,'
readr::format_csv(readr::read_csv(string))
#> [1] "x,y\nNA,NA\nNA,NA\nNA,NA\n"
readr::format_csv(readr::read_csv(string, na = ""))
#> [1] "x,y\n\"NA\",\"NA\"\n\"NA\",\"NA\"\nNA,NA\n"
readr::format_csv(readr::read_csv(string, na = ""), na = "")
#> [1] "x,y\nNA,NA\nNA,NA\n,\n"

I think that's the best thing to do... but again, just be careful that, if any CSVs are written, they don't write

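On the writing side, base R's write.csv() quotes every character value by default (quote = TRUE), so the string "NA" is written in quotes, while a genuinely missing value is written as whatever na is set to. The data frame below is illustrative:

```r
df <- data.frame(country = c("Namibia", "Unknown"),
                 iso2c   = c("NA", NA),
                 stringsAsFactors = FALSE)

# Default na = "NA": the missing value is written as a bare NA, which a
# reader that treats quoted and unquoted NA alike cannot tell from "NA"
default_lines <- capture.output(write.csv(df, row.names = FALSE))

# na = "": the missing value is written as an empty field instead
empty_lines <- capture.output(write.csv(df, row.names = FALSE, na = ""))

print(default_lines)
print(empty_lines)
```
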
Yes, I added
I think this is fixed. Feel free to reopen or comment if it still fails after reinstalling from GitHub.

A better example of why you need to be careful at both ends of the round trip:
my_csv <- readr::format_csv(data.frame(x = c("A", NA), y = c(NA, "B")))
readr::read_csv(my_csv, na = "")[[1]]
#> [1] "A" "NA"
# BAD
my_csv <- readr::format_csv(data.frame(x = c("A", NA), y = c(NA, "B")), na = "")
readr::read_csv(my_csv, na = "")[[1]]
#> [1] "A" NA
# GOOD if

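The same round-trip rule can be sketched in base R: write missing values as empty fields (na = "") and read only empty fields as missing (na.strings = ""). In this sketch round_trip() is a hypothetical helper, and the data frame extends the example above with a literal "NA" row.

```r
# Hypothetical helper: write a data frame to CSV text and read it back
round_trip <- function(df) {
  lines <- capture.output(write.csv(df, row.names = FALSE, na = ""))
  read.csv(text = lines, na.strings = "", colClasses = "character")
}

df <- data.frame(x = c("A", NA, "NA"),
                 y = c(NA, "B", "NA"),
                 stringsAsFactors = FALSE)

back <- round_trip(df)
# Missing values and the literal string "NA" both survive the round trip
print(back)
```
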
Again Namibia. I have realised that it is missing in four coding schemes (eurostat, genc2c, wb_api2c, ecb), since in all of them its 2-letter code is NA. See sources:
get_eurostat: countrycode/dictionary/get_eurostat.R (line 4 in 75e3263)
Reprex with the latest CRAN release
Created on 2021-02-10 by the reprex package (v0.3.0)
Reprex after PR
Created on 2021-02-10 by the reprex package (v1.0.0)
Now only eu28 is missing, which is OK (I left the cldr* fields out of the exercise for clarity). I have prepared a PR that hopefully fixes this issue.
Regards