
get eurostat freezing #98

Closed
antagomir opened this issue Sep 11, 2017 · 13 comments

Comments

@antagomir
Member

For about a week, a user has had problems downloading the daily exchange rates file, ert_bil_eur_d. It downloads the file but then seems to freeze (I had it running for several hours, but it looks like an infinite loop). This is the R code:

library(eurostat)
x <- get_eurostat("ert_bil_eur_d", time_format = "raw", cache = FALSE)

The problem started last week and it could be the result of some package updates.

@antagomir
Member Author

For me this works fine. Could it be that the Eurostat service itself has some issues, rather than the package?

@WietseDol

When I run the command, I see this:

 > theData=get_eurostat(code, time_format="raw",cache=F)
trying URL 'http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fert_bil_eur_d.tsv.gz'
Content type 'application/octet-stream;charset=UTF-8' length 663612 bytes (648 KB)
downloaded 648 KB

|=================================================================================| 100%    2 MB
                                                                  |  22%

It hangs after downloading the data; I would guess the problem is in readr or dplyr.

@jhuovari

It freezes for me too. It could be due to a dplyr update.

@jhuovari jhuovari added the bug label Sep 11, 2017
@jhuovari

The problem seems to be in tidy_eurostat:

    dat2 <- tidyr::gather_(dat2, cnames2, "values",
                           names(dat2)[!(names(dat2) %in% cnames1)],
                           convert = FALSE, na.rm = TRUE)

It could be that there are simply too many columns (more than 11,000). I could not find a quick solution.

@jhuovari

It seems to work with tidyr::gather, so I will try to fix it using non-standard evaluation.
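For readers unfamiliar with the reshaping step, here is a minimal sketch of a wide-to-long conversion with tidyr::gather and non-standard evaluation. The toy data frame and its column names (geo and the year columns) are made up for illustration. This is not the actual change made in tidy_eurostat, only an example of the kind of call being discussed.

```r
library(tidyr)

# Hypothetical toy table: one id column plus one column per time period,
# similar in shape (but not in size) to a Eurostat bulk file.
wide <- data.frame(
  geo = c("FI", "SE"),
  `2016` = c(1.1, NA),
  `2017` = c(1.2, 2.2),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Gather every column except geo into (time, values) pairs.
# -geo is selected with non-standard evaluation instead of being
# passed as a character vector, as gather_ requires.
long <- gather(wide, key = "time", value = "values", -geo,
               na.rm = TRUE, convert = FALSE)

print(long)
```

With na.rm = TRUE the missing SE value for 2016 is dropped, so the result has three rows.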

@WietseDol

WietseDol commented Sep 12, 2017 via email

@georgeblck

I am having the same problem. It hangs after downloading the data, so it must be the tidyr::gather step?

Here is the code to reproduce:

library(eurostat)
get_eurostat("migr_asyappctzm", type = "label", time_format = "num", stringsAsFactors = FALSE)

@WietseDol

I just tried it with R 4.0.2 and the latest dplyr, 1.0.2. It takes a few minutes but it succeeds, with a warning:

get_eurostat("migr_asyappctzm", type = "label", time_format = "num", stringsAsFactors = FALSE)
trying URL 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fmigr_asyappctzm.tsv.gz'
Content type 'application/octet-stream;charset=UTF-8' length 7808372 bytes (7.4 MB)
downloaded 7.4 MB

|========================================================================================================================================================| 100% 219 MB
Table migr_asyappctzm cached at C:\Users\Wiets\AppData\Local\Temp\RtmpcZNLwU/eurostat/migr_asyappctzm_num_label_FF.rds

# A tibble: 63,798,248 x 8
   unit   citizen sex     age   asyl_app         geo                                              time  values
 1 Person Andorra Females Total Asylum applicant Belgium                                          2021.      0
 2 Person Andorra Females Total Asylum applicant Bulgaria                                         2021.      0
 3 Person Andorra Females Total Asylum applicant Switzerland                                      2021.      0
 4 Person Andorra Females Total Asylum applicant Germany (until 1990 former territory of the FRG) 2021.      0
 5 Person Andorra Females Total Asylum applicant Denmark                                          2021.      0
 6 Person Andorra Females Total Asylum applicant Spain                                            2021.      0
 7 Person Andorra Females Total Asylum applicant European Union - 27 countries (from 2020)        2021.      0
 8 Person Andorra Females Total Asylum applicant Finland                                          2021.      0
 9 Person Andorra Females Total Asylum applicant France                                           2021.      0
10 Person Andorra Females Total Asylum applicant Hungary                                          2021.      0
# ... with 63,798,238 more rows

Warning message:
In label_eurostat(y[[i]], i, eu_order = eu_order, lang = lang, countrycode = countrycode, :
All labels for citizen were not found.

@antagomir
Member Author

Thanks! This should be sorted out as soon as the situation allows. PRs also welcome.

@antagomir antagomir reopened this Oct 7, 2020
@jhuovari

jhuovari commented Oct 7, 2020

I don't think there is a problem in the package. migr_asyappctzm is just a really big dataset. The warning comes from NA codes in citizen, which label_eurostat doesn't seem to handle.

@georgeblck

Thanks for the quick feedback from all of you. Loading migr_asyappctzm does indeed work after more than five minutes.
However, migr_asydcfstq definitely crashes R for me with the following code:

library(eurostat)
accept <- get_eurostat("migr_asydcfstq", type = "label", time_format = "num", stringsAsFactors = FALSE)

Personally, I would find it very helpful to have a short warning message saying that a large dataset can take a long time to process; get_eurostat is already quite verbose anyway.
When using certain Eurostat codes without filters, it is not immediately obvious that some datasets are huge and others are not.

@jhuovari

jhuovari commented Oct 7, 2020

That one is even bigger. You can try downloading it first without labels and labelling it afterwards, and turn off caching with cache = FALSE.
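A minimal sketch of that suggestion, assuming the dataset code from the comment above and the default type = "code". The object names asy_raw and asy_lab are made up for illustration. It needs a network connection and may still take a long time on a dataset this size.

```r
library(eurostat)

# Step 1: download with raw codes only (no labels) and without
# using the local cache (cache = FALSE).
asy_raw <- get_eurostat("migr_asydcfstq", type = "code",
                        time_format = "num", cache = FALSE)

# Step 2: replace the codes with labels as a separate step.
asy_lab <- label_eurostat(asy_raw)
```

Splitting the work this way also leaves room to subset asy_raw (for example to a few countries or years) before labelling, which reduces the amount of data label_eurostat has to process.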

@georgeblck

Leaving out the labels made it possible to download the data set without crashing R. Thank you!


4 participants