label_eurostat function returns '404 error Not Found' #297

Open
sweingwifo opened this issue Feb 28, 2024 · 5 comments
@sweingwifo

Description

The label_eurostat function currently fails with a '404 error Not Found', which suggests that a path may be specified incorrectly in the latest update of the eurostat package.

Steps to reproduce

  1. Install the latest version of the Eurostat package with: install.packages("eurostat").
  2. Run the following R code:

label_eurostat("nama_10_gdp", dic = "table_dic")

Expected Behavior

The function is expected to retrieve the labels for the dataset without any errors.

Actual Behavior

The function call results in a 404 error Not Found.

@pitkant
Member

pitkant commented Feb 28, 2024

Thank you for opening this issue. The issue seems to be more about outdated / badly worded documentation than an actual error. By following the function example, i.e. by using a data object as input to argument x and leaving argument dic as NULL ("If NULL (default) dictionary names taken from column names of the data_frame"):

lp <- get_eurostat("nama_10_gdp")
lpl <- label_eurostat(lp)

things worked fine for me. If you have any other errors please don't hesitate to bring them up here or open another issue.

We will clarify the documentation in the next update.

pitkant self-assigned this Feb 28, 2024
@sweingwifo
Author

Thank you for your prompt response and for looking into the issue.

Upon further assessment, I have noted that while labeling of datasets does work as described when passing a data object to the label_eurostat function, there seems to be a change in the behavior compared to previous versions regarding the use of a string input.

In the past, it was possible to use the function by directly providing a string representing a eurostat code.

However, this functionality no longer appears to work in the current version, resulting in the '404 error Not Found'. This change has impacted certain workflows that relied on string inputs for dataset labeling.

Would it be possible to reinstate this feature, or should we adjust our workflows to only use data objects as input?

@pitkant
Member

pitkant commented Feb 28, 2024

Eurostat removed the old "dictionaries" (.dic files) when the old bulk download service was decommissioned. These were basically just lookup tables that had the code in the 1st column and the definition (label) in the 2nd column. The alternative to using this big list is described in the Eurostat document "API - Migrating from Bulk Download Listing urls to API urls": 'Retrieve all dataflows "stubs" (only references and title) in XML'.

While by modern standards it's not a lot, I was a bit hesitant about writing software that relies on fetching a 6.5 MB XML file or a 4.2 MB .tsv file for such a simple lookup operation. At least the old table_dic.dic file had the decency of being only ~620 KB. Many datasets are of course much bigger than 6.5 MB, so the additional traffic to Eurostat probably wouldn't be that big of an issue... And anyway, there are other ways to implement this feature than downloading all the variable labels, such as fetching metadata based on the dataset name, as was done before.

Although, are you certain that label_eurostat_tables("nama_10_gdp", lang = "en") wouldn't serve the purpose? Judging by the contents of table_dic it just returns the name (label) of the dataset?

Here's the old table_dic file for reference:
table_dic.dic.zip
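
For context, a minimal sketch of how a lookup against such a two-column table could work. The in-memory `dic` data frame and the helper `lookup_labels()` are illustrative assumptions, not part of the eurostat package, and the sketch does not read the attached file:

```r
# Hypothetical stand-in for a two-column dictionary: code first, label second.
dic <- data.frame(
  code = c("NAMA_10_GDP", "NAMA_10_LP_A21", "NAMA_10_FTE"),
  label = c(
    "GDP and main components (output, expenditure and income)",
    "Labour productivity and unit labour costs at industry level",
    "Average full time adjusted salary per employee"
  ),
  stringsAsFactors = FALSE
)

# lookup_labels() is an illustrative helper, not a eurostat function.
lookup_labels <- function(codes, dic) {
  # match() returns the dictionary row for each code, or NA if it is absent;
  # toupper() makes the comparison case-insensitive
  idx <- match(toupper(codes), toupper(dic$code))
  labels <- dic$label[idx]
  names(labels) <- codes
  labels
}

lookup_labels(c("nama_10_gdp", "not_a_code"), dic)
```

Codes missing from the table come back as NA rather than raising an error, which keeps a vectorised lookup from failing on a single unknown code.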

@sweingwifo
Author

Thanks for the detailed explanation and the context.

The label_eurostat_tables function does indeed work for retrieving the name of a single dataset, and it serves my purpose for individual codes. I apologize for any confusion; my use case actually involves working with a vector of dataset codes, which is why I used the functionality that accepts a string vector input. The help file suggests that vectors can be used as input, but it seems that this is not currently supported.

While it's not as convenient, I can iterate over my vector of dataset codes using the label_eurostat_tables function to get the labels. This will be a bit more time-consuming than the previous method, but it is a workaround.

Thank you again for your support and for considering the reinstatement/adjustment of this feature.

@pitkant
Member

pitkant commented Feb 28, 2024

> While it's not as convenient, I can iterate over my vector of dataset codes using the label_eurostat_tables function to get the labels. This will be a bit more time consuming than the previous method, but it is a workaround.

I don't think my way of solving this would be much different from what you describe here. If you have the codes already in a vector it's relatively straightforward to label them e.g. by using sapply:

codes <- c("NAMA_10_GDP", "NAMA_10_LP_A21", "NAMA_10_FTE")
names <- sapply(codes, label_eurostat_tables)
> names
                                                  NAMA_10_GDP 
   "GDP and main components (output, expenditure and income)" 
                                               NAMA_10_LP_A21 
"Labour productivity and unit labour costs at industry level" 
                                                  NAMA_10_FTE 
             "Average full time adjusted salary per employee" 
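
If some codes in a longer vector might fail (for example with the 404 error reported above), a slightly more defensive variant of the same loop could look like the sketch below. `label_codes()` is a hypothetical helper, not part of the eurostat package; it takes the labelling function as an argument, so that label_eurostat_tables can be passed in:

```r
# label_codes() is a hypothetical helper, not part of the eurostat package.
# `labeller` is any function that maps one dataset code to one label,
# e.g. eurostat::label_eurostat_tables.
label_codes <- function(codes, labeller) {
  vapply(
    codes,
    function(code) {
      # Return NA for a code that fails instead of aborting the whole run
      tryCatch(
        as.character(labeller(code))[1],
        error = function(e) NA_character_
      )
    },
    character(1)
  )
}

# Usage with the package (needs network access):
# codes <- c("NAMA_10_GDP", "NAMA_10_LP_A21", "NAMA_10_FTE")
# label_codes(codes, eurostat::label_eurostat_tables)
```

Compared with sapply, vapply guarantees a character vector of the same length as the input, and the result keeps the codes as names.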
