properly parse vectors passed to vintage_dates argument #89

Merged
merged 5 commits into from
Jan 5, 2021

Owner

@sboysel sboysel commented Jan 4, 2021

Addresses #88. If vintage_dates is not a scalar, the dates are validated and then converted to a comma-delimited string.
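
A minimal sketch of that check-then-collapse step, in base R. The helper name collapse_vintage_dates() is hypothetical and the validation shown is an assumption; the actual code merged in this PR may differ.

```r
# Hypothetical helper for illustration; not fredr's actual internals.
collapse_vintage_dates <- function(vintage_dates) {
  # Check: the input must be a Date vector with no missing values.
  if (!inherits(vintage_dates, "Date") || anyNA(vintage_dates)) {
    stop("`vintage_dates` must be a Date vector with no missing values.")
  }
  # Convert: format each date as YYYY-MM-DD and join with commas,
  # which is the shape the series/observations endpoint is given
  # in the raw fredr_request() call in the review below.
  paste(format(vintage_dates, "%Y-%m-%d"), collapse = ",")
}

vintage_string <- collapse_vintage_dates(
  as.Date(c("2001-01-01", "2002-01-01"))
)
vintage_string
```

A scalar Date passes through the same path unchanged, since collapsing a length-one vector yields the single formatted date.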

@sboysel sboysel merged commit 21ea900 into master Jan 5, 2021
@sboysel sboysel deleted the fix/vintage-dates branch January 5, 2021 23:14
Collaborator

@DavisVaughan DavisVaughan left a comment

I think we need a little more work here. The series/observations endpoint seems to always return realtime_start and realtime_end columns. At first I thought these were useless, which is why they don't currently show up in the result of fredr() calls, but I think they are actually meaningful when vintage_dates or realtime_start / realtime_end is specified (and possibly at other times, but I'm not sure).

For example:

library(fredr)

# notice that rows 1 and 2 have the same date but different values
fredr_series_observations(
  series_id = "GDPC1",
  observation_start = as.Date("2000-01-01"),
  vintage_dates = as.Date(c("2001-01-01", "2002-01-01"))
)
#> # A tibble: 10 x 3
#>    date       series_id value
#>    <date>     <chr>     <dbl>
#>  1 2000-01-01 GDPC1     9192.
#>  2 2000-01-01 GDPC1     9102.
#>  3 2000-04-01 GDPC1     9319.
#>  4 2000-04-01 GDPC1     9229.
#>  5 2000-07-01 GDPC1     9370.
#>  6 2000-07-01 GDPC1     9260.
#>  7 2000-10-01 GDPC1     9304.
#>  8 2001-01-01 GDPC1     9334.
#>  9 2001-04-01 GDPC1     9342.
#> 10 2001-07-01 GDPC1     9310.

# we can see that rows 1 and 2 actually came from different realtime_* intervals!
fredr_request(
  endpoint = "series/observations",
  series_id = "GDPC1",
  observation_start = "2000-01-01",
  vintage_dates = "2001-01-01,2002-01-01"
)
#> # A tibble: 10 x 4
#>    realtime_start realtime_end date       value 
#>    <chr>          <chr>        <chr>      <chr> 
#>  1 2001-01-01     2001-07-26   2000-01-01 9191.8
#>  2 2001-07-27     2002-01-01   2000-01-01 9102.5
#>  3 2001-01-01     2001-07-26   2000-04-01 9318.9
#>  4 2001-07-27     2002-01-01   2000-04-01 9229.4
#>  5 2001-01-01     2001-07-26   2000-07-01 9369.5
#>  6 2001-07-27     2002-01-01   2000-07-01 9260.1
#>  7 2001-07-27     2002-01-01   2000-10-01 9303.9
#>  8 2001-07-27     2002-01-01   2001-01-01 9334.5
#>  9 2001-09-28     2002-01-01   2001-04-01 9341.7
#> 10 2001-12-21     2002-01-01   2001-07-01 9310.4

# even without vintage_dates, we get these realtime_* columns
fredr_request(
  endpoint = "series/observations",
  series_id = "GDPC1",
  observation_start = "2000-01-01"
)
#> # A tibble: 83 x 4
#>    realtime_start realtime_end date       value    
#>    <chr>          <chr>        <chr>      <chr>    
#>  1 2021-01-12     2021-01-12   2000-01-01 12924.179
#>  2 2021-01-12     2021-01-12   2000-04-01 13160.842
#>  3 2021-01-12     2021-01-12   2000-07-01 13178.419
#>  4 2021-01-12     2021-01-12   2000-10-01 13260.506
#>  5 2021-01-12     2021-01-12   2001-01-01 13222.69 
#>  6 2021-01-12     2021-01-12   2001-04-01 13299.984
#>  7 2021-01-12     2021-01-12   2001-07-01 13244.784
#>  8 2021-01-12     2021-01-12   2001-10-01 13280.859
#>  9 2021-01-12     2021-01-12   2002-01-01 13397.002
#> 10 2021-01-12     2021-01-12   2002-04-01 13478.152
#> # … with 73 more rows

# this gives every revision to `value` recorded between `realtime_start`
# and `realtime_end`. for example, the GDPC1 value for 1947-01-01 spans 9
# realtime intervals between 2000-01-01 and 2020-01-01 (the first is missing, ".")!
fredr_request(
  endpoint = "series/observations",
  series_id = "GDPC1",
  realtime_start = "2000-01-01",
  realtime_end = "2020-01-01"
)
#> # A tibble: 2,682 x 4
#>    realtime_start realtime_end date       value   
#>    <chr>          <chr>        <chr>      <chr>   
#>  1 2000-01-01     2000-04-26   1947-01-01 .       
#>  2 2000-04-27     2003-12-09   1947-01-01 1481.7  
#>  3 2003-12-10     2009-07-30   1947-01-01 1570.5  
#>  4 2009-07-31     2011-07-28   1947-01-01 1772.2  
#>  5 2011-07-29     2013-07-30   1947-01-01 1770.7  
#>  6 2013-07-31     2014-07-29   1947-01-01 1932.6  
#>  7 2014-07-30     2017-10-26   1947-01-01 1934.5  
#>  8 2017-10-27     2018-07-26   1947-01-01 1934.471
#>  9 2018-07-27     2020-01-01   1947-01-01 2033.061
#> 10 2000-01-01     2000-04-26   1947-04-01 .       
#> # … with 2,672 more rows

Created on 2021-01-12 by the reprex package (v0.3.0.9001)

With this in mind, I think we should always return the realtime_start and realtime_end columns, and parse them as Date too. I think we could rearrange them to be after the value column. That seems to make the most sense to me.
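
A rough sketch of what that could look like, using two rows copied from the raw response above. The helper parse_observations() is hypothetical, and the column order (realtime_* after value) simply follows the suggestion in this comment; it is not the final fredr implementation.

```r
# Two raw rows copied from the fredr_request() output above.
# Every column arrives as character.
raw <- data.frame(
  realtime_start = c("2001-01-01", "2001-07-27"),
  realtime_end   = c("2001-07-26", "2002-01-01"),
  date           = c("2000-01-01", "2000-01-01"),
  value          = c("9191.8", "9102.5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse types and place realtime_* after value.
parse_observations <- function(x, series_id) {
  data.frame(
    date           = as.Date(x$date),
    series_id      = series_id,
    # Missing observations come back as "."; as.numeric() turns them into NA.
    value          = suppressWarnings(as.numeric(x$value)),
    realtime_start = as.Date(x$realtime_start),
    realtime_end   = as.Date(x$realtime_end),
    stringsAsFactors = FALSE
  )
}

obs <- parse_observations(raw, series_id = "GDPC1")
obs
```

With the realtime_* columns kept, the two rows for 2000-01-01 are no longer indistinguishable duplicates: each one is tied to the interval in which that value was current.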

I can take a stab at this

@@ -1,6 +1,6 @@
 Package: fredr
 Title: An R Client for the 'FRED' API
-Version: 2.0.0.9000
+Version: 2.0.1.9000
Collaborator

I don't think you needed to bump the version number. The CRAN version is 2.0.0, and 2.0.0.9000 already sorts after it, i.e. it falls between 2.0.0 and 2.0.1 (or 2.1.0, whatever the next release ends up being).
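
The ordering can be checked with base R's package_version(), which compares versions component by component. The variable names below are only for illustration.

```r
cran_version <- package_version("2.0.0")
dev_version  <- package_version("2.0.0.9000")
next_patch   <- package_version("2.0.1")

# The development version sorts after the CRAN release...
dev_after_cran <- dev_version > cran_version
# ...and before the next patch release.
dev_before_next <- dev_version < next_patch

c(dev_after_cran = dev_after_cran, dev_before_next = dev_before_next)
```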

Collaborator

I will go ahead and bump this back down!
