read_sas with catalog_file resulted in failed to parse error #653

Closed
Sama2than opened this issue Nov 29, 2021 · 3 comments · Fixed by #713
Labels
bug (an unexpected problem or unintended behavior), readstat

Comments

Sama2than commented Nov 29, 2021

I'm new to R and Haven. I'm working with data from NHTSA. NHTSA provides SAS data files and both 32-bit and 64-bit catalog files.

I am trying to use Haven to import SAS files. The read_sas function works well if no catalog file is specified. If I specify a catalog file, I get an error: "Error: Failed to parse xxx/formats_32.sas7bcat: Unable to allocate memory." This occurs using both the 32-bit and 64-bit catalog files.

I tried to put together a reprex, but running it fails with "Error: pandoc document conversion failed with error 1". I haven't solved that one, so I will paste the code I'm using to see if anyone can assist as is.

I'm using:

  • RStudio 2021.09.1 Build 372
  • R 4.1.2
  • Haven 2.4.3

In the code below, I renamed the sas7bcat files from the supplied files to keep track of whether it is 32 or 64 bit.

library(haven)
gv <- read_sas("gv.sas7bdat")
## This successfully produces a data frame.
gv_32 <- read_sas(data_file = "gv.sas7bdat", catalog_file = "formats_32.sas7bcat")
## This results in the error "Error: Failed to parse C:/.../formats.sas7bcat: Unable to allocate memory."

gv
# A tibble: 4,982 x 105
   CASEID   PSU CASENO CASENUMBER   CATEGORY VEHNO VIN    VINLENGTH  MAKE MODEL MODELYR BODYTYPE BODYCAT
    <dbl> <dbl>  <dbl> <chr>           <dbl> <dbl> <chr>      <dbl> <dbl> <dbl>   <dbl>    <dbl>   <dbl>
 1  12475    10      1 1-10-2019-0~       10     1 2G1WF~        17    20     2    2002        4       1
 2  12475    10      1 1-10-2019-0~       10     2 1G2WP~        17    22    20    1999        2       1
 3  12579    10      2 1-10-2019-0~        9     1 1GNDU~        17    20   443    2003       20       4
 4  12579    10      2 1-10-2019-0~        9     2 KMHDN~        17    55    35    2002        4       1
 5  12725    10      3 1-10-2019-0~        6     1 1G6KS~        17    19    14    2004        4       1
 6  12725    10      3 1-10-2019-0~        6     2 3FADP~        17    12    32    2014        4       1
 7  12726    10      4 1-10-2019-0~        7     1 4T1BF~        17    49    40    2013        4       1
 8  12805    10      5 1-10-2019-0~        4     1 1G1PE~        17    20    25    2016        4       1
 9  12898    10      6 1-10-2019-0~        4     1 5TDJK~        17    49   403    2016       14       3
10  12898    10      6 1-10-2019-0~        4     2 1FTYR~        17    12   471    2008       34       5
# ... with 4,972 more rows, and 92 more variables: VEHCLASS <dbl>, SPECUSE <dbl>, TRANSTAT <dbl>,
#   DAMPLANE <chr>, DAMSEV <dbl>, CURBWT <dbl>, CURBSRC <dbl>, CARGOWT <dbl>, CARGOSRC <dbl>,
#   INSPTYPE <dbl>, INSPLAG <dbl>, TOWED <dbl>, SPEEDLIMIT <dbl>, DRPRESENT <dbl>, PARALCOHOL <dbl>,
#   ALCTEST <dbl>, ALCTESTRESULT <dbl>, ALCTESTSRC <dbl>, PARDRUG <dbl>, DRUGTEST <dbl>, ZIP <chr>,
#   RACE <dbl>, ETHNICITY <dbl>, RELTOJUNCT <dbl>, TRAFFLOW <dbl>, RDLANES <dbl>, INITLANE <dbl>,
#   SURFTYPE <dbl>, SURFCOND <dbl>, ALIGNMENT <dbl>, PROFILE <dbl>, LINERIGHT <dbl>, LINELEFT <dbl>,
#   RUMBINIT <dbl>, RUMBROAD <dbl>, LIGHTCOND <dbl>, WEATHER <dbl>, TRAFDEV <dbl>, TRAFFUNCT <dbl>, ...
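
Since the post reports that reading works without the catalog and fails only when `catalog_file` is supplied, one possible stopgap is to try the catalog first and fall back to a plain read. The following is a minimal sketch, not something taken from haven itself: `read_sas_with_fallback` and its `reader` argument are hypothetical names introduced here for illustration, and the fallback result will simply lack the value labels that the catalog would have provided.

```r
# Hypothetical helper (not part of haven): try to read the data file together
# with its catalog, and fall back to reading the data file alone if that fails.
# `reader` defaults to haven::read_sas and is only a parameter so the helper
# can be exercised without real SAS files.
read_sas_with_fallback <- function(data_file, catalog_file,
                                   reader = haven::read_sas) {
  tryCatch(
    reader(data_file, catalog_file = catalog_file),
    error = function(e) {
      # Report why the catalog was skipped, then read without value labels.
      warning("Catalog could not be read (", conditionMessage(e),
              "); reading without value labels.", call. = FALSE)
      reader(data_file)
    }
  )
}

# Usage with the file names from the post above:
# gv <- read_sas_with_fallback("gv.sas7bdat", "formats_32.sas7bcat")
```

This does not address the underlying parse failure; it only keeps a script running while the catalog cannot be read.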
@elipousson

I encountered the same issue on OSX. By random coincidence, I was actually trying to access the FARS data from NHTSA just today. It may be helpful to replicate the issue with some other data source to make sure this isn't specific to how NHTSA generates the SAS files.

gorcha added the bug and readstat labels Dec 6, 2021

jacciz commented Dec 8, 2021

Oddly enough, my issue is also with a crash database (though my state's and not from NHTSA). I am able to use the catalog file, but the labels are not applied to all columns.
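
For a case like this, where the catalog loads but labels are missing on some columns, it can help to list exactly which columns came back without value labels. The sketch below assumes the usual haven representation, in which value labels are stored in a column's `"labels"` attribute; `unlabelled_columns` is a hypothetical helper name, and base R is used so the check does not depend on any particular haven version.

```r
# Hypothetical helper: return the names of columns that carry no value labels,
# i.e. columns without a "labels" attribute.
unlabelled_columns <- function(df) {
  has_labels <- vapply(
    df,
    function(col) !is.null(attr(col, "labels")),
    logical(1)
  )
  names(df)[!has_labels]
}

# Usage, assuming `crash` was read with a catalog file:
# unlabelled_columns(crash)
```

Comparing that list against the formats the catalog is expected to define may show whether the missing labels follow a pattern, such as a particular variable type.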

@elipousson

I know this is off-topic but, for @jacciz and @Sama2than, you may be interested in the crashapi R package I'm working on and a related effort to make an index of open crash data sources (currently an open Google Sheet). Feel free to ping me on Twitter if you want more details or open an issue on the crashapi repo for discussion. Glad to connect with other folks using R for crash data analysis even in the comments of a GitHub issue!

gorcha added a commit that referenced this issue Feb 21, 2023
Maintains iconv hack from c1f9f19 and solaris hack from 4a878a1.

* Fix various SAS catalog file reading bugs (fix #529, fix #653, fix #680, fix #696, fix #705).
* Increase maximum SAS page file size to 16MB (fix #697).
* Ignore invalid SAV timestamp strings (fix #683).
* Fix compiler warnings (fix #707).
gorcha closed this as completed in 196e8eb Feb 22, 2023