haven::read_sas unable to allocate memory for 16MB SAS page sizes #697

Closed
inpowell opened this issue Nov 23, 2022 · 1 comment · Fixed by #713

@inpowell

haven::read_sas cannot read a SAS data file with page size 16 MiB (16777216 bytes). Some data files with sizes slightly under 16 MiB also fail to read.

I would expect haven::read_sas to read each of the attached sas7bdat files (zipped to keep the file size under 10MB) and return a 10,000 row tibble with a single column, empty, that contains only empty strings.

20221123 - haven bug report.zip

haven::read_sas('test_16766976.sas7bdat') # Succeeds
# # A tibble: 10,000 x 1
#    empty
#    <chr>
#  1 ""
#  2 ""
#  3 ""
#  4 ""
#  5 ""
#  6 ""
#  7 ""
#  8 ""
#  9 ""
# 10 ""
# # ... with 9,990 more rows

haven::read_sas('test_16776192.sas7bdat') # Fails
# Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding, catalog_encoding = catalog_encoding,  :
#   Failed to parse <snip>/test_16776192.sas7bdat: Unable to allocate memory.
                           
haven::read_sas('test_16777216.sas7bdat') # Fails
# Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding, catalog_encoding = catalog_encoding,  :
#   Failed to parse <snip>/test_16777216.sas7bdat: Unable to allocate memory.

I generated these files in SAS using

* libname out "appropriate/path/here";

data out.test_16777216 (bufsize=16777216 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
* PROC CONTENTS to verify page size;
proc contents data=out.test_16777216 varnum; run;

data out.test_16776192 (bufsize=16776192 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
proc contents data=out.test_16776192 varnum; run;

data out.test_16766976 (bufsize=16766976 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
proc contents data=out.test_16766976 varnum; run;

Workaround: Set the default page size in SAS to 8MB with -BUFSIZE 8M, or set the bufsize= data set option on a case-by-case basis. The default page size in my operating environment is 16M.
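
To check whether a given file is likely to be affected before re-exporting it, the page size can be read straight from the file header. The sketch below is not part of haven: `sas_page_size()` is a hypothetical helper, and the byte offsets it relies on come from the community reverse-engineered description of the sas7bdat format, so treat them as assumptions rather than a documented layout.

```r
# Hypothetical helper: return the page size (in bytes) recorded in a
# sas7bdat header. All offsets are assumptions taken from the community
# reverse-engineered format notes, not from haven or ReadStat.
sas_page_size <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 288L)
  if (length(header) < 208L) {
    stop("File is too short to contain a sas7bdat header", call. = FALSE)
  }
  # Assumed: byte at 0-based offset 35 equal to 0x33 shifts later fields by 4.
  align <- if (header[36] == as.raw(0x33)) 4L else 0L
  # Assumed: byte at 0-based offset 37 equal to 0x01 means little-endian.
  endian <- if (header[38] == as.raw(0x01)) "little" else "big"
  # Assumed: page size is a 4-byte integer at 0-based offset 200 + align.
  start <- 200L + align + 1L
  readBin(header[start:(start + 3L)], what = "integer", n = 1L,
          size = 4L, endian = endian)
}
```

If the assumed offsets hold, `sas_page_size("test_16777216.sas7bdat")` should agree with the page size that PROC CONTENTS reports for the same file.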

@gorcha
Member

gorcha commented Nov 24, 2022

Hi @inpowell, thanks for the bug report.

There's a hard limit to SAS page size in ReadStat, the underlying C library, to avoid memory allocation issues with malformed SAS input (see WizardMac/ReadStat#249 for details).

I've opened an issue over at ReadStat to see if we can get the maximum size increased.
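
For readers wondering why a parser would cap the page size at all, here is a minimal sketch of the general idea. It is not ReadStat's code (ReadStat is written in C), the names `allocate_page()` and `limit` are hypothetical, and the limit value is a placeholder because the actual threshold is not stated in this thread. The point is only that a size field read from an untrusted file gets checked against a cap before a buffer is allocated, so a corrupt header cannot request an arbitrarily large allocation.

```r
# Illustrative only: validate a size read from an untrusted header before
# allocating a buffer for it. `limit` is a placeholder, not ReadStat's value.
allocate_page <- function(page_size, limit) {
  ok <- is.numeric(page_size) && length(page_size) == 1L &&
    !is.na(page_size) && page_size > 0 && page_size <= limit
  if (!ok) {
    stop("page size is missing, non-positive, or above the limit", call. = FALSE)
  }
  raw(page_size)  # allocate only after the check has passed
}

limit <- 8 * 1024^2                      # placeholder cap of 8 MiB
buf <- allocate_page(4096, limit)        # within the cap, so it allocates
res <- tryCatch(
  allocate_page(16777216, limit),        # above the cap, so it is rejected
  error = function(e) conditionMessage(e)
)
```

With a cap like this, a legitimate file whose page size sits above the limit is rejected in the same way as a malformed one, which is the trade-off this issue runs into.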

gorcha added a commit that referenced this issue Feb 21, 2023
Maintains iconv hack from c1f9f19 and solaris hack from 4a878a1.

* Fix various SAS catalog file reading bugs (fix #529, fix #653, fix #680, fix #696, fix #705).
* Increase maximum SAS page file size to 16MB (fix #697).
* Ignore invalid SAV timestamp strings (fix #683).
* Fix compiler warnings (fix #707).
@gorcha gorcha closed this as completed in 196e8eb Feb 22, 2023