Unable to parse sas7bdat when data set page size > ~1MiB #249
Comments
Thanks for the detailed report. The purpose of the page and header size checks is to prevent excessive memory allocations with malformed input. Right now ReadStat allocates a buffer equal to the page size, so we don't want that running into the gigabytes. I'll add some slack to the header size test so it matches the page size test.
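To illustrate the reasoning above, here is a minimal sketch of a size check that runs before the allocation. It is not ReadStat's actual code: the function name `alloc_page_buffer` and the 16 MiB cap `MAX_PAGE_SIZE` are assumptions made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cap for illustration only; not taken from ReadStat. */
#define MAX_PAGE_SIZE ((int64_t)1 << 24) /* 16 MiB */

/* Hypothetical helper: allocate a buffer for one page, refusing sizes
 * outside the accepted range. A malformed header could claim a page size
 * in the gigabytes; rejecting it here means malloc is never asked for it.
 * Returns NULL on rejection or if malloc itself fails. */
static void *alloc_page_buffer(int64_t page_size) {
    if (page_size <= 0 || page_size > MAX_PAGE_SIZE) {
        return NULL; /* size rejected before any allocation happens */
    }
    return malloc((size_t)page_size);
}
```

With this cap, a 2097152-byte page is allocated normally, while a claimed page size of 2 GiB is rejected up front. The caller is responsible for calling `free` on the returned buffer.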
Thanks so much @evanmiller! Completely understand not wanting to allocate a buffer in the GBs. I have seen SAS datasets in the wild with a data set page size of 10 MB. If I encounter others that are larger I will post another issue.
Closing, as f6766cd corrects this. I will open a new issue if I encounter a page size greater than 16 MB.
Issue
I am attempting to parse a sas7bdat with a data set page size of 2097152 (~2 MiB). When attempting to parse this file I get an error.
SAS file
As noted, the data set page size of the file is 2097152. The SAS dataset is a randomly created dataset with 2,000 rows and 110 created columns and is ~6 MB in size.
I can regenerate the same file, reducing the page size from 2097152 (~2 MiB) down to 1048576 (~1 MiB), and the file parses without issue.
The files I used for testing are in the following location:
To generate the tables linked above, I manually adjusted the page size using the following in SAS.
SAS BUFSIZE option
According to the SAS documentation on the BUFSIZE option, the page size may be adjusted by altering the system or data set BUFSIZE. Again from the documentation, the maximum data set page size that may be set is 2147483647.
Troubleshooting / Potential Fix
On line 257 of the readstat_sas.c file I note that hinfo->header_size is checked against 1<<20 (1048576 in decimal). If I alter this line to check against 1<<21 (2097152 in decimal), the file parses without issue.
Because the SAS documentation notes that users can set the data set page size to as much as ~2 GiB, I wonder if the line should be adjusted to check against INT32_MAX. Obviously, making the adjustment may have ramifications that I don't immediately observe, as I'm not extremely familiar with the repository.
Finally, I am glad to submit a PR with the change I noted above. Or, if you have suggestions on a set of other changes I would need to trace through, I am glad to put in the work. All in all, I am glad to help in any way! Thanks so much for all the effort you (and others) have put into the library!
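For reference, the arithmetic behind the three candidate limits can be sketched as follows. The helper `size_within_limit` and the macro names are hypothetical and only mirror the comparison described above; the exact form of the check in readstat_sas.c (for example, whether it uses `>` or `>=`) is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two shift constants discussed in this issue. */
#define LIMIT_1MIB ((int64_t)1 << 20) /* 1048576 */
#define LIMIT_2MIB ((int64_t)1 << 21) /* 2097152 */

/* Hypothetical helper: true when `size` does not exceed `limit`.
 * Assumes the real check rejects only sizes strictly greater than the limit. */
static bool size_within_limit(int64_t size, int64_t limit) {
    return size <= limit;
}
```

Under that assumption, `size_within_limit(2097152, LIMIT_1MIB)` is false, which matches the failing file, while `size_within_limit(1048576, LIMIT_1MIB)` and `size_within_limit(2097152, LIMIT_2MIB)` are both true, which matches the two cases that parse. Checking against `INT32_MAX` (2147483647) would accept every page size SAS allows, at the cost of also accepting a malformed file that claims a page of nearly 2 GiB.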