Page size error when parsing sas7bdat file #226

Closed
curtisalexander opened this issue Dec 21, 2020 · 8 comments

@curtisalexander

Issue

I receive the following error when parsing a rather large sas7bdat file.

Format: SAS data file (SAS7BDAT)
ReadStat: Error parsing page 32767, bytes 0-131071
Error processing .\_rand_ds.sas7bdat: Invalid file, or file has unsupported features

Dataset

The dataset I am using for testing has 3,800,000 rows and 110 columns. Of greater import is that it has 33,195 (i.e. > 32,767) pages if I run a proc contents on the file from within SAS. I can take the same dataset and cut it down so that it has < 32,767 pages and I can parse without issue.

OS

I get the above error only when I run on 64-bit Windows (x86 processor). I built the executable (ReadStat_App.exe) using Visual Studio 2019 and the newly added Visual Studio solution for 1.1.5.

If I build on 64-bit Linux (x86 processor), I can parse the file without error. To me this suggests a mismatch between C integer sizes on macOS / Linux and C integer sizes on Windows.
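For context on the integer-size suspicion: 64-bit Windows uses the LLP64 data model, where `long` is 32 bits wide, while 64-bit Linux and macOS use LP64, where `long` is 64 bits wide. The sketch below is only an illustration of that difference, not code from ReadStat; the helper name `offset_fits_in_long` is hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not part of ReadStat): returns 1 if a 64-bit byte
 * offset can be stored in a plain `long` on the current platform, else 0.
 * With a 32-bit `long` (e.g. 64-bit Windows) anything past 2 GiB - 1 fails;
 * with a 64-bit `long` (e.g. 64-bit Linux) every int64_t value fits. */
int offset_fits_in_long(int64_t offset) {
    return offset >= (int64_t)LONG_MIN && offset <= (int64_t)LONG_MAX;
}
```

Calling `offset_fits_in_long(4294967296LL)` (an offset of exactly 4 GiB) would be expected to return 0 in a 64-bit MSVC build and 1 in a 64-bit Linux build, which matches the platform split described above.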

Troubleshooting

Note that I'm glad to provide the raw dataset I'm using for testing (it is ~ 4.6GB in size). Alternatively I can provide the SAS program I utilized to generate the dataset if you have access to SAS (the program just creates random data to produce a large file).

Or if I can assist by simply rebuilding and testing from a different commit, I'm glad to do so.

@curtisalexander changed the title from "Page size error When Parsing sas7bdat file" to "Page size error when parsing sas7bdat file" on Dec 21, 2020
@evanmiller
Contributor

Hi, thanks for the report. Do you see the errors described in #225 when compiling on Windows? Given the magic 4 GB file size I am guessing there is an overflowing 32-bit variable somewhere.
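To make the 4 GB guess concrete, here is a rough sketch of how a byte offset computed in 32-bit arithmetic wraps around near page 32767. The page size of 131072 bytes comes from the error message ("bytes 0-131071"); the header size of 131072 bytes and both function names are assumptions made purely for illustration, and this is not claimed to be the actual faulty expression in ReadStat.

```c
#include <stdint.h>

/* Hypothetical: page offset computed in unsigned 32-bit arithmetic.
 * Unsigned overflow is well defined in C and wraps modulo 2^32. */
uint32_t page_offset_32(uint32_t header_size, uint32_t page_index,
                        uint32_t page_size) {
    return header_size + page_index * page_size;
}

/* Hypothetical: the same offset, widened to 64 bits before multiplying,
 * so the result can exceed 4 GiB without wrapping. */
uint64_t page_offset_64(uint32_t header_size, uint32_t page_index,
                        uint32_t page_size) {
    return (uint64_t)header_size + (uint64_t)page_index * (uint64_t)page_size;
}
```

Under these assumptions, `page_offset_32(131072, 32767, 131072)` wraps to 0, while `page_offset_64` with the same arguments gives 4294967296 (exactly 4 GiB), so a reader using the 32-bit result would seek back to the start of the file instead of to the intended page.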

@curtisalexander
Author

Yes, I see the warnings in #225.

I see a slew of other warnings as well. That could be because I am utilizing VS19 or it could be my error as I don't ever work with VS.

@evanmiller
Contributor

If you look at the Appveyor log linked in that issue, you will see many signed/unsigned warnings, among others.

See if this branch fixes things:

https://github.com/WizardMac/ReadStat/tree/windows-largefile

@curtisalexander
Author

That branch fixes things! I can read the large file now. I tested both displaying the sas7bdat metadata and reading the sas7bdat and subsequently writing the file out as a csv.

And that was a super quick turnaround!

@curtisalexander
Author

Do you know roughly when you would cut a new release with these updates? Or would you move these updates into another branch (say a dev branch)? Just curious as I'm ultimately binding to the C library and am bringing in your (and other ReadStat contributors') work as a git submodule. Just trying to figure out which branch I should be looking to bring in.

Thanks again for the assistance!

@evanmiller
Contributor

@curtisalexander Excellent news!

Generally I do releases about every 3 months with the accumulated changes, with a 1 month beta period prior. master always reflects the most recent release.

I'll be moving these changes into dev once I get some other Windows build stuff resolved (hopefully tonight). So I'd point to dev if you want the latest and greatest and don't mind if things break now and then.

@curtisalexander
Author

Sounds good - thank you for the clarification!

@curtisalexander
Author

Closing - fixed by dc76fb1.
