Support binary compression in sas7bdat #21
Comments
Compression support is definitely needed! I have to stick to the parso library in Java until this is supported. |
hi, not sure if this is the right place for a minimal reproducible example since tidyverse/haven#31 was closed? the latest version of
|
@evanmiller is this on the schedule? |
No schedule for this -- note that many compression issues were misdiagnosed as binary compression rather than bugs in the character decompressor. Seems like 90%+ compressed files in the wild are character compressed. |
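For anyone unsure which kind of compression their file uses before reporting it here, a rough check is sketched below. It rests on an assumption taken from community reverse-engineering notes on the sas7bdat format, not from ReadStat itself: character (RLE) compressed files carry the marker `SASYZCRL` in their metadata, and binary (RDC) compressed files carry `SASYZCR2`. The function name `guess_compression` and the `scan_bytes` parameter are made up for this illustration, and a raw byte scan like this is only a heuristic.

```python
# Assumed marker strings (from community notes on the sas7bdat format,
# not taken from ReadStat): these are expected to appear verbatim in the
# metadata of compressed files.
SIGNATURES = {
    b"SASYZCRL": "character (RLE)",
    b"SASYZCR2": "binary (RDC)",
}


def guess_compression(path, scan_bytes=1 << 20):
    """Return the labels of all compression markers found near the file start.

    Only the first `scan_bytes` bytes are searched. An empty list means no
    marker was seen, which suggests (but does not prove) an uncompressed file.
    """
    with open(path, "rb") as fh:
        data = fh.read(scan_bytes)
    return [label for sig, label in SIGNATURES.items() if sig in data]
```

Calling `guess_compression("binary.sas7bdat")` on a local copy of a test file returns a list of labels. If the list contains only the character (RLE) label, the failure is more likely a character decompressor bug than missing binary compression support.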
If a truly binary compressed SAS dataset is needed, you may use [https://github.com/reikoch/testfiles/blob/master/binary.sas7bdat]. haven 1.0 fails with "ReadStat: Error parsing page 0, bytes 8192-16383". |
Is this encoding error related to compression? |
@sclewis23 No - the error is related to the file's character encoding. If you know which encoding was used to create the file, I can try to add support. |
@evanmiller - the encoding is set to "any" |
@evanmiller
|
@evanmiller |
If anyone has example files, please try them with this new code branch: https://github.com/WizardMac/ReadStat/tree/sas-binary-compression |
That is usually the problem! I try to keep track of it here |
I'll create a sample tomorrow. |
tested OK in pyreadstat with the attached sample file I generated in SAS like this:
The file is stored permanently in the pyreadstat repo in the test_data/basic/sample_bincompressed.sas7bdat, for now in the sasbin_dev branch. |
@ofajardo Thanks for testing – I will wait a few days for results from other files and then merge if everything looks okay. |
Here is a very small sample file with binary compression (4 rows). |
@sclewis23 Is this the correct data? "IDnumber","week1","week16","AverageLoss" |
Looks good on my two testfiles dates_binary.sas7bdat and dates_longname_binary.sas7bdat in https://github.com/reikoch/testfiles - congratulations! |
@reikoch Thanks for letting me know. The reports are all positive so I'll get this merged into |
Merged into master and included in 1.1.4 - closing |
Binary (aka Ross) compression is currently not supported. It looks like we can use the Python sas7bdat package as a template for implementing the decompression algorithm:
https://pypi.python.org/pypi/sas7bdat
(The library is MIT licensed.)
At present, compressed files fail with the error message "File has unsupported compression scheme".