
Decompress Chunk Truncated error #99

Open
mayurpande opened this issue Dec 4, 2020 · 20 comments

Comments

@mayurpande

I have tried to follow the readme and write this line from my REPL like so:

    python -m snappy -d temp.snappy temp.txt

However I get the error UncompressError: chunk truncated

@mayurpande
Author

Also when I try to use it within my script it fails saying:

    snappy.UncompressError: Error while decompressing: invalid input

However I have a bytes like object:

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()
        snappy.uncompress(data)

@martindurant
Member

How did you make the file?
Note the difference between decompress and stream_decompress.

@mayurpande
Author

It is compressed in S3, then I download it.

@martindurant
Member

> It is compressed in S3

How was it created?

@mayurpande
Author

> It is compressed in S3

> How was it created?

It is fed through Firehose and then the Firehose handles the compression.

@martindurant
Member

So, this library has three decompress functions, you should try each. Otherwise, you will need to get the detail of what firehose is doing for you. This isn't a parquet file, right?

@mayurpande
Author

> So, this library has three decompress functions, you should try each. Otherwise, you will need to get the detail of what firehose is doing for you. This isn't a parquet file, right?

No, it isn't parquet. As per the information for Firehose it says

> S3 compression and encryption
> Kinesis Data Firehose can compress records before delivering them to your S3 bucket. Compressed records can also be encrypted in the S3 bucket using a KMS master key.

So I have tried uncompress and stream_compress and neither works; I will try decompress now.

@martindurant
Member

> Kinesis Data Firehose can compress records before delivering them to your S3 bucket.

Sorry, that doesn't give us much to work from. Also, don't forget hadoop_stream_decompress.

@mayurpande
Author

> Sorry, that doesn't give us much to work from. Also, don't forget hadoop_stream_decompress.

It just means that it compresses into a specified format, with the ability to choose from:

  • Disabled
  • GZIP
  • Snappy
  • Zip
  • Hadoop-Compatible Snappy

@mayurpande
Author

> Also, don't forget hadoop_stream_decompress.

    with open('libs/temp.snappy', 'rb') as f:
        data = f.read()
        decom = snappy.hadoop_snappy.StreamDecompressor()
        un = decom.decompress(data)

This is the only thing that didn't throw an error; however, it returns an empty byte string. But when I use the mac snzip command-line tool, it uncompresses the file.

@mayurpande
Author

> I have tried to follow the readme and write this line from my REPL like so:
>
> python -m snappy -d temp.snappy temp.txt
>
> However I get the error UncompressError: chunk truncated

Also there is still the issue of this. Not too sure why none of the methods are working, but as I have mentioned, I am able to use snzip from my command line.

@martindurant
Member

> So I have tried uncompress and stream_compress

You meant stream_decompress?
Sounds like that should be the one, guessing from the snzip readme.

@mayurpande
Author


> You meant stream_decompress?

Yes, it still doesn't work; it throws

    snappy.UncompressError: stream missing snappy identifier

I tried to do what they did here but it still did not work.

> Sounds like that should be the one, guessing from the snzip readme.

I am just going to use a subprocess call to snzip as that works. If there is a fix to this, please let me know.
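(Editorial note: a sketch of that workaround, assuming snzip is on PATH; `snzip -d` decompresses the file in place, and the `dry_run` switch is invented for this sketch so the command can be inspected without running the tool.)

```python
import subprocess

def snzip_decompress(path: str, dry_run: bool = False):
    # snzip -d decompresses in place (e.g. temp.snappy -> temp);
    # check snzip's own help for flags such as -k (keep the input).
    cmd = ["snzip", "-d", path]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```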

@martindurant
Member

Sorry, I don't have any more suggestions for you. Perhaps someone else does.

@mayurpande
Author

> Sorry, I don't have any more suggestions for you. Perhaps someone else does.

No problem, thank you for the guidance anyway. My colleague was having the same issue with chunk truncated as well.

@mayurpande
Author

Anyone with any ideas for this?

@randomtask2000

I have this issue as well. Any feedback is appreciated.

@martindurant
Member

You might want to try the package cramjam, which has cramjam.snappy.decompress and cramjam.snappy.decompress_raw for the framed and unframed formats, respectively. I don't believe it has a CLI, but you could request one.

@omendoza-itera

@mayurpande

Sorry to bother you, but I have the same problem trying to read a Kinesis Firehose snappy file.
Did you find a way to uncompress it?

Regards

@milesgranger

> I don't believe it has a CLI, but you could request one.

Just released one, pip install cramjam-cli for anyone interested.
