
Stream to stream expansion #36

Open · wimmatthijs opened this issue Apr 14, 2021 · 10 comments
Labels: enhancement (New feature or request)

Comments

@wimmatthijs commented Apr 14, 2021

Hi,

Pretty cool library here, kudos and thanks.
Could stream-to-stream unpacking be envisioned?
The main idea is unpacking an incoming HTTP stream of gzipped content and immediately "consuming" that content.
For example, with a JSON file you would consume only the data matching your specific keyword and ditch the rest.
I would love to collaborate on implementing this functionality if it's possible.
Let me know your thoughts; I'm here if you would like to discuss.

Wim

@tobozo (Owner) commented Apr 14, 2021

hi, thanks for your feedback 👍

Stream to stream is already supported (see tarGzExpander), only it does not handle filters based on filenames and can't stream to a JSON decoder.
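For reference, a typical call looks like this (a minimal sketch; the call shape and error helper are assumed from the library's README examples):

#include <ESP32-targz.h>

// filesystem-to-filesystem: gunzip + untar /tar_package.tar.gz into /tmp
if( !tarGzExpander( SPIFFS, "/tar_package.tar.gz", SPIFFS, "/tmp" ) ) {
  Serial.printf( "tarGzExpander failed with return code #%d\n", tarGzGetError() );
}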

However, based on your initial idea, I've attempted to add '--exclude' and '--include' support for tar extraction, and this is what I've come up with:

void TarUnpacker::setTarExcludePattern( tarExcludePattern cb );
void TarUnpacker::setTarIncludePattern( tarIncludePattern cb );

provided you have your own custom filtering function:

bool myCustomExcludePatternMatcher( const char* filename )
{
  if( ! String( filename ).endsWith("my_keyword") ) return true; // will be excluded
  return false; // will be included
}

you can then ignore some files during unpacking:

TARGZUnpacker->setTarExcludePattern( myCustomExcludePatternMatcher );

Triggering a callback when a file has been unpacked was already possible (e.g. to read JSON contents); see the example folder (test_tool.h) for a myTarMessageCallback implementation that reads file contents just after they've been untarred:

TARGZUnpacker->setTarMessageCallback( myCustomTriggerOnFileClose );
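A rough sketch of such a callback (untested; the printf-style signature and the assumption that the message carries the freshly closed file's path are taken from the test_tool.h example):

#include <SPIFFS.h>
#include <stdarg.h>

void myCustomTriggerOnFileClose( const char* format, ... )
{
  char msg[256];
  va_list args;
  va_start( args, format );
  vsnprintf( msg, sizeof(msg), format, args );
  va_end( args );
  // assumption: msg holds the path of the file that was just untarred
  if( String( msg ).endsWith( ".json" ) ) {
    fs::File file = SPIFFS.open( msg );
    if( file ) {
      // consume the freshly unpacked JSON here
      file.close();
    }
  }
}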

I've pushed the changes from this comment (untested, though) to a specific branch if you want to play with that.

Now keep in mind that you don't always control the order in which files are added to/extracted from a tar archive.
If the JSON content refers to another file in the same tar archive, that file may or may not already be unpacked, depending on many factors (path, name, modification date, arbitrary order).

@wimmatthijs (Author)

Hi,

I'm sorry, I'm talking not about multiple files but about the gzipped response of a web server.
Most servers can return a gzipped response,
which is particularly convenient when an API only serves rather big JSON responses.
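For reference, getting a gzipped response usually just means sending an Accept-Encoding request header; a minimal sketch with the ESP32 HTTPClient, using a hypothetical endpoint:

#include <HTTPClient.h>

HTTPClient http;
http.begin( "http://api.example.com/big.json" ); // hypothetical endpoint
http.addHeader( "Accept-Encoding", "gzip" );     // ask the server for gzipped content
if( http.GET() == HTTP_CODE_OK ) {
  Stream* gzStream = http.getStreamPtr(); // raw gzipped bytes to feed a decompressor
  // ... hand gzStream to the unpacker here ...
}
http.end();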

So my aim is to receive the gzipped data, unpack it, and immediately scan the unpacked contents for what I need.

A byte-for-byte stream would be ideal, but I'm not sure how gzip decompression works: does it reproduce the original byte by byte, or in chunks?

Wim

@tobozo (Owner) commented Apr 14, 2021

Oh right.

A byte-to-byte stream requires the destination data to be seekable (the dictionary is replaced by the output data), so the destination can't be memory or a JSON parser; it must be a filesystem. Although it only uses a few bytes of RAM, it's very slow and generates a lot of I/O, so it's not recommended for use with HTTP unless the app is very resilient and doesn't mind making multiple attempts before succeeding.

On the other hand, using a dictionary can work, but it can fragment the heap when used repeatedly.

gzStreamUpdater could be a good basis to implement that, using gzWriteCallback = &gzStreamWriteCallback; instead of the Updater methods.
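For context, this is roughly how gzStreamUpdater is fed from an HTTP stream today (a minimal sketch, untested; the call shape is assumed from the library's OTA example, and the URL is hypothetical):

#include <HTTPClient.h>
#include <ESP32-targz.h>

HTTPClient http;
http.begin( "http://example.com/firmware.bin.gz" ); // hypothetical URL
if( http.GET() == HTTP_CODE_OK ) {
  // gzStreamUpdater currently pipes the decompressed bytes into the Update
  // API; pointing gzWriteCallback at gzStreamWriteCallback instead would
  // redirect them to an arbitrary output stream (the idea above)
  gzStreamUpdater( http.getStreamPtr(), http.getSize() );
}
http.end();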

@wimmatthijs (Author)

Ahah, this is a very good suggestion, I will have a look into that!

@tobozo (Owner) commented Apr 24, 2021

hey @wimmatthijs, some afterthoughts on this:

  1. if the server sends a Content-Length header with the size of the gz file
  2. if the ESP32 has at least 32kb of heap free when decompression starts

... then it becomes theoretically possible to do stream (HTTP) to stream (JSON) using the ArduinoStreamReader deserialization interface from ArduinoJson.

I don't have a use case in mind for that, though; could you point me to an example sketch I could use as a basis to start testing this?
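In the meantime, the consuming end could look something like this (a minimal sketch; decompressedStream is a hypothetical Stream exposing the gunzipped bytes, and the filter uses ArduinoJson 6's DeserializationOption::Filter to keep only a specific key, matching the "keyword" idea above):

#include <ArduinoJson.h>

// keep only the "my_keyword" key while parsing, ditch everything else
StaticJsonDocument<64> filter;
filter["my_keyword"] = true;

StaticJsonDocument<512> doc;
// decompressedStream is hypothetical: any Stream yielding the gunzipped JSON
DeserializationError err = deserializeJson( doc, decompressedStream,
                                            DeserializationOption::Filter( filter ) );
if( !err ) {
  Serial.println( doc["my_keyword"].as<const char*>() );
}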

tobozo added the enhancement (New feature or request) label Apr 24, 2021
tobozo reopened this Apr 24, 2021
@wimmatthijs (Author)

Hey, cool! I have a very specific application in mind, of course; could we maybe set up a short meeting to discuss?
I'm not sure I'm allowed to discuss the project on GitHub....

my email is [email protected]
just drop me a line, thanks so much!

Wim

@tobozo (Owner) commented Apr 25, 2021

Sure, I've dropped you a message on Hangouts/Meet; otherwise I'm on Gitter:

https://gitter.im/tobozo

@javipelopi commented Apr 29, 2021

Hello again @tobozo!

In my application, I have a firmware.bin.gz split into multiple parts (gz001, gz002, etc.) that I fetch from the internet. Ideally, I would like to get them one by one and feed them to gzStreamUpdater without saving them to the filesystem.

Would that be possible? Could you give me a little direction on how this should be accomplished?

Thanks in advance!

@tobozo (Owner) commented Apr 29, 2021

hey @javipelopi, this is not possible with the current library.

Please open a new issue if you have another question; this thread is not about multipart archives.

@javipelopi

@tobozo hi, sure! I asked it here because I wanted to know whether it's possible to decompress and consume the data on the fly for each part, or whether there are preconditions to be met (for example, that each part be divisible by a certain amount).

Anyway, I will try to dig into it a bit on my own!

Thanks!
