http_scrape source should support compression. #13888

Open
neuronull opened this issue Aug 8, 2022 · 2 comments
Labels
domain: sources (anything related to Vector's sources)
type: feature (a value-adding code addition that introduces new functionality)

Comments

@neuronull
Contributor

neuronull commented Aug 8, 2022

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

The http_scrape source, implemented in #13793, does not currently support compression.

neuronull added the domain: sources and type: feature labels Aug 8, 2022
@zamazan4ik
Contributor

zamazan4ik commented May 10, 2023

@neuronull as far as I understand, Vector currently uses hyper under the hood as its HTTP client. hyper cannot transparently decompress a payload based on the Content-Encoding header, but reqwest can.

I see the following ways to resolve the issue:

  • Use reqwest instead of hyper to get transparent decoding. Migrating away from hyper could be difficult, though.
  • Implement Content-Encoding-based decompression on our own. This could be quite tricky, since the server and the client may each support multiple compression algorithms and have to negotiate which one to use (honestly, I don't know much about this part).
  • Implement a compression option, as we usually do for other sources/sinks, and specify the only accepted algorithm on the Vector side (or a couple of them that we try one by one, though that could be overengineering). This is not a very HTTP-native approach.

Do you have any other ideas?
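
To make the second option more concrete, here is a minimal sketch of the header-handling part only, using nothing but the standard library. It assumes a hypothetical `Coding` enum and `decode_order` function (neither is Vector or hyper API) and an assumed set of supported codings. The Content-Encoding header lists codings in the order the server applied them, so the client has to undo them in reverse. The actual decompression would still need a crate for each algorithm, which is left out here.

```rust
/// Content codings this sketch recognizes.
/// This is an assumed set for illustration, not Vector's actual list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Coding {
    Gzip,
    Deflate,
    Zstd,
}

/// Parse a `Content-Encoding` header value into the order in which the
/// codings must be undone.
///
/// The header lists codings in the order they were applied, so decoding
/// walks that list in reverse. Coding names are case-insensitive.
fn decode_order(header_value: &str) -> Result<Vec<Coding>, String> {
    let mut codings = Vec::new();
    for token in header_value.split(',') {
        let token = token.trim().to_ascii_lowercase();
        match token.as_str() {
            // Empty list elements and `identity` mean "nothing to undo".
            "" | "identity" => continue,
            // `x-gzip` is a legacy alias for `gzip`.
            "gzip" | "x-gzip" => codings.push(Coding::Gzip),
            "deflate" => codings.push(Coding::Deflate),
            "zstd" => codings.push(Coding::Zstd),
            other => return Err(format!("unsupported content coding: {}", other)),
        }
    }
    // Last applied coding must be removed first.
    codings.reverse();
    Ok(codings)
}
```

With this, a response carrying `Content-Encoding: deflate, gzip` yields the decode order gzip then deflate, and an unknown coding such as `br` is reported as an error instead of being passed through as garbage.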

@jszwedko
Member

I think we should do:

  • Implement Content-Encoding-based decompression on our own. This could be quite tricky, since the server and the client may each support multiple compression algorithms and have to negotiate which one to use (honestly, I don't know much about this part).

The negotiation, I think, is pretty simple. The client sets Accept-Encoding and the server picks one of the listed encodings. Vector would just set Accept-Encoding to the set of encodings it can handle.
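
A rough sketch of that negotiation from the client side, standard library only. The `SUPPORTED` list and both function names are hypothetical and chosen for illustration; they are not taken from Vector's code. The client advertises what it can decode, then checks that whatever the server put in Content-Encoding is something it actually offered.

```rust
/// Encodings this hypothetical client can decode, in order of preference.
/// The list is an assumption for the example.
const SUPPORTED: &[&str] = &["zstd", "gzip", "deflate"];

/// Build the value of the `Accept-Encoding` request header.
fn accept_encoding_value(supported: &[&str]) -> String {
    if supported.is_empty() {
        // Nothing to offer: only the unencoded representation is acceptable.
        "identity".to_string()
    } else {
        supported.join(", ")
    }
}

/// Check that the server's `Content-Encoding` only names codings we offered.
/// A missing header means the body was sent without any content coding.
fn server_choice_is_supported(content_encoding: Option<&str>, supported: &[&str]) -> bool {
    match content_encoding {
        None => true,
        Some(value) => value
            .split(',')
            .map(|coding| coding.trim())
            .filter(|coding| !coding.is_empty())
            .all(|coding| {
                coding.eq_ignore_ascii_case("identity")
                    || supported.iter().any(|s| s.eq_ignore_ascii_case(coding))
            }),
    }
}
```

The request would carry `Accept-Encoding: zstd, gzip, deflate`, and a response with an unexpected coding can be rejected up front rather than failing later in the decoder.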
