http scraping with an IntervalStream piles up requests if the server takes > interval_secs to respond #14087
Labels: source: http_client (Anything `http_client` source related)
Here is a Python script, graciously provided by @hhromic, that can be used to reproduce the issue. The first request takes 10 seconds to respond; every subsequent request takes 1 second:

```python
import asyncio

from aiohttp import web

async def handler(request):
    data = request.config_dict["data"]
    data["req_num"] += 1
    req_num = data["req_num"]  # store locally as it can change during awaits
    delay = 10 if req_num == 1 else 1
    await asyncio.sleep(delay)
    return web.Response(text=f"req {req_num} = {delay}s\n")

app = web.Application()
app.add_routes([web.get("/", handler)])  # can be web.post(...) etc
app["data"] = {"req_num": 0}
web.run_app(app)
```

And the Vector configuration used to scrape it:

```toml
data_dir = "/var/lib/vector/"

[sources.source0]
endpoint = "http://localhost:8080"
scrape_interval_secs = 1
type = "http_scrape"

[sources.source0.decoding]
codec = "bytes"

[sources.source0.framing]
method = "bytes"

[sources.source0.headers]

[sources.source0.query]

[sinks.sink0]
inputs = ["source0"]
target = "stdout"
type = "console"

[sinks.sink0.encoding]
codec = "json"

[sinks.sink0.healthcheck]
enabled = true

[sinks.sink0.buffer]
type = "memory"
max_events = 500
when_full = "block"
```
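The failure mode in the title can be seen with a small standalone sketch: a scrape loop that fires a new request on every interval tick without waiting for (or cancelling) the previous one lets in-flight requests accumulate whenever the server's response time exceeds the interval. This is an illustration in Python, not Vector's actual internals; `fake_request` and `naive_scraper` are hypothetical names:

```python
import asyncio

async def fake_request(response_time: float) -> None:
    # Stand-in for an HTTP scrape that takes `response_time` seconds to answer.
    await asyncio.sleep(response_time)

async def naive_scraper(interval: float, response_time: float, ticks: int) -> int:
    """Start a new scrape every `interval` seconds without waiting for the
    previous one to finish; return the peak number of in-flight requests."""
    in_flight: set = set()
    peak = 0
    for _ in range(ticks):
        task = asyncio.create_task(fake_request(response_time))
        in_flight.add(task)
        task.add_done_callback(in_flight.discard)
        peak = max(peak, len(in_flight))
        await asyncio.sleep(interval)
    await asyncio.gather(*in_flight)
    return peak

# Server roughly 5x slower than the interval: several requests pile up at once.
print(asyncio.run(naive_scraper(interval=0.01, response_time=0.05, ticks=20)))
```

With `scrape_interval_secs = 1` and the test server's 10-second first response, the same dynamic produces roughly ten queued scrapes before the first one completes.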
neuronull changed the title to "http scraping with an IntervalStream piles up requests if the server takes > interval_secs to respond" on Aug 25, 2022
This was referenced Jul 14, 2023
github-merge-queue bot pushed a commit that referenced this issue on Jul 24, 2023:
…timeouts (#18021)

Fixes #14087, fixes #14132, fixes #17659.

- [x] make target timeout configurable

This builds on what @wjordan did in #17660.

### what's changed

- prometheus scrapes happen concurrently
- requests to targets can time out
- the timeout can be configured (user facing change)
- small change in how the http client was instantiated

Co-authored-by: Doug Smith <[email protected]>
Co-authored-by: Stephen Wakely <[email protected]>
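The approach described in that commit message (concurrent scrapes bounded by a configurable per-request timeout) can be sketched in Python with `asyncio.wait_for`. This is an assumption-laden illustration of the technique, not Vector's Rust implementation; `scrape_once`, `scrape_with_timeout`, and `timeout_secs` are hypothetical names:

```python
import asyncio

async def scrape_once(response_time: float) -> str:
    # Stand-in for one HTTP scrape of a target.
    await asyncio.sleep(response_time)
    return "ok"

async def scrape_with_timeout(response_time: float, timeout_secs: float) -> str:
    """Bound each scrape by `timeout_secs` so a slow target cannot
    pile requests up behind the scrape interval."""
    try:
        return await asyncio.wait_for(scrape_once(response_time), timeout=timeout_secs)
    except asyncio.TimeoutError:
        return "timed out"

async def scrape_all(targets: list, timeout_secs: float) -> list:
    # Scrape all targets concurrently; one slow target no longer delays the rest.
    return await asyncio.gather(
        *(scrape_with_timeout(rt, timeout_secs) for rt in targets)
    )

results = asyncio.run(scrape_all([0.01, 0.5], timeout_secs=0.1))
print(results)  # the slow target times out, the fast one succeeds
```

The key property is that each scrape's lifetime is capped at the timeout, so the number of outstanding requests per target is bounded regardless of how slowly the server responds.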
> I'm not familiar with the current approach to scraping. Could this cause multiple requests to be outstanding simultaneously if a slow server causes a timeout longer than `interval_secs`?
>
> This, incidentally, is one reason to avoid mutable state in per-request context data.
Originally posted by @bruceg in #13793 (comment)
See conversation thread in linked discussion for more context, specifically #13793 (comment).