Bug 1679949 - Hacky bug fix: Lower sleep time to one second #1349
This isn't by design (or at least isn't by design unless some other bug is causing the design to fail). The thread dispatcher joins on close, with a timeout of 30s, but the subprocess dispatcher (where the ping uploader runs) has no timeout. If All that is to say, this quick fix is fine for the reasons outlined above -- but perhaps there is a bug that is causing the work in the subprocess to block the thread...
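To illustrate the close behaviour mentioned in this comment, here is a minimal sketch of joining a worker thread with a timeout. It assumes a plain `threading.Thread` standing in for the dispatcher thread; `close_worker` is a hypothetical helper and not part of Glean.

```python
import threading


def close_worker(worker: threading.Thread, timeout_s: float) -> bool:
    """Join `worker`, waiting at most `timeout_s` seconds.

    Returns True if the thread finished in time, False if it is still
    running (for example because it is blocked waiting on a subprocess).
    """
    worker.join(timeout=timeout_s)
    # join() returns None either way, so liveness has to be checked explicitly.
    return not worker.is_alive()
```

If the worker is blocked on something that never times out, the join simply gives up after `timeout_s` and the caller has to decide what to do with the still-running thread.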
Ok, then I am confused. I ran this minimal script:
and it hangs on 33.4.0 because the uploader got the wait, sleeps for 60s, and 30s later I see the
Here's the bug: we want to only run one ping uploading subprocess at a time, so we wait for one to finish if one is already running. If we submit two pings in quick succession, this causes the first ping uploading process to block the dispatcher thread. I think there's probably a better, non-blocking way to wait on the subprocess to finish.
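One possible shape for such a non-blocking wait is sketched below. This is an illustration only, not how Glean-py is implemented: `SingleUploader`, `trigger`, and `join` are hypothetical names, and the upload command is a stand-in child process. The idea is that a separate watcher thread blocks on the subprocess, while a second request arriving in the meantime only sets a flag and returns.

```python
import subprocess
import sys
import threading
from typing import List, Optional


class SingleUploader:
    """Hypothetical helper that runs at most one upload subprocess at a time."""

    def __init__(self, command: List[str]) -> None:
        self._command = command
        self._lock = threading.Lock()
        self._watcher: Optional[threading.Thread] = None
        self._running = False
        self._rerun = False
        self.runs = 0  # number of subprocesses that have completed

    def trigger(self) -> bool:
        """Request an upload without blocking the caller.

        Returns True if a watcher thread was started, False if an upload
        was already in progress and a follow-up run was queued instead.
        """
        with self._lock:
            if self._running:
                self._rerun = True
                return False
            self._running = True
            self._watcher = threading.Thread(target=self._run, daemon=True)
            self._watcher.start()
            return True

    def _run(self) -> None:
        while True:
            # Only this watcher thread blocks on the subprocess.
            subprocess.run(self._command, check=False)
            with self._lock:
                self.runs += 1
                if not self._rerun:
                    self._running = False
                    return
                self._rerun = False

    def join(self, timeout_s: float) -> bool:
        """Wait up to `timeout_s` for pending uploads; True if idle afterwards."""
        with self._lock:
            watcher = self._watcher
        if watcher is not None:
            watcher.join(timeout=timeout_s)
        with self._lock:
            return not self._running


if __name__ == "__main__":
    # Stand-in for the real upload command: a child process that sleeps briefly.
    uploader = SingleUploader([sys.executable, "-c", "import time; time.sleep(0.3)"])
    print(uploader.trigger())  # starts the first run
    print(uploader.trigger())  # returns immediately and queues a second run
    uploader.join(timeout_s=30.0)
    print(uploader.runs)
```

The trade-off in this sketch is that two quick submissions still result in two sequential subprocess runs, but the thread that submits them never waits on either one.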
@badboy given the above, should this be closed?
Nope, we still delay the initial sending by up to 60s if it's too fast. It continues to run in the background but IMO it should be done quickly and not keep that process around sleeping.
Partial revert of 2261845

The increased default sleep time causes issues for short-lived applications, such as tests (burnham) or command-line tools (if they are _really_ quick):

Glean initializes asynchronously, so by the time the first ping is submitted it has not yet finished initializing and scanning the pending pings directory. It therefore asks the uploader to sleep for a moment and return when the scan has finished. However, right now it has no mechanism to communicate a short wait, so consumers default to the throttle backoff time (which is 60s).

So if a fast client:

1. Calls `Glean.initialize`
2. Records some data and submits a ping
3. Stops

the uploader will start, get told to wait, and sleep for 60s. Meanwhile, Glean-py's shutdown routine asks the upload process to come to an end within 30s or kills it, which it subsequently does (a sleeping process will not notice).

By dropping back to a 1s sleep we avoid that situation for now. A proper fix will allow glean-core to communicate the sleep time correctly.

The downside is that for long-lived Python processes, once we get throttled, the uploader will quickly ask glean-core for tasks again 3 times in a row and then get told it's done. Pending pings will then not be picked up until a new ping is submitted (or the process is restarted completely). That's an acceptable risk given that we don't really have long-running clients right now.
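The timing argument in this commit message can be checked with a small simulation. The sketch below uses a simulated clock rather than real sleeps. The task labels `"wait"`, `"upload"`, and `"done"` and both function names are assumptions made for illustration and are not glean-core's API; the 30s, 60s, and 1s values are taken from the text above.

```python
from typing import List, Optional, Tuple

SHUTDOWN_TIMEOUT_S = 30.0  # Glean-py gives the upload process 30s on shutdown


def simulate_uploader(
    tasks: List[str], wait_sleep_s: float
) -> Tuple[Optional[float], float]:
    """Walk a list of task labels using a simulated clock.

    Returns (time of the first upload or None, total time until the loop ends).
    """
    clock = 0.0
    first_upload_at: Optional[float] = None
    for task in tasks:
        if task == "wait":
            # The uploader has no hint how long to wait, so it always
            # sleeps for the same fixed duration.
            clock += wait_sleep_s
        elif task == "upload":
            if first_upload_at is None:
                first_upload_at = clock
        elif task == "done":
            break
        else:
            raise ValueError(f"unknown task label: {task!r}")
    return first_upload_at, clock


def uploads_before_shutdown(
    tasks: List[str], wait_sleep_s: float, timeout_s: float = SHUTDOWN_TIMEOUT_S
) -> bool:
    """True if the first upload starts before the shutdown timeout expires."""
    first_upload_at, _ = simulate_uploader(tasks, wait_sleep_s)
    return first_upload_at is not None and first_upload_at < timeout_s


if __name__ == "__main__":
    fast_client = ["wait", "upload", "done"]
    for sleep_s in (60.0, 1.0):
        print(sleep_s, uploads_before_shutdown(fast_client, sleep_s))

    # Throttled long-lived process: three waits in a row, then done.
    print(simulate_uploader(["wait", "wait", "wait", "done"], 1.0))
```

With a 60s sleep the single wait already exceeds the 30s shutdown window, while a 1s sleep leaves ample room. The throttled case shows the downside: three 1s waits are used up after about 3s, after which no upload happens until something triggers the uploader again.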