-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1685745 - Upload a ping even if we can't save it to disk #1576
Conversation
glean-core/src/lib.rs
Outdated
@@ -646,6 +646,9 @@ impl Glean { | |||
) { | |||
log::warn!("IO error while writing ping to file: {}", e); | |||
self.core_metrics.io_errors.add(self, 1); | |||
log::warn!("Gonna try sending what we have in memory."); | |||
let content = ::serde_json::to_string(&ping.content)?; | |||
self.upload_manager.enqueue_ping(self, ping.doc_id, ping.url_path, &content, Some(ping.headers)); | |||
return Err(e.into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we still return an error here even though we attempt ping upload? We could just return a bool
stating whether or not anything was enqueued and log the IO errors inside the function only. At least, it's worth checking how other places of the code deal with this error. For example, here and here the log messages would need to be updated or removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also worth adding a note about this behaviour in the doc for this function. Currently we have
Collects and submits a ping for eventual uploading.
But the new behaviour is not exactly that, but more like
Attempts collection and submits a ping for eventual uploading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collection and submission is still guaranteed as before, no? We just try a little harder to send it.
As for returning the error, I wanted to maintain existing semantics as much as possible. I'll reach out and update the log messages, as they weren't quite right before either. It wasn't the sending that failed (we have no idea whether a submitted ping is successfully sent), it was the persisting to disk that failed. Maybe I can clarify this in the doccomment as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collection and submission is still guaranteed as before, no?
Hm, you are right. When I wrote my other comment I had colletion in mind as get from storage > assemble > store again
, but it's actually get from storage > assemble
.
As for returning the error, I wanted to maintain existing semantics as much as possible.
I am alright with this, but it is weird for me that this function returns an error when there is nothing actionable anyone can do if it errors. Might be content for a follow up, though. No pressing need to change this in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By returning an error the LB will not launch an upload worker.
If this returns Ok(false)
or an error, the FFI part will turn it into false (0
actually), and the LB part will not enqueue a worker.
I give you that Result<bool>
is really not a descriptive type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the more reason not to do that anymore since now we want to enqueue a worker even if there was an IO error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could return Ok(true)
now. It's about as truthful a return value as Err(e)
is.
It would mean that we never return Err
from this method, though.
Stepping back, why do we care about any return value here?
LBs want to know if a ping's enqueued so it can trigger an upload... but even if no new ping is enqueued, maybe there's an pending ping we want to give another try? Maybe LBs should unconditionally trigger uploads after submitting (unless they have a good reason not to).
Does anything other than LBs' decisions about whether to trigger upload checks care about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was used as a small performance optimisation only: don't schedule work if we sorta know there's no work.
To my knowledge nothing else depends on that behavior.
Indeed pending pings could be around, and so at the moment it requires a successful ping storage to even try pending pings again.
What if, in order to reduce outfall from changed behavior, we return Ok(true)
here to keep it "the same", then get rid of it in a followup PR? (Easier separation of implementation concerns, easier to review and rollback if necessary)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okidoki. Filed bug 1704606
Well, if Jan-Erik wants eyes on this change, maybe it's a good thing I forgot (!!) to include this in the release. |
Probably totally intentionally! But yeah, I'd be happy to give it a quick look |
glean-core/src/ping/mod.rs
Outdated
use crate::util::{get_iso_time_string, local_now_with_offset}; | ||
use crate::{ | ||
Glean, Result, DELETION_REQUEST_PINGS_DIRECTORY, INTERNAL_STORAGE, PENDING_PINGS_DIRECTORY, | ||
}; | ||
|
||
/// Holds everything you need to store or send a ping. | ||
pub struct Ping<'a> { | ||
/// The doc id. Unique. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// The doc id. Unique. | |
/// The unique document id. |
glean-core/src/lib.rs
Outdated
log::warn!("IO error while writing ping to file: {}", e); | ||
self.core_metrics.io_errors.add(self, 1); | ||
log::warn!("Gonna try sending what we have in memory."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combine this into the above log
message.
39162dc
to
df9d065
Compare
glean-core/src/lib.rs
Outdated
@@ -613,6 +613,7 @@ impl Glean { | |||
/// # Errors | |||
/// | |||
/// If collecting or writing the ping to disk failed. | |||
/// We may still attempt upload if collection succeeded but writing to disk failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is now confusing here in the docs, because it's under the "errors" section.
@@ -893,7 +893,8 @@ fn records_io_errors() { | |||
|
|||
// Writing the ping file should fail. | |||
let submitted = glean.internal_pings.metrics.submit(&glean, None); | |||
assert!(submitted.is_err()); | |||
// But the return value is still Ok because we enqueue the ping anyway. | |||
assert!(submitted.is_ok()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we extend this test to ensure that the uploader will see both pings (this one that's not written to disk and the one that is written to disk)?
Fine if done in a followup
No description provided.