Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1685745 - Upload a ping even if we can't save it to disk #1576

Merged
merged 7 commits into from
Apr 13, 2021

Conversation

chutten
Copy link
Contributor

@chutten chutten commented Apr 7, 2021

No description provided.

@chutten chutten requested a review from brizental April 7, 2021 19:35
@auto-assign auto-assign bot requested a review from mdboom April 7, 2021 19:35
glean-core/tests/ping_maker.rs Outdated Show resolved Hide resolved
glean-core/src/ping/mod.rs Outdated Show resolved Hide resolved
@@ -646,6 +646,9 @@ impl Glean {
) {
log::warn!("IO error while writing ping to file: {}", e);
self.core_metrics.io_errors.add(self, 1);
log::warn!("Gonna try sending what we have in memory.");
let content = ::serde_json::to_string(&ping.content)?;
self.upload_manager.enqueue_ping(self, ping.doc_id, ping.url_path, &content, Some(ping.headers));
return Err(e.into());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we still return an error here even though we attempt ping upload? We could just return a bool stating whether or not anything was enqueued and log the IO errors inside the function only. At least, it's worth checking how other places of the code deal with this error. For example, here and here the log messages would need to be updated or removed.

Copy link
Contributor

@brizental brizental Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also worth adding a note about this behaviour in the doc for this function. Currently we have

Collects and submits a ping for eventual uploading.

But the new behaviour is not exactly that, but more like

Attempts collection and submits a ping for eventual uploading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collection and submission is still guaranteed as before, no? We just try a little harder to send it.

As for returning the error, I wanted to maintain existing semantics as much as possible. I'll reach out and update the log messages, as they weren't quite right before either. It wasn't the sending that failed (we have no idea whether a submitted ping is successfully sent), it was the persisting to disk that failed. Maybe I can clarify this in the doccomment as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collection and submission is still guaranteed as before, no?

Hm, you are right. When I wrote my other comment I had colletion in mind as get from storage > assemble > store again, but it's actually get from storage > assemble.

As for returning the error, I wanted to maintain existing semantics as much as possible.

I am alright with this, but it is weird for me that this function returns an error when there is nothing actionable anyone can do if it errors. Might be content for a follow up, though. No pressing need to change this in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By returning an error the LB will not launch an upload worker.
If this returns Ok(false) or an error, the FFI part will turn it into false (0 actually), and the LB part will not enqueue a worker.

I give you that Result<bool> is really not a descriptive type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the more reason not to do that anymore since now we want to enqueue a worker even if there was an IO error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could return Ok(true) now. It's about as truthful a return value as Err(e) is.

It would mean that we never return Err from this method, though.

Stepping back, why do we care about any return value here?

LBs want to know if a ping's enqueued so it can trigger an upload... but even if no new ping is enqueued, maybe there's an pending ping we want to give another try? Maybe LBs should unconditionally trigger uploads after submitting (unless they have a good reason not to).

Does anything other than LBs' decisions about whether to trigger upload checks care about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was used as a small performance optimisation only: don't schedule work if we sorta know there's no work.
To my knowledge nothing else depends on that behavior.

Indeed pending pings could be around, and so at the moment it requires a successful ping storage to even try pending pings again.

What if, in order to reduce outfall from changed behavior, we return Ok(true) here to keep it "the same", then get rid of it in a followup PR? (Easier separation of implementation concerns, easier to review and rollback if necessary)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okidoki. Filed bug 1704606

@chutten chutten requested a review from brizental April 8, 2021 14:54
@badboy badboy self-requested a review April 12, 2021 08:54
@chutten
Copy link
Contributor Author

chutten commented Apr 12, 2021

Well, if Jan-Erik wants eyes on this change, maybe it's a good thing I forgot (!!) to include this in the release.

@badboy
Copy link
Member

badboy commented Apr 12, 2021

Well, if Jan-Erik wants eyes on this change, maybe it's a good thing I forgot (!!) to include this in the release.

Probably totally intentionally! But yeah, I'd be happy to give it a quick look

use crate::util::{get_iso_time_string, local_now_with_offset};
use crate::{
Glean, Result, DELETION_REQUEST_PINGS_DIRECTORY, INTERNAL_STORAGE, PENDING_PINGS_DIRECTORY,
};

/// Holds everything you need to store or send a ping.
pub struct Ping<'a> {
/// The doc id. Unique.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The doc id. Unique.
/// The unique document id.

log::warn!("IO error while writing ping to file: {}", e);
self.core_metrics.io_errors.add(self, 1);
log::warn!("Gonna try sending what we have in memory.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine this into the above log message.

@chutten chutten force-pushed the bug1685745-ioErrorPingSubmit branch from 39162dc to df9d065 Compare April 12, 2021 19:12
@@ -613,6 +613,7 @@ impl Glean {
/// # Errors
///
/// If collecting or writing the ping to disk failed.
/// We may still attempt upload if collection succeeded but writing to disk failed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is now confusing here in the docs, because it's under the "errors" section.

@@ -893,7 +893,8 @@ fn records_io_errors() {

// Writing the ping file should fail.
let submitted = glean.internal_pings.metrics.submit(&glean, None);
assert!(submitted.is_err());
// But the return value is still Ok because we enqueue the ping anyway.
assert!(submitted.is_ok());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we extend this test to ensure that the uploader will see both pings (this one that's not written to disk and the one that is written to disk)?

Fine if done in a followup

@chutten chutten merged commit f3eef82 into mozilla:main Apr 13, 2021
@chutten chutten deleted the bug1685745-ioErrorPingSubmit branch April 13, 2021 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants