Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: rework error logging/metric reporting; fix BSO batch updates for spanner #824

Merged
merged 16 commits into from
Oct 14, 2020

Conversation

jrconlin
Copy link
Member

@jrconlin jrconlin commented Sep 17, 2020

Closes #618

Description

We're reporting too many errors to sentry, (HAWK errors, DB conflicts, etc). Many of these can be switched to metrics.
This PR cleans up a bit of the error/metric reporting mechanism, pushing the self reporting decision to the original ErrorKind enums where possible.

This PR may be a monument to sunk cost and misunderstanding.

It addresses some shortcomings with how errors were being handled and reported (because of an initial misunderstanding about an issue). To that end,it moves some of the logic regarding whether an error is displayed in sentry or a metric stored into the actual Error mods.

To rectify the original misunderstanding, this PR also includes logic to perform updates to pending BSO batch updates, mostly for spanner, with some initial work done for MySQL.

May it be used by future generations to scare their kids into eating their vegetables.

Testing

For the metric/sentry reporting:

Testers can monitor stage and see a decrease in errors similar to https://sentry.prod.mozaws.net/operations/syncstorage-prod/issues/9481936/?query=is%3Aunresolved

For the spanner batch update merge function:

Internal testing included.

Issue(s)

Closes #174, #619, #618

@jrconlin
Copy link
Member Author

jrconlin commented Sep 21, 2020

Going to block this on #807, since that one introduces a few useful patterns for doing batch item checks.

@jrconlin
Copy link
Member Author

See also 56quarters/cadence#103 for testing

@jrconlin jrconlin marked this pull request as ready for review September 28, 2020 23:40
@jrconlin jrconlin requested a review from a team September 28, 2020 23:40
Copy link
Member

@pjenvey pjenvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FYI I'm still look at spanner's commit)

src/db/error.rs Outdated Show resolved Hide resolved
src/db/mysql/batch.rs Show resolved Hide resolved
src/db/spanner/batch.rs Outdated Show resolved Hide resolved
src/db/spanner/batch.rs Show resolved Hide resolved
src/db/spanner/batch.rs Outdated Show resolved Hide resolved
src/db/spanner/batch.rs Outdated Show resolved Hide resolved
src/db/spanner/batch.rs Outdated Show resolved Hide resolved
Copy link
Member

@pjenvey pjenvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: I don't think this closes #174

src/db/spanner/batch.rs Outdated Show resolved Hide resolved
src/db/error.rs Outdated Show resolved Hide resolved
src/db/error.rs Outdated Show resolved Hide resolved
src/db/spanner/batch.rs Outdated Show resolved Hide resolved
src/db/spanner/batch.rs Outdated Show resolved Hide resolved
@tublitzed
Copy link
Contributor

@jrconlin can you update the PR description here to reflect what is actually changing and how reviewers should test? This looks like quite a bit more than metric/reporting changes.

@jrconlin jrconlin marked this pull request as draft October 7, 2020 17:23
@jrconlin jrconlin marked this pull request as ready for review October 7, 2020 21:30
@jrconlin jrconlin requested review from pjenvey and a team October 9, 2020 16:00
src/db/tests/batch.rs Outdated Show resolved Hide resolved
src/db/tests/batch.rs Outdated Show resolved Hide resolved
if !sort_index.is_empty() {
fields.push("sortindex");
} else {
// This is not written, but the sql builder requires a numeric string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What requires this value here (I don't understand about the sql builder requiring it). I suspect it's params and maybe param_types? We could build those dynamically (and or pop from param_types since it's already set up front)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, before the SQL is actually processed, the query builder resolves the types of the arguments. The problem is that the NoneType resolves to "" which fails to resolve to a numeric. That's why I pass in the "0", which does.

If I was SUPER clever, I would probably just exclude the values from the params using the same crafting technique I'm doing to build the sql, but that feels like making things really complicated rather than just do this, particluarly with the way that params have to be added.

MichaIng and others added 7 commits October 9, 2020 15:03
On Debian, there is not libmysqlclient available since Stretch. Debian has fully migrated to MariaDB as replacement. There is a libmariadb-dev package, which ships libmariadbclient.so, but this is currently not accepted by the migrations_macros build. A simple solution is the available compat package: https://packages.debian.org/libmariadb-dev-compat

Signed-off-by: MichaIng <[email protected]>
0.20.x fails to emit any events (maybe a CurlHttpTransport issue?)

Issue #846
# This is the 1st commit message:

bug: Address AlreadyExist errors

* rework error logging/metric reporting.
* Added `is_reportable()` and `metric_label()` to `ErrorKind`s
* removed unused `metric_label` field from ApiError

Closes #174, #619, #618, #827

# The commit message #2 will be skipped:

# f WIP: nightly
#
# * TODO: why is insert not seeing int64 type?
# * TODO: add mysql fixup.

# The commit message #3 will be skipped:

# f fix test

# The commit message #4 will be skipped:

# f fix tests
* rework error logging/metric reporting.
* Added `is_reportable()` and `metric_label()` to `ErrorKind`s
* removed unused `metric_label` field from ApiError

Closes #174, #619, #618, #827
@jrconlin jrconlin requested a review from pjenvey October 9, 2020 23:30
@@ -1,4 +1,4 @@
FROM rust:1.45-buster as builder
FROM rust:1.46-buster as builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an already huge PR. Do we need to add dep upgrades into it, or can that be done separately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The protobuf update was required due to a local build issue. This was addressed in #853. The rust image update brings our image closer to the currently stable rust branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that the image update gets us closer to stable rust, but what I don't understand is why it's required as part of this work?

The protobuf update seems...completely unlreated? Unless I'm missing something obvious here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, I can't develop, run or test on my machine unless I update protobuf. This is because protobuf is often a system level requirement and is not directly tied to a given project release. A single update of protobuf from anything will cause all versions of protobuf to update. It's possible to spin that out as a dependency, however it's often just more direct and quicker to deploy to roll it into an existing PR.

As for the rust update, this makes sure that our current testing environment again matches the local development environment. This ensures that things like clippy, fmt and other tools operate the same and reduces the need for using CI as a live debugging environment to address issues that may only present themselves on one version.

@tublitzed
Copy link
Contributor

@jrconlin could you also update the title of this PR so it's easier to track what's happening once it merges? I think this is no longer just feat: rework error logging/metric reporting.? I know it's a bit of a nit, but stuff like this really helps a lot when grokking code changes after they merge.

@jrconlin jrconlin changed the title feat: rework error logging/metric reporting. feat: rework error logging/metric reporting; fix BSO batch updates for spanner Oct 13, 2020
dynamically build up params for each do_append update

Co-authored-by: Philip Jenvey <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[spanner] Handle AlreadyExists in batch_bsos errors Improve and standardize trace statements and logging
5 participants