Skip to content
This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

status-indicator: Display external service syncing errors #5319

Merged
merged 18 commits into from
Aug 23, 2019

Conversation

mrnugget
Copy link
Contributor

@mrnugget mrnugget commented Aug 21, 2019

This is the revised version of the (never merged) PR #4780 and the first attempt at fixing #5146.

TL;DR: This is step 1/n. It shows repo-updater errors in the UI. They look raw, but I think that's fine, because we finally show errors and I'm now also happy with the underlying implementation and think that we can iterate on the details (see later sections)

What this PR does it to make the errors visible that repo-updater encounters when syncing repositories from external services. These errors can be produced when trying to construct an external service from the existing configuration or when talking to the external service to list the available repositories for sync.

These errors then get shown in the UI, like this:

Screen Shot 2019-08-21 at 17 05 29

Implementation

The implementation in this PR is based on #4780:

  • It adds a new type of status message, a SyncErrorStatusMessage
  • This status message is returned by repo-updater in its /status-messages endpoint when a sync error has been produced in the last sync run
  • Each sync run sets or resets the current sync error, meaning: fixing an external service configuration results in the message disappearing with the next successful sync (triggered on save). But that also means that errors that pop up long after an external service has been created are displayed (network errors, invalid tokens, ...)

In addition to that it also contains the suggestions made by @felixfbecker and @tsenart:

  • The StatusMessage type in the GraphQL schema has been turned into a union type. I like this approach a lot more, because it not only gives us type-safety in TypeScript, but (at the cost of verbosity) also gives us better type-safety in graphqlbackend and repo-updater.
  • There are no union types in Go, which makes the definition of StatusMessage in repoupdater/protocol a bit unwiedly. But that's one of the possible I discussed with @tsenart and I like it a lot more than the untyped list of key-value pairs.
  • The sync errors are now styled with alert-danger which makes them pop out a lot more as errors

Open issues, drawbacks & thoughts for enhancements

  1. The error messages are still raw. But now that I'm happier with the underlying implementation I agree with @nicksnyder's assessment that showing anything in case there's an error is better than showing nothing. And It's actually not that easy to untangle layers and layers of errors produced by (possibly multiple) external services and turn them into something meaningful and actionable. But I will use the rest of this iteration to hack on this problem.
    Idea: I prototyped to define a set of errors that is common across external services (bad credentials, connection error, rate limit, ...) and then wrap the errors produced by each external service into one of these custom errors. In the repo-updater endpoint we could then detect these errors and replace their messages with something meaningful. The drawback of that is that it's hard to define this common set so it makes sense for all external services and wrapping each the errors produced by each one requires knowledge of each external service's idosyncracies. That's a lot of work, but we could approach it step by step and wrap/beautify these errors as they arise.
  2. This does not (yet?) show the warnings that repo-updater produces. Example of such a warning is "repository not found" which is produced when we try to fetch the configured repositories but can't find one of them. This warning is only written to the log and never returned as an error, because we want to
  3. This does not (yet?) show errors/warnings that gitserver runs into when cloning/fetching/cleaning repositories. Until now I haven't looked into making gitserver errors visible, but that's certainly a large part of what would make fixing Make external service syncing errors visible to site-admins #5146 valuable

Especially 2 and 3 make me question whether the current "architecture" will hold up: if we only care about errors, it's easy to remember the last error and reset it when everything's fine again. But what do we do when we want to display ephemeral data, such as warnings? Where and how do we collect it? (Theoretically we could change our function's return signature from <val>, error to <val>, warningstring, error, but that's ... ugly). I keep thinking that we need to introduce something (an ErrorReporter?) that other functions/methods could call to report an issue. That would be far easier than wrapping, passing and unwrapping errors along the call chain in the syncer, with the possible added cost of introducing a singleton. And, still: what do we do with warnings?

Happy about any input/feedback/ideas!

Test Plan

Test plan: manual testing, go test, cd web && yarn test --runInBand StatusMessagesNavItem.test.tsx

@mrnugget mrnugget force-pushed the core/ext-svc-errors-2 branch 4 times, most recently from 9f91fe2 to 48b7daa Compare August 21, 2019 15:09
@codecov
Copy link

codecov bot commented Aug 21, 2019

Codecov Report

Merging #5319 into master will increase coverage by 0.09%.
The diff coverage is 86.2%.

@@            Coverage Diff             @@
##           master    #5319      +/-   ##
==========================================
+ Coverage   45.78%   45.87%   +0.09%     
==========================================
  Files         736      736              
  Lines       45416    45523     +107     
  Branches     2621     2625       +4     
==========================================
+ Hits        20793    20885      +92     
- Misses      22612    22624      +12     
- Partials     2011     2014       +3
Impacted Files Coverage Δ
cmd/repo-updater/repos/syncer.go 79.32% <100%> (+1.49%) ⬆️
cmd/repo-updater/repoupdater/server.go 60.91% <100%> (+1.64%) ⬆️
cmd/frontend/graphqlbackend/status_messages.go 94.44% <100%> (+7.77%) ⬆️
cmd/repo-updater/repos/sources.go 77.19% <59.45%> (-8.53%) ⬇️
cmd/repo-updater/repos/testing.go 80.74% <75%> (-0.32%) ⬇️
web/src/nav/StatusMessagesNavItem.tsx 82.69% <91.66%> (+8.49%) ⬆️
cmd/frontend/graphqlbackend/external_service.go 5.88% <0%> (+5.88%) ⬆️

@codecov
Copy link

codecov bot commented Aug 21, 2019

Codecov Report

Merging #5319 into master will increase coverage by 0.06%.
The diff coverage is 76.85%.

@@            Coverage Diff             @@
##           master    #5319      +/-   ##
==========================================
+ Coverage   46.07%   46.13%   +0.06%     
==========================================
  Files         735      735              
  Lines       45176    45263      +87     
  Branches     2621     2626       +5     
==========================================
+ Hits        20813    20882      +69     
- Misses      22349    22363      +14     
- Partials     2014     2018       +4
Impacted Files Coverage Δ
cmd/repo-updater/repos/syncer.go 79.62% <100%> (+1.78%) ⬆️
cmd/repo-updater/repos/sources.go 77.19% <59.45%> (-8.53%) ⬇️
cmd/repo-updater/repoupdater/server.go 58.24% <61.53%> (-1.03%) ⬇️
cmd/repo-updater/repos/testing.go 80.74% <75%> (-0.32%) ⬇️
cmd/frontend/graphqlbackend/status_messages.go 80% <82.35%> (-6.67%) ⬇️
web/src/nav/StatusMessagesNavItem.tsx 82.97% <89.28%> (+8.78%) ⬆️
...d/frontend/internal/authz/bitbucketserver/store.go 72.51% <0%> (+0.94%) ⬆️
cmd/frontend/graphqlbackend/external_service.go 17.64% <0%> (+17.64%) ⬆️

@mrnugget mrnugget marked this pull request as ready for review August 21, 2019 15:24
@beyang beyang removed their request for review August 21, 2019 16:16
Copy link
Contributor

@lguychard lguychard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a bit odd to me that in the attached screenshot, the 'Repositories Cloning' message is still shown. I feel like the display of errors & cloning messages should be mutually exclusive. Thoughts?

web/src/nav/StatusMessagesNavItem.tsx Outdated Show resolved Hide resolved
web/src/nav/StatusMessagesNavItem.tsx Outdated Show resolved Hide resolved
web/src/nav/StatusMessagesNavItem.tsx Outdated Show resolved Hide resolved
web/src/nav/StatusMessagesNavItem.tsx Outdated Show resolved Hide resolved
@nicksnyder nicksnyder removed their request for review August 21, 2019 18:30
@mrnugget
Copy link
Contributor Author

Thanks for the great suggestions, @lguychard! Wil incorporate them today!

It looks a bit odd to me that in the attached screenshot, the 'Repositories Cloning' message is still shown. I feel like the display of errors & cloning messages should be mutually exclusive. Thoughts?

I don't think they should be mutually exclusive. There's multiple cases where it's useful to have this:

  • You've added an external service for GitHub that clones ~10k repos. Takes some time. Then you add one for Gitlab and that produces a warning because, e.g. the connection has problems. You still want to see that GitHub makes progress while you deal with the other issue.
  • You've added two external services in the past. Someone added a new org with 1000 repos to one code host. Sourcegraph starts cloning. At the same time someone deleted the token you use to authenticate with the other external service. It still makes sense to show that one external service is doing fine and the other is not.

Point being: external services can be totally independent of each other and I think it makes sense to show the cloning message even if one external service produces errors. It's also an added signal: does the number of to-be-cloned repos change while the error is displayed? Or is it not? Could help debugging

@lguychard
Copy link
Contributor

You still want to see that GitHub makes progress while you deal with the other issue.

It still makes sense to show that one external service is doing fine and the other is not.

I understand this need, but this isn't conveyed clearly (to me) in the current implementation, mostly because the cloning message, as shown, is not specific to an external service, while the error messages are. This makes it unclear to me whether the errors are blocking the enqueued repositories from being cloned, or whether cloning of some repositories is still happening in spite of the error.

@mrnugget
Copy link
Contributor Author

I understand this need, but this isn't conveyed clearly (to me) in the current implementation, mostly because the cloning message, as shown, is not specific to an external service, while the error messages are.

I see. Good point. Solving this is not quite that easy, I feel, because of the second part:

This makes it unclear to me whether the errors are blocking the enqueued repositories from being cloned, or whether cloning of some repositories is still happening in spite of the error.

The errors could block the cloning, but not necessarily.

Off the top of my head, there's two potential solutions to this:

  • We need to be more specific in the cloning progress (ideally we could show the repos that are being cloned right now, or have a progress bar, or show which external service is involved, etc.). Requires a lot of added state/functionality to gitserver/repo-updater.
  • We show a status-icon per external service. If everything's fine and an external service has been synced: green checkmark. if there's an error: red X. That way you could see at a glance which external service is syncing and which one's not. @christinaforney had this idea a while back. That could be a next step on the status indicator.

Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go bits LGTM. Comments inline

cmd/frontend/graphqlbackend/schema.graphql Outdated Show resolved Hide resolved
cmd/repo-updater/repos/syncer.go Outdated Show resolved Hide resolved
cmd/repo-updater/repos/syncer.go Show resolved Hide resolved
cmd/repo-updater/repos/syncer.go Outdated Show resolved Hide resolved
cmd/repo-updater/repos/syncer.go Outdated Show resolved Hide resolved
cmd/repo-updater/repos/syncer.go Outdated Show resolved Hide resolved
cmd/repo-updater/repoupdater/server.go Outdated Show resolved Hide resolved
// produces an error through multiple syncs, the state of
// multiSourceErr would flip-flop between here and the `ListRepos` call
// further down.
s.SetOrResetMultiSourceErr(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

github won't let me comment on it, but what if ListExternalServices fails? Maybe we should just do this at the layer we called sourced?

Copy link
Contributor Author

@mrnugget mrnugget Aug 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ListExternalServices won't return a SourceError/MultiSourceError, which means it doesn't come with an external service attached to it.

I agree, though, that we should display ListExternalServices errors and also UpsertRepos errors, for that matter, since the latter was involved in a a few customer issues in the past (unique constraint errors, etc.).

That would mean that we generalize and go from a multiSourceErr *MultiSourceError field on Syncer to a lastSyncErr error field (which is what I had in my first internal proof-of-concept version of this). On the caller side, in Server, we could then differentiate between a normal error (without external service!) and a MultiSourceError (with external service!) and produce two different status messages — SyncErrorStatusMessage and ListExternalServiceErrorStatusMessage.

What do you think? Should that be part of this PR?

Edit: the more I think about that, the more it makes sense to me:

  • change the syncing error status messages to have an optional (!) external service id field
  • turn the error field on Syncer into a error
  • whenever a call in Sync (not sourced!) produces an error (including the DB functions), set this error, abort sync
  • depending on whether or not the lastSyncError is a MultiSourceError with Sources or not, attach an external service id to the status message

Pending any objections, I'll hack on this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed this sounds good to me!

pkg/repoupdater/protocol/repoupdater.go Outdated Show resolved Hide resolved
pkg/repoupdater/protocol/repoupdater.go Outdated Show resolved Hide resolved
@felixfbecker
Copy link
Contributor

On the styling, the dividers that don't extend to the edges of the popover look odd to me. Also, why dividers if the alerts are each in a box? Are they trying to convey some semantic groups? If yes, it is no obvious...
Also noticed "1 repositories cloning" (wrong inflection)

@mrnugget
Copy link
Contributor Author

mrnugget commented Aug 22, 2019

On the styling, the dividers that don't extend to the edges of the popover look odd to me. Also, why dividers if the alerts are each in a box? Are they trying to convey some semantic groups? If yes, it is no obvious...

Agree. I was playing around with a few different styles but none looked really good to me. Let me try if I can whip some options together.

@felixfbecker
Copy link
Contributor

As long as this targets 3.8 and we're gonna polish it before the release, I'm fine with merging and following up on the design

@mrnugget
Copy link
Contributor Author

Just pushed some styling tweaks:

  • got rid of the divider
  • added a heading
  • added some more alert-* styling

When everything is cloned:

Screen Shot 2019-08-22 at 17 31 16

With a info ("Cloning...") and danger ("error syncing") status messages:

Screen Shot 2019-08-22 at 17 28 43

I'm happy with this, but @christinaforney has some ideas, so: feel free to play with this branch and let me know whether I can merge it tomorrow or not :)

@christinaforney
Copy link
Contributor

christinaforney commented Aug 23, 2019

Here is what I came up with for the styles. Let me know what you think! I was hacking the DOM a bit, so anywhere you see an emoji, please use a similar icon there instead.

I would like to see the icon indicating type of the notice:

The icon should lead the heading for proximity association, and visual representation. I used a border-left: 6px solid #color on the status-messages-nav-item__entry with the colors matching those of the alerts (see bootstrap). I like this use of color, because the text stays more readable, but still applies a visual queue of the notice.

Here is a warning in light/dark:
image
image

Success state in light/dark:
image
image

In progress and error states in light/dark:
image
image
image

@mrnugget
Copy link
Contributor Author

I've implemented @christinaforney's ideas.

Light theme

Screen Shot 2019-08-23 at 10 02 40

Screen Shot 2019-08-23 at 10 03 30

Dark theme

Screen Shot 2019-08-23 at 10 02 51

Screen Shot 2019-08-23 at 10 03 53


We've decided that we can still iterate on this in this iteration. If someone with more design chops comes up with something better: please speak up 😄

@christinaforney
Copy link
Contributor

Looks great! 😁

@felixfbecker
Copy link
Contributor

I don't think the "line on the side" style is a bad one, but it feels a bit inconsistent to me with how we represent alerts elsewhere. I am also missing a bit of visual separation between the items, which in this design, I would solve by using the same color slighty-transparent as a background for each item. But then it really would be just an alert design, but one that is entirely different from other styles. I usually see the "line on the side" style used in design systems that prefer sharp edges & corners over rounded ones. But our design uses rounded edges everywhere. You could make the line a div and add a border-radius of course, but then it would work less as an element that implies the edges of each item (you can't mentally extend the lines anymore to draw the boundaries).

@christinaforney is there a reason why you prefer this design over our existing alert styles? For example, if it is the "box in a box" that is bothering you, maybe we could try out extending the alerts to the edges.

👍 on icons though

@christinaforney
Copy link
Contributor

Really good questions! I don't think these should be considered alerts, instead they are notifications or a list of messages. Because of this I think they need a different style than the alerts. My issue with them was around readability of so many stacked colors, which is what got me to realize that they aren't actually alerts in the context of the drop down.

I agree about the rounded edges and don't have a good alternative - the design consistency with the rest of the product was the thing I was most worried about with the design I suggested.

Do you have any ideas to find a middle ground? What do you think of adding back the hr separation between items with the current design?

@felixfbecker
Copy link
Contributor

What do you think of my idea to extend the background to the sides? E.g. https://getbootstrap.com/docs/4.3/components/list-group/#contextual-classes

mrnugget added a commit that referenced this pull request Aug 23, 2019
mrnugget added a commit that referenced this pull request Aug 26, 2019
This is a follow-up to #5319 and implements the improved error reporting
as described in this comment:

https://github.com/sourcegraph/sourcegraph/pull/5319#discussion_r316627988

The user visible changes:

* The types of `StatusMessage`s have been changed so that there is now
  an `ExternalServiceSyncError` to which an external service is attached
  and the new `SyncError` without an external service
* The `SyncError` will now be produced whenever the syncing process in
  repo-updater fails, even if that doesn't involve an external service.
  That means we now, for example, display errors that originate in the
  database. That's something customer's ran into in the past but that
  was never visible.
mrnugget added a commit that referenced this pull request Aug 27, 2019
This is a follow-up to #5319 and implements the improved error reporting
as described in this comment:

https://github.com/sourcegraph/sourcegraph/pull/5319#discussion_r316627988

The user visible changes:

* The types of `StatusMessage`s have been changed so that there is now
  an `ExternalServiceSyncError` to which an external service is attached
  and the new `SyncError` without an external service
* The `SyncError` will now be produced whenever the syncing process in
  repo-updater fails, even if that doesn't involve an external service.
  That means we now, for example, display errors that originate in the
  database. That's something customer's ran into in the past but that
  was never visible.
mrnugget added a commit that referenced this pull request Aug 27, 2019
* status-indicator: Display all syncing errors from repo-updater

This is a follow-up to #5319 and implements the improved error reporting
as described in this comment:

https://github.com/sourcegraph/sourcegraph/pull/5319#discussion_r316627988

The user visible changes:

* The types of `StatusMessage`s have been changed so that there is now
  an `ExternalServiceSyncError` to which an external service is attached
  and the new `SyncError` without an external service
* The `SyncError` will now be produced whenever the syncing process in
  repo-updater fails, even if that doesn't involve an external service.
  That means we now, for example, display errors that originate in the
  database. That's something customer's ran into in the past but that
  was never visible.

* fixup! Make setOrResetLastSyncErr easier to understand

* fixup! graphql: Prefix statusmessage with `FOR INTERNAL USE`
Copy link
Contributor

@tsenart tsenart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very glad we have this now :-) 🚀

cmd/repo-updater/repos/sources.go Show resolved Hide resolved
cmd/repo-updater/repos/syncer.go Show resolved Hide resolved
cmd/repo-updater/repos/syncer.go Show resolved Hide resolved
@christinaforney
Copy link
Contributor

@felixfbecker:

What do you think of my idea to extend the background to the sides? E.g. https://getbootstrap.com/docs/4.3/components/list-group/#contextual-classes

I don't like all the text being in the colorful background. What do you think of this alternative (imagine the title colors going all the way to the edges). I think it adds a nice visual separation from the content and title and would layer nicely.

Screen Shot 2019-08-27 at 9 33 56 AM

Screen Shot 2019-08-27 at 9 35 36 AM

image

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants