Lead infrastructure (error) monitoring #4452

bedeho · 2022-11-18T16:01:06Z

Background

Currently, there is no way for leads, whether storage or distributor, to automatically learn about changes in the error frequency of the various actors and services in the overall infrastructure. The closest tool is an ES-powered hosted service, run by a senior community member, that storage providers can report to. However, it is apparently not widely used, and it requires manual intervention: providers must set up reporting to it themselves, and must update their configuration if that endpoint fails.

Proposal

Create a new lead infrastructure service that monitors errors and satisfies the following requirements:

Argus, Colossus and Orion (v2) all report errors in content delivery requests to this endpoint. In particular, Orion will report, on their behalf, errors experienced by clients of that gateway. We are interested in all interactions, namely:
- Orion user to Colossus uploads
- Orion user to Argus downloads
- Colossus from Colossus downloads
- Argus from Colossus downloads
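To make this scope concrete, the four monitored interactions can be captured in a small lookup table that a reporting endpoint could use to reject out-of-scope reports. This is a minimal TypeScript sketch; the names `Service`, `MonitoredInteraction`, `MONITORED_INTERACTIONS` and `isMonitored` are hypothetical and not taken from any existing Argus, Colossus or Orion code.

```typescript
// Hypothetical model of the four interactions listed above; names are illustrative.
type Service = "orion" | "argus" | "colossus";
type Direction = "upload" | "download";

interface MonitoredInteraction {
  reporter: Service; // service that submits the error report
  target: Service; // service the failing request was sent to
  direction: Direction;
  onBehalfOfUser: boolean; // true when Orion relays an error seen by a gateway user
}

const MONITORED_INTERACTIONS: readonly MonitoredInteraction[] = [
  { reporter: "orion", target: "colossus", direction: "upload", onBehalfOfUser: true },
  { reporter: "orion", target: "argus", direction: "download", onBehalfOfUser: true },
  { reporter: "colossus", target: "colossus", direction: "download", onBehalfOfUser: false },
  { reporter: "argus", target: "colossus", direction: "download", onBehalfOfUser: false },
];

// Returns true if a report for this (reporter, target, direction) triple is in scope.
function isMonitored(reporter: Service, target: Service, direction: Direction): boolean {
  return MONITORED_INTERACTIONS.some(
    (i) => i.reporter === reporter && i.target === target && i.direction === direction,
  );
}
```

Keeping the scope as data rather than scattered conditionals makes it easy to extend if further interaction types are added later.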
There is on-chain metadata, publishable by the lead, which specifies where this endpoint should be resolved at any given time. All nodes follow this metadata unless explicitly configured not to. The node API should report which endpoint the node is currently reporting to.
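The resolution rule above (follow the lead's on-chain value unless a local override is configured, and expose the result through the node API) could be modelled roughly as follows. This is a sketch under assumptions: `ReportingEndpointTracker`, `onLeadMetadata` and `current` are hypothetical names, and how a node actually observes new lead metadata on chain is not specified here.

```typescript
// Hypothetical sketch: resolving the error-reporting endpoint from lead-published metadata.
type EndpointSource = "override" | "on-chain" | "none";

interface ResolvedEndpoint {
  url: string | null;
  source: EndpointSource;
}

class ReportingEndpointTracker {
  private onChainUrl: string | null = null;
  private readonly overrideUrl: string | undefined;

  // overrideUrl models a node "explicitly configured not to" follow the on-chain value.
  constructor(overrideUrl?: string) {
    this.overrideUrl = overrideUrl;
  }

  // Called whenever new lead metadata is observed on chain (the hook itself is assumed).
  onLeadMetadata(url: string | null): void {
    this.onChainUrl = url;
  }

  // What a node's status API could expose as the endpoint it is currently reporting to.
  current(): ResolvedEndpoint {
    if (this.overrideUrl !== undefined) return { url: this.overrideUrl, source: "override" };
    if (this.onChainUrl !== null) return { url: this.onChainUrl, source: "on-chain" };
    return { url: null, source: "none" };
  }
}
```

Returning the `source` alongside the URL lets a lead tell from the API alone whether a node is following the on-chain value or has been overridden locally.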
All error reports must include an identifier for the reporter, along with an attestation in the form of a signature by an appropriate role key of the actor. This should be one of the metaprotocol-level membership keys, so it does not need to hold funds.
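A signed report could look roughly like the sketch below: the reporter's identifier is placed inside the signed payload, so a valid signature binds the claim to that reporter. Assumptions to note: it uses Ed25519 from Node's built-in `node:crypto` purely as a stand-in (actual membership keys are likely Substrate-style keys such as sr25519, which would need a different library), and `SignedErrorReport`, `signReport` and `verifyReport` are hypothetical names. Looking up the member's public key from chain state is also assumed.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical wire format for an attested error report.
interface SignedErrorReport {
  reporterMemberId: string; // identifier of the reporting actor
  payload: string; // exact JSON string that was signed (includes reporterMemberId)
  signature: string; // base64-encoded signature over payload
}

// Sign a report with the reporter's key (Ed25519 here, as a stand-in).
function signReport(memberId: string, report: object, privateKey: KeyObject): SignedErrorReport {
  // The exact string is transmitted, so no canonical JSON encoding is needed to verify it.
  const payload = JSON.stringify({ reporterMemberId: memberId, report });
  const signature = sign(null, Buffer.from(payload, "utf8"), privateKey).toString("base64");
  return { reporterMemberId: memberId, payload, signature };
}

// Verify the signature, then check the claimed reporter matches the signed one.
function verifyReport(r: SignedErrorReport, publicKey: KeyObject): boolean {
  const ok = verify(null, Buffer.from(r.payload, "utf8"), publicKey, Buffer.from(r.signature, "base64"));
  if (!ok) return false;
  const parsed = JSON.parse(r.payload) as { reporterMemberId?: unknown };
  return parsed.reporterMemberId === r.reporterMemberId;
}

// Key generation for local testing only; real keys would come from the member's account.
function generateTestKeys(): { publicKey: KeyObject; privateKey: KeyObject } {
  return generateKeyPairSync("ed25519");
}
```

Since only a signature is required, the key never has to hold funds, which matches the requirement that a membership-level key is sufficient.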
All errors must use accurate on-chain identifiers for all data and actors being referenced, along with the time and block, so as to make later attribution easy and to allow cross-referencing across errors.
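One possible report shape, plus a validation check and a simple cross-referencing query, is sketched below. The field names are hypothetical rather than an agreed schema, and the sketch assumes on-chain IDs are non-negative integers serialized as decimal strings.

```typescript
// Hypothetical error report shape; field names are illustrative, not an agreed schema.
interface ErrorReport {
  reporterMemberId: string; // membership of the reporting actor
  targetWorkerId: string; // storage or distributor worker the request went to
  bucketId: string; // storage or distribution bucket involved
  dataObjectId: string; // data object being uploaded or downloaded
  blockNumber: number; // chain height when the error was observed
  timestamp: string; // wall-clock time, ISO 8601
  errorKind: string; // e.g. "timeout", "http-500"
}

// Assumption: on-chain IDs are non-negative integers serialized as decimal strings.
const ON_CHAIN_ID = /^\d+$/;

// Returns the names of fields that fail validation (an empty array means valid).
function validateReport(r: ErrorReport): string[] {
  const problems: string[] = [];
  const idFields = ["reporterMemberId", "targetWorkerId", "bucketId", "dataObjectId"] as const;
  for (const field of idFields) {
    if (!ON_CHAIN_ID.test(r[field])) problems.push(field);
  }
  if (!Number.isInteger(r.blockNumber) || r.blockNumber < 0) problems.push("blockNumber");
  if (Number.isNaN(Date.parse(r.timestamp))) problems.push("timestamp");
  return problems;
}

// Cross-referencing example: number of distinct reporters per data object.
function reportersPerDataObject(reports: ErrorReport[]): Map<string, number> {
  const seen = new Map<string, Set<string>>();
  for (const r of reports) {
    const reporters = seen.get(r.dataObjectId) ?? new Set<string>();
    reporters.add(r.reporterMemberId);
    seen.set(r.dataObjectId, reporters);
  }
  const counts = new Map<string, number>();
  for (const [id, reporters] of seen) counts.set(id, reporters.size);
  return counts;
}
```

Counting distinct reporters, rather than raw reports, is one way to tell an object that is failing for many actors apart from a single noisy reporter.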
It should be possible to trigger hooks that send notifications to key actors over transports such as email, SMS or Discord when certain global or actor-specific error frequencies exceed adjustable thresholds.
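The threshold hooks could be built on a sliding-window count per rule, as in the sketch below. It assumes errors are recorded in non-decreasing time order, and the `Notifier` callback is a hypothetical stand-in for an email, SMS or Discord transport. `ThresholdRule` and `ErrorRateMonitor` are likewise illustrative names.

```typescript
// Hypothetical threshold rules; Notifier stands in for an email, SMS or Discord transport.
type Notifier = (message: string) => void;

interface ThresholdRule {
  actorId: string | null; // null means a global rule over all actors
  maxErrors: number; // alert when the count in the window exceeds this
  windowMs: number; // length of the sliding window in milliseconds
}

class ErrorRateMonitor {
  private events: { actorId: string; at: number }[] = [];
  private readonly firing = new Set<number>(); // indices of rules currently above threshold
  private readonly rules: ThresholdRule[];
  private readonly notify: Notifier;
  private readonly maxWindowMs: number;

  constructor(rules: ThresholdRule[], notify: Notifier) {
    this.rules = rules;
    this.notify = notify;
    this.maxWindowMs = Math.max(0, ...rules.map((r) => r.windowMs));
  }

  // Record one error attributed to actorId at time `at` (ms), then evaluate every rule.
  // Assumes calls arrive in non-decreasing `at` order.
  record(actorId: string, at: number): void {
    this.events.push({ actorId, at });
    // Drop events that no rule's window can still see.
    this.events = this.events.filter((e) => e.at > at - this.maxWindowMs);
    this.rules.forEach((rule, index) => {
      const count = this.events.filter(
        (e) => e.at > at - rule.windowMs && (rule.actorId === null || e.actorId === rule.actorId),
      ).length;
      if (count > rule.maxErrors) {
        // Edge-triggered: notify only when the rule starts firing, not on every error.
        if (!this.firing.has(index)) {
          this.firing.add(index);
          const scope = rule.actorId === null ? "all actors" : `actor ${rule.actorId}`;
          this.notify(`${count} errors for ${scope} in the last ${rule.windowMs} ms`);
        }
      } else {
        this.firing.delete(index);
      }
    });
  }
}
```

Edge-triggering is a design choice to avoid flooding recipients: an alert fires once when a threshold is crossed and can fire again only after the count has dropped back below it.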


bedeho added colossus argus Argus distributor node labels Nov 19, 2022

bedeho mentioned this issue Nov 19, 2022 in CIJTF Meeting Tracker Issue #4455 (Open)
