Lead infrastructure (error) monitoring #4452

Open
bedeho opened this issue Nov 18, 2022 · 0 comments
Labels
argus Argus distributor node colossus

Comments


bedeho commented Nov 18, 2022

Background

Currently, there is no way for leads, whether storage or distributor, to automatically learn about changes in the error frequency of the various actors and services in the overall infrastructure. The closest tool is an ES-powered hosted service operated by a senior community member who is also a storage provider, but it is apparently not widely used, and it requires manual intervention from providers, both to start reporting to it and to update their setup if that endpoint fails.

Proposal

Create a new lead infrastructure service which monitors errors, satisfying the following requirements:

  • Argus, Colossus and Orion (v2) all report errors in content-delivery requests to this endpoint. In particular, Orion reports errors on behalf of the clients that experience them through that gateway. We are interested in all interactions, namely:
    • Orion user to Colossus uploads
    • Orion user to Argus downloads
    • Colossus from Colossus downloads
    • Argus from Colossus downloads
  • There is on-chain metadata, publishable by the lead, which sets where this endpoint resolves at any given time. All nodes follow this unless explicitly configured not to. Each node's API should report which endpoint the node is currently reporting to.
  • Every error report must carry an identifier for the reporter, along with an attestation in the form of a signature by an appropriate role key of that actor. This should be one of the metaprotocol-level membership keys, so that the key does not need to hold funds.
  • All errors must use accurate on-chain identifiers for all data and actors being referenced, as well as the time/block, so that later attribution and cross-referencing across errors are easy.
  • It should be possible to trigger hooks that send notifications to key actors over transports like email, SMS or Discord when certain global or actor-specific error frequencies pass adjustable thresholds.
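
To make the last three requirements more concrete, here is a minimal sketch of what an error report and a threshold hook could look like. Nothing in it is an agreed design: `ErrorReport`, `Threshold`, `ErrorMonitor` and every field name are hypothetical, signature verification is left out, and the `notify` callback stands in for whichever transport (email, SMS, Discord) ends up being used.

```typescript
// Hypothetical interaction kinds, mirroring the four cases listed above.
type Interaction =
  | "orion-user-to-colossus-upload"
  | "orion-user-to-argus-download"
  | "colossus-from-colossus-download"
  | "argus-from-colossus-download";

// Hypothetical shape of one error report; all field names are assumptions.
interface ErrorReport {
  reporterId: string;    // on-chain identifier of the reporting actor
  signature: string;     // attestation by the reporter's role key (not verified here)
  interaction: Interaction;
  targetActorId: string; // on-chain identifier of the actor the error is attributed to
  dataObjectId: string;  // on-chain identifier of the data involved
  blockNumber: number;   // block height when the error was observed
  timestampMs: number;   // wall-clock time of the error, in milliseconds
}

// An adjustable threshold; leave actorId unset for a global threshold.
interface Threshold {
  actorId?: string;
  maxErrors: number; // notify when the count inside the window exceeds this
  windowMs: number;  // length of the sliding window, in milliseconds
}

type Notify = (message: string) => void;

class ErrorMonitor {
  private reports: ErrorReport[] = [];
  private thresholds: Threshold[];
  private notify: Notify;

  constructor(thresholds: Threshold[], notify: Notify) {
    this.thresholds = thresholds;
    this.notify = notify;
  }

  // Store the report, then check every threshold that applies to it.
  record(report: ErrorReport): void {
    this.reports.push(report);
    for (const t of this.thresholds) {
      // Actor-specific thresholds only react to reports about that actor.
      if (t.actorId !== undefined && t.actorId !== report.targetActorId) continue;
      const since = report.timestampMs - t.windowMs;
      const count = this.reports.filter(
        (r) =>
          r.timestampMs > since &&
          r.timestampMs <= report.timestampMs &&
          (t.actorId === undefined || r.targetActorId === t.actorId)
      ).length;
      if (count > t.maxErrors) {
        this.notify(`${t.actorId ?? "global"}: ${count} errors in ${t.windowMs} ms`);
      }
    }
  }
}
```

Keeping the on-chain identifiers and block number on every report is what would make later attribution and cross-referencing a matter of filtering, as the threshold check does here by `targetActorId`. A real service would also need to verify `signature` against the reporter's membership key before accepting a report, and would probably rate-limit repeated notifications.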
bedeho added the colossus and argus (Argus distributor node) labels Nov 19, 2022