
ASB Metrics #448

Closed
bonomat opened this issue Apr 26, 2021 · 9 comments · Fixed by #474

@bonomat
Member

bonomat commented Apr 26, 2021

As a maker, I would like a CLI command that shows, for all swaps:

  • when they were started
  • the amounts being swapped (incl. the exchange rate at start time)
  • the date of each step (fund/redeem/refund), including the rate at that time
  • the taker's ID

Hint: currently, old states are overwritten in the DB. We could instead add a new DB entry for each event, including its details.

@thomaseizinger
Contributor

thomaseizinger commented May 1, 2021

Rust-libp2p is exploring OpenMetrics: libp2p/rust-libp2p#2063

I don't have much experience with it, but OpenMetrics sounds like a useful standard to follow here: we wouldn't have to reinvent the wheel, and it would make it easy for users to later tap into the wider ecosystem of monitoring tools.

@bonomat bonomat self-assigned this May 4, 2021
@bonomat
Member Author

bonomat commented May 4, 2021

OpenMetrics (and likewise Prometheus) does not seem to be suitable for our needs.

It is designed to monitor services from a data perspective:

"Metrics are a specific kind of telemetry data. They represent a snapshot of the current state for a set of data. "

Examples range from CPU utilization and I/O load to service response times.

They continue with:

"They are distinct from logs or events, which focus on records or information about individual events."

Source

The latter is closer to what we need, i.e. recording and collecting events.

It becomes clearer that OpenMetrics does not fit when reading through the defined metric types.

It defines:

  • Gauge: for current measurements such as bytes of memory currently used or the number of items in a queue.
  • Counter: for measuring discrete events, such as the number of HTTP requests, CPU seconds spent, or bytes sent (see the sketch below this list).
  • StateSet: for a series of related boolean values. The idea is to record states using only booleans, i.e. each boolean indicates whether a certain state is currently active.
  • Info: sounds suitable, but it is meant for textual information which "SHOULD NOT" change during the process lifetime.
  • Histogram: for measuring distributions of discrete events.
  • Summary: similar to Histogram but a bit more flexible; an example can be found in the spec.
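
For illustration, a minimal sketch (assuming the commonly used prometheus Rust crate; none of this is existing project code) of how a Counter is used. It aggregates events into a single number and drops exactly the per-swap details we are after:

use prometheus::{Encoder, IntCounter, Registry, TextEncoder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let registry = Registry::new();

    // A counter can only ever go up; it aggregates events into a single number.
    let swaps_started = IntCounter::new("swaps_started_total", "Number of swaps started")?;
    registry.register(Box::new(swaps_started.clone()))?;

    // Each swap start just increments the counter; the swap id, amounts and
    // rate are lost, which is exactly the information this issue asks for.
    swaps_started.inc();

    // Export in the Prometheus/OpenMetrics text format.
    let mut buffer = Vec::new();
    TextEncoder::new().encode(&registry.gather(), &mut buffer)?;
    println!("{}", String::from_utf8(buffer)?);
    Ok(())
}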

@bonomat
Member Author

bonomat commented May 4, 2021

I propose to follow a store-and-log approach instead:

  1. Store interesting events in the DB (create a new tree for metrics); see the sketch below this list.
  2. Add a new command which reads all events (or only those for a specific swap ID) and prints them to stdout.
    a) This allows us to decide on the format for the events later on.
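
For illustration, a minimal sketch of what that could look like with the sled and serde_json crates; the SwapEvent type and the function names here are hypothetical, not existing code:

use serde::{Deserialize, Serialize};
use uuid::Uuid;

// Hypothetical event record; field names are illustrative only.
// (Uuid serialization assumes the uuid crate's "serde" feature.)
#[derive(Serialize, Deserialize)]
struct SwapEvent {
    swap_id: Uuid,
    state: String,
    rate: f64,
    recorded_at: String,
}

// Append an event to a dedicated "metrics" tree instead of overwriting the
// current state entry.
fn record_event(db: &sled::Db, event: &SwapEvent) -> anyhow::Result<()> {
    let tree = db.open_tree("metrics")?;
    // Key by swap id + timestamp so earlier entries are never overwritten.
    let key = format!("{}:{}", event.swap_id, event.recorded_at);
    tree.insert(key.as_bytes(), serde_json::to_vec(event)?)?;
    Ok(())
}

// Print all recorded events (optionally filtered by swap id) to stdout.
fn print_events(db: &sled::Db, swap_id: Option<Uuid>) -> anyhow::Result<()> {
    let tree = db.open_tree("metrics")?;
    for entry in tree.iter() {
        let (_, value) = entry?;
        let event: SwapEvent = serde_json::from_slice(&value)?;
        if swap_id.map_or(true, |id| event.swap_id == id) {
            println!("{}", serde_json::to_string(&event)?);
        }
    }
    Ok(())
}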

@bonomat
Member Author

bonomat commented May 4, 2021

What I would like to see (the format of the log does not matter atm):

SwapDetail:
{
    `swap-id`: `Id`,
    `states`: `[State]`,
    `btc-amount`: `Number`,
    `xmr-amount`: `Number`,
    `counter-party`: `PeerId`
}

State:
{
    `recorded`: `Timestamp`, // when this state was recorded
    `rate`: `Number`,        // what the exchange rate was at this point
    `tx-id`: `TxId`          // if any
}
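
For reference, a rough translation of that outline into Rust types. This is purely illustrative; the type choices (bitcoin::Amount, chrono::DateTime, libp2p's PeerId, piconero as u64) are assumptions rather than the project's actual code:

use chrono::{DateTime, Utc};
use libp2p::PeerId;
use uuid::Uuid;

// One entry per swap, mirroring the outline above.
struct SwapDetail {
    swap_id: Uuid,
    states: Vec<State>,
    btc_amount: bitcoin::Amount,
    xmr_amount: u64, // piconero; a dedicated Monero amount type would be nicer
    counter_party: PeerId,
}

// One entry per recorded state transition.
struct State {
    recorded: DateTime<Utc>,      // when this state was recorded
    rate: f64,                    // exchange rate at this point
    tx_id: Option<bitcoin::Txid>, // if any
}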

@bonomat bonomat removed their assignment May 4, 2021
@da-kami
Member

da-kami commented May 4, 2021

In general:

Yeah, what you describe has nothing to do with Metrics but more with Events and plain swap state - once we aggregate these Events and analyze them, we have actual Metrics.

To be more concrete: what you outline above as a Metric should be named Swaps or SwapDetails, I think:

Swaps:
{
    `swap-id`: `Id`,
    `states`: `[State]`,
    `btc-amount`: `Number`,
    `xmr-amount`: `Number`,
    `counter-party`: `PeerId`
}

Only once we analyze multiple of these swaps would we actually have Metrics, as in "5 out of 10 swaps finished successfully with state ...".

I am fine with recording more swap details in the Sled DB in separate trees for a "quick solution" - but if we plan for the long run, it would be better to do what I outline below.


I had the alternative idea to improve the logs and let external software handle our "Metrics" through the logs. We already use swap and peer-id contexts within the logging and could easily extract swap/peer information using log frameworks such as Vector (see the sketch after the lists below).

Advantages:

  • We don't clutter our application code with recording events that don't serve any purpose other than log details
  • We improve our logs rather than adding additional "events" on top
  • We can easily visualize data (UI) by pumping it into other software - Vector supports various sinks - we could start with files to keep it simple (as a source we can plug directly into the journal)
  • We can define transformation rules without having to touch application code and do a release

Disadvantages:

  • We have to set it up and get into Vector. (It looks fairly straightforward; I started playing with it but don't have a working example yet...)
  • It does not work "out of the box", i.e. other ASB providers would have to set up e.g. Vector themselves. We could provide documentation, but it is somewhat of a higher entry barrier.
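
To make the logging route concrete, here is a rough sketch of emitting one structured tracing event per state transition, which a pipeline like Vector could then pick up from the journal; the helper and field names are illustrative, not the ASB's actual log schema:

use uuid::Uuid;

// Hypothetical helper: emit one structured log event per state transition.
// With a JSON log format these fields stay machine-readable, so a pipeline
// like Vector can extract swap/peer information without touching the app code.
fn log_state_transition(swap_id: Uuid, state: &str, rate: f64) {
    tracing::info!(
        swap_id = %swap_id, // Display-formatted swap id
        state,              // e.g. "BtcLocked"
        rate,               // exchange rate at this point
        "Advancing swap state"
    );
}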

@thomaseizinger
Contributor

Sad that OpenMetrics doesn't support more complex things than that. The big advantage would have been that we don't need to store this data. An ever-growing database is not particularly ideal, and metrics / logs are things that you usually don't care about past a certain point in time.

If we store things in a database, can we use this as an opportunity to start migrating to SQLite? The actual extraction of metrics could then be as simple as loading the database into any SQL tool and running a couple of queries against it.

That should reduce the required development effort significantly. Also, if we create a separate reporting database, deleting that one is safe if the user ever wants to clean up storage space.
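
As a rough illustration of that idea (hedged: the rusqlite crate and the swap_events schema below are assumptions, not an agreed design), a separate reporting database could look like this:

use rusqlite::{params, Connection};

// Open (or create) a dedicated reporting database. Because it only holds
// derived metrics data, deleting the file is always safe.
fn open_reporting_db(path: &str) -> rusqlite::Result<Connection> {
    let conn = Connection::open(path)?;
    conn.execute(
        "CREATE TABLE IF NOT EXISTS swap_events (
            swap_id     TEXT NOT NULL,
            state       TEXT NOT NULL,
            rate        REAL NOT NULL,
            recorded_at TEXT NOT NULL
        )",
        [],
    )?;
    Ok(conn)
}

// Append one row per state transition; analysis is then plain SQL, e.g.
// SELECT state, COUNT(*) FROM swap_events GROUP BY state;
fn insert_event(
    conn: &Connection,
    swap_id: &str,
    state: &str,
    rate: f64,
    recorded_at: &str,
) -> rusqlite::Result<usize> {
    conn.execute(
        "INSERT INTO swap_events (swap_id, state, rate, recorded_at) VALUES (?1, ?2, ?3, ?4)",
        params![swap_id, state, rate, recorded_at],
    )
}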

@bonomat
Member Author

bonomat commented May 4, 2021

> Yeah, what you describe has nothing to do with Metrics but more with Events and plain swap state - once we aggregate these Events and analyze them, we have actual Metrics.

That's a good summary :)

I had a quick look into Vector as well. It's a log analyzer, maybe comparable to LogStash:

> Vector is a high-performance observability data pipeline that allows you to collect, transform, and route all your logs and metrics.

It even has an export feature to send data to Prometheus (which AFAIK uses the OpenMetrics format).

If we want to go this way, we can ignore the DB and print more details into the logs.

@thomaseizinger : what do you think?

@da-kami
Member

da-kami commented May 4, 2021

> If we want to go this way, we can ignore the DB and print more details into the logs.
>
> @thomaseizinger : what do you think?

I think it boils down to either using a relational DB and then some tools on top of it for analysis, OR ignoring the DB and going with the logs. I am not sure which is the better approach, but I think the log approach would give us faster results.
Going with the DB would be a good thing though, because it would help us eventually refactor the current DB, which is old tech debt.

@bonomat
Member Author

bonomat commented May 4, 2021

After a quick chat with @da-kami:

  1. Analyzing and coming up with Metrics for the ASB falls into the role of the ASB provider, hence we can see this as a nice-to-have feature.
  2. It is hard to get the metrics right for everyone, hence we should strive for a flexible solution.
  3. A new DB (or sled tree) would add additional complexity because we would need to come up with an upgradable or flexible data schema. Additionally, it was mentioned that SQL should replace Sled eventually; we should not jump the gun on that topic just because we want some more information to analyze.

Conclusion: because of 1. + 2., the best option is to add more information to our logs. These can then be analyzed using tools like Vector, Logstash + Elasticsearch, or other tools.

@bonomat bonomat self-assigned this May 6, 2021
bors bot added a commit that referenced this issue May 11, 2021
474: Add more log details r=bonomat a=bonomat

Resolves #448 

1. The first commit adds an additional log statement of the exchange rate for each state update. This is useful because it allows us to measure profitability easily, i.e. by knowing what the exchange rate was when the swap started and what it was when it finalized.
2. The second commit changes a bunch of log messages.
3. The third commit adds a new command-line flag to toggle JSON format (see the sketch below).
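
For context, a hedged sketch of how such a toggle is typically wired up with tracing_subscriber; this is illustrative and not necessarily the exact code in #474:

// Requires tracing-subscriber with the "fmt", "env-filter" and "json" features.
fn init_tracing(json: bool) {
    let builder = tracing_subscriber::fmt().with_env_filter("info");

    if json {
        // Machine-readable output, convenient for log pipelines.
        builder.json().init();
    } else {
        // Human-readable output for interactive use.
        builder.init();
    }
}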




Co-authored-by: Philipp Hoenisch <[email protected]>
@bors bors bot closed this as completed in f03e8fa May 11, 2021