Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more prometheus metrics #1054

Merged
merged 3 commits into from
Aug 29, 2024

Conversation

GeorgeTsagk
Copy link
Member

Description

This PR adds any missing metrics for the tapd monitoring package based on this comment

Specifically, in this PR we add:

  • Asset proof collector: gather a vector for all proof sizes, with script key as the identifier
  • DB collector: gathers db related data, currently only gathering the total db size

@GeorgeTsagk GeorgeTsagk self-assigned this Aug 1, 2024
@GeorgeTsagk GeorgeTsagk force-pushed the prom-loadtest-metrics branch from 15408e9 to c0561ec Compare August 5, 2024 14:09
@GeorgeTsagk GeorgeTsagk requested a review from guggero August 5, 2024 15:35
tapdb/assets_store.go Outdated Show resolved Hide resolved
tapdb/sqlc/queries/metadata.sql Outdated Show resolved Hide resolved
@@ -0,0 +1,2 @@
-- name: AssetsDBSize :one
SELECT pg_catalog.pg_database_size(current_database()) AS size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this query precise enough? What if the database is multi tenant and tapdb is just one of the users?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that it's built-in I trust it to a certain degree. It seems to be accounting for every byte at the time of execution, including unused allocated space, data marked as deleted etc

The query should return the same size for the db regardless of the user and what they have access to within the db

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that if we're also running lnd in the same postgres DB, will it return the total size, or just the size of the DB w/ the attached user?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_database() would refer to the database/schema you're using. Assuming there would be a different one for lnd and taproot-assets, I think this should work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I tested this locally where I have a single development timescale/postgres instance with different databases.
And I get distinct values for each of the following queries:

SELECT pg_catalog.pg_database_size('taprootassets') AS size;
--> 16802671

SELECT pg_catalog.pg_database_size('loop') AS size;
--> 8135535

SELECT pg_catalog.pg_database_size('pool') AS size;
--> 8135535

select current_database();
--> taprootassets

defer cancel()

// Fetch all proofs.
proofs, err := a.cfg.AssetStore.FetchAssetProofs(ctxdb)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the default scraping interval, I think this'll run every 60 seconds or so. Extrapolating out to say 100x the current scale, will this be fast enough?

We may want to consider a more precise query than what we have here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

If we want to stick with taking a snapshot of all asset proofs and keep track of them, we don't have anything better currently. I believe we can stick to this for now, and our loadtests will indicate the next step.

Also, if we want to keep the loadtest behavior "random" (i.e send a random asset X times) we can't really know which asset is going to be used, so it kind of makes sense to keep track of everything.

If we change the loadtest behavior so that specific assets are being sent around in specific patterns, then we can change the above query and only track what we're interested in

monitoring/asset_proof_collector.go Outdated Show resolved Hide resolved
monitoring/asset_proof_collector.go Outdated Show resolved Hide resolved
monitoring/asset_proof_collector.go Outdated Show resolved Hide resolved
@GeorgeTsagk GeorgeTsagk force-pushed the prom-loadtest-metrics branch from c0561ec to 5cabb7e Compare August 6, 2024 11:54
@GeorgeTsagk GeorgeTsagk requested a review from Roasbeef August 6, 2024 11:55
Copy link
Member

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. I agree that we should optimize the call that fetches the proof sizes and put it into a histogram. The rest seems good, will do a manual test once comments are addressed.

tapdb/assets_store.go Show resolved Hide resolved
tapcfg/server.go Show resolved Hide resolved
tapcfg/server.go Show resolved Hide resolved
tapdb/assets_store.go Outdated Show resolved Hide resolved
monitoring/asset_proof_collector.go Outdated Show resolved Hide resolved
@GeorgeTsagk GeorgeTsagk force-pushed the prom-loadtest-metrics branch from 5cabb7e to 9ed81e6 Compare August 8, 2024 12:46
@GeorgeTsagk GeorgeTsagk requested a review from guggero August 8, 2024 12:47
Copy link
Member

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of optimization suggestions. But this works, tested it by running the load test locally with one sqlite and one postgres backed instance.

tapdb/assets_store.go Outdated Show resolved Hide resolved
tapdb/assets_store.go Outdated Show resolved Hide resolved
tapdb/sqlc/queries/assets.sql Outdated Show resolved Hide resolved
monitoring/asset_proof_collector.go Outdated Show resolved Hide resolved
monitoring/db_collector.go Show resolved Hide resolved
@GeorgeTsagk GeorgeTsagk force-pushed the prom-loadtest-metrics branch from 9ed81e6 to 5a0a75c Compare August 9, 2024 11:55
@GeorgeTsagk GeorgeTsagk requested a review from guggero August 9, 2024 11:56
Copy link
Member

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK, LGTM 🎉

@dstadulis dstadulis added this to the v0.4.2 milestone Aug 13, 2024
@lightninglabs-deploy
Copy link

@Roasbeef: review reminder

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🧞‍♂️

@Roasbeef Roasbeef merged commit 0551a3f into lightninglabs:main Aug 29, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

5 participants