-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(cyclotron): Add metrics everywhere #25193
base: master
Are you sure you want to change the base?
Conversation
This is ready for review but there's at least one open question about how labels are handled differently in rust and node land, that I'm simply not sure how to address - would appreciate input or suggestions if anyone's got them. |
plugin-server/src/cdp/utils.ts
Outdated
|
||
// The TS library seem to demand you declare labels up-front (for some reason?), | ||
// but we have know good way of knowing the set of labels used up-front, so | ||
// idk what to do here really - recreate the metric each time? That seems... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would need to be tested but it could be that the labels are just for typing so if you just cast the label setting as any
it might JustWork. Checked the lib source and it wasn't obvious either way if it would work though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rip... I'm gonna have to think about it a bit more than that it seems. It's easy to handle this with counter and gauges (I can just delete and recreate), but histograms are a bit trickier.
error: "Error: Added label \"outcome\" is not included in initial labelset: []
at validateLabel (/Users/olly/Documents/work/posthog/plugin-server/node_modules/.pnpm/[email protected]/node_modules/prom-client/lib/validation.js:20:10)
at Counter.inc (/Users/olly/Documents/work/posthog/plugin-server/node_modules/.pnpm/[email protected]/node_modules/prom-client/lib/counter.js:23:4)
at emitCyclotronMetrics (/Users/olly/Documents/work/posthog/plugin-server/src/cdp/utils.ts:408:21)
at CdpProcessedEventsConsumer.processBatch (/Users/olly/Documents/work/posthog/plugin-server/src/cdp/cdp-consumers.ts:408:29)
at CdpProcessedEventsConsumer._handleKafkaBatch (/Users/olly/Documents/work/posthog/plugin-server/src/cdp/cdp-consumers.ts:595:20)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at func (/Users/olly/Documents/work/posthog/plugin-server/src/cdp/cdp-consumers.ts:338:25)
at runInstrumentedFunction (/Users/olly/Documents/work/posthog/plugin-server/src/main/utils.ts:48:24)
at eachBatch (/Users/olly/Documents/work/posthog/plugin-server/src/cdp/cdp-consumers.ts:334:24)
at startConsuming (/Users/olly/Documents/work/posthog/plugin-server/src/kafka/batch-consumer.ts:305:17)
at Object.join (/Users/olly/Documents/work/posthog/plugin-server/src/kafka/batch-consumer.ts:385:13)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found the ever-classic 5 year old issue 🙃 siimon/prom-client#298
Putting this in draft for now, will return to it in a bit |
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
Problem
I really want to be able to answer questions like "how many bytes are being flushed to PG per second, per worker/per queue? How many updates are workers sending per flush (whats the flush batch size)?" and right now that's easy in rust land but hard in node land. A lot of the work here is laying the foundations for a stronger metrics reporting story in rust land generally (default labelling all metrics by service is a good start, we can maybe even stop prefixing metrics at some point), and then also getting rust's metrics story and node's to play nicely together.
Part of a broader push to get our delivery engine as easy to operate as possible - next on my list is a simple web UI exposed by janitors for looking at failed jobs, pausing cleanup, that kind of thing (particularly important as our binary blobs become more opaque, e.g. if we add compression, metabase becomes pretty useless) - but metrics first.