Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Expose log count metric from Runner #684

Merged
merged 2 commits into from
Apr 22, 2024
Merged

Conversation

morgsmccauley
Copy link
Collaborator

@morgsmccauley morgsmccauley commented Apr 22, 2024

Creates a new winston transport method to count logs by level, and exposes a new prometheus metric to record this value.

Additionally, metrics recorded on the main thread were not captured. This was not an issue as the majority of metrics were done within the worker. But since we also log on main thread, this PR updates metrics aggregation to expose main thread metrics.

@morgsmccauley morgsmccauley changed the title feat/count logs feat: Expose log count metric from Runner Apr 22, 2024
@morgsmccauley morgsmccauley marked this pull request as ready for review April 22, 2024 05:39
@morgsmccauley morgsmccauley requested a review from a team as a code owner April 22, 2024 05:39
@morgsmccauley morgsmccauley self-assigned this Apr 22, 2024
@@ -87,7 +94,9 @@ export const startServer = async (): Promise<void> => {
app.get('/metrics', async (_req, res) => {
res.set('Content-Type', aggregatorRegistry.contentType);

const metrics = await AggregatorRegistry.aggregate(Array.from(workerMetrics.values())).metrics();
const mainThreadMetrics = await register.getMetricsAsJSON();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What extra information does this provide? It seems we weren't getting this before.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I outlined this in the description - mainly just the log counts :)


const { format, transports } = winston;

class LogCounter extends Transport {
log (info: { level: string }, callback: () => void): void {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to count only .log calls? Or is this applied to all log calls (.warn, .error, etc.)? Another thing, it would be nice if we could similarly differentiate between error logs too for metrics. For example, it would be nice to distinguish between a single "Encountering error processing stream" error log and many "Error writing..." error calls. The former indicating the worker shut down whereas the latter could be any of these three: Hasura is down, our system queries are malformed, or an indexer is trying to run some faulty query.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could definitely add that, but it may be expensive given all the different log messages. My motive behind this was to be able to quickly see if we are erroring out from Grafana, alert from it, and then go in to GCP to see the actual issue/log messages.

@morgsmccauley morgsmccauley merged commit 328c704 into main Apr 22, 2024
3 checks passed
@morgsmccauley morgsmccauley deleted the feat/count-logs branch April 22, 2024 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants