Monitoring, metrics and alerting #2025

Open
2 tasks
ScottGuymer opened this issue May 9, 2022 · 5 comments
Labels
enhancement New feature or request stale:exempt

Comments

@ScottGuymer
Member

ScottGuymer commented May 9, 2022

We are seeing increasing interest and need for a solution to help monitor the resources deployed by this module.

There have already been a number of interactions around this

And some PRs

This is also something we are interested in at Philips, and we want to create a solution that is useful for the community where needed.

We will share more info on this as we align further.

ScottGuymer added the enhancement label May 9, 2022
npalm pinned this issue May 9, 2022
@github-actions
Contributor

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the Stale label Jun 11, 2022
npalm reopened this Jul 11, 2022
npalm added the stale:exempt label and removed the Stale label Jul 11, 2022
npalm unpinned this issue Sep 24, 2022
@razor54

razor54 commented Nov 23, 2022

In addition to this, would CPU and memory monitoring and alerting through CloudWatch be of interest?

@sgametrio

The possibility of having metrics (alerts too?) would be so cool! 👍

@mackobi

mackobi commented Jul 9, 2024

Guys, do you expect to get this implemented in main soon?

@dgokcin

dgokcin commented Oct 24, 2024

@razor54 just being able to see memory metrics would be enough for me. ATM, my runners randomly hang, and the default metrics for EC2 instances do not help me troubleshoot the issue. I tried configuring the CloudWatch agent by creating my own template file, but it did not work. Any tips?

{
    "agent": {
        "metrics_collection_interval": 5
    },
    "metrics": {
        "metrics_collected": {
            "swap": {
                "measurement": [
                    "swap_used_percent",
                    "swap_used",
                    "swap_free"
                ]
            },
            "mem": {
                "measurement": [
                    "mem_cached",
                    "mem_total",
                    "mem_used"
                ]
            }
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": ${logfiles}
            }
        }
    }
}
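
One quick thing to rule out is a rendering problem: the `${logfiles}` placeholder has to be replaced with a JSON array before the agent can parse the file. The following is a minimal sketch, not the module's own code, that renders the template above locally and checks that the result is valid JSON. The function name `render_agent_config` and the sample log file entry (path and log group name) are hypothetical, and Python's `string.Template` only stands in here for whatever actually fills the placeholder in your setup.

```python
import json
from string import Template

# Copy of the template from this comment; ${logfiles} is filled in at render time.
AGENT_TEMPLATE = """
{
    "agent": {
        "metrics_collection_interval": 5
    },
    "metrics": {
        "metrics_collected": {
            "swap": {
                "measurement": ["swap_used_percent", "swap_used", "swap_free"]
            },
            "mem": {
                "measurement": ["mem_cached", "mem_total", "mem_used"]
            }
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": ${logfiles}
            }
        }
    }
}
"""

# Hypothetical example entry; the path and log group name are assumptions.
SAMPLE_LOGFILES = [
    {
        "file_path": "/var/log/messages",
        "log_group_name": "example-runner-logs",
        "log_stream_name": "{instance_id}",
    }
]


def render_agent_config(template_text, logfiles):
    """Substitute ${logfiles} with a JSON array and parse the rendered text.

    Raises KeyError if the placeholder is left unfilled and
    json.JSONDecodeError if the rendered text is not valid JSON.
    """
    rendered = Template(template_text).substitute(logfiles=json.dumps(logfiles))
    return json.loads(rendered)


config = render_agent_config(AGENT_TEMPLATE, SAMPLE_LOGFILES)
print(sorted(config["metrics"]["metrics_collected"]))
```

This only confirms that the rendered file is well-formed JSON with the expected sections. It does not check whether the agent accepts the settings, whether the instance is allowed to publish metrics, or whether the custom template is actually the one being used on the runners.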


6 participants