Site Reliability
Here is how we ensure that our reference server, https://build.opensuse.org, functions reliably.
In our Ruby on Rails app, we make use of lograge to log to disk. System logs go to a central logging server via rsyslog.
On our servers, we make use of icinga and many monitoring-plugins which send infrastructure performance and health monitoring data to an InfluxDB time series database, which we then visualize on a Grafana dashboard. This dashboard is not public.
More details in System Health Monitoring.
Inside our Ruby on Rails app, we make use of influxdb-rails which sends performance data to an InfluxDB time series database. We visualize this data on a Grafana dashboard reachable at https://obs-measure.opensuse.org
More details in Application Performance Monitoring.
Inside our Ruby on Rails app, we make use of bunny, which sends telemetry to a RabbitMQ message broker. A telegraf server agent reads the telemetry from the broker and stores it in an InfluxDB time series database. We visualize this data on a Grafana dashboard reachable at https://obs-measure.opensuse.org
More details in Application Health Monitoring.
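To make the telemetry path above more concrete, here is a minimal sketch of how a payload could be built before it is published to RabbitMQ. It assumes the telegraf agent is configured to parse InfluxDB line protocol, which this page does not state. The helper name `telemetry_line`, the measurement `build`, and the tag and field names are all hypothetical. Publishing through bunny is left out so the sketch runs without a broker.

```ruby
# Hypothetical helper: builds one record in InfluxDB line protocol,
#   measurement,tag=value field=value [timestamp]
# Assumption: the telegraf consumer parses this format.
def telemetry_line(measurement, fields:, tags: {}, timestamp_ns: nil)
  raise ArgumentError, 'at least one field is required' if fields.empty?

  # Tag keys, tag values and field keys escape commas, equals signs and spaces.
  escape_key = ->(value) { value.to_s.gsub(/[,= ]/) { |char| "\\#{char}" } }
  # Measurement names escape only commas and spaces.
  name = measurement.to_s.gsub(/[, ]/) { |char| "\\#{char}" }

  # Sorting tags by key keeps the output stable.
  tag_part = tags.sort_by { |key, _| key.to_s }
                 .map { |key, value| ",#{escape_key.call(key)}=#{escape_key.call(value)}" }
                 .join

  field_part = fields.map do |key, value|
    formatted = case value
                when Integer then "#{value}i" # integers carry an "i" suffix
                when Float, true, false then value.to_s
                else
                  escaped = value.to_s.gsub(/["\\]/) { |char| "\\#{char}" }
                  %("#{escaped}")             # strings are double-quoted
                end
    "#{escape_key.call(key)}=#{formatted}"
  end.join(',')

  [name + tag_part, field_part, timestamp_ns].compact.join(' ')
end

telemetry_line('build', tags: { state: 'succeeded', arch: 'x86_64' },
                        fields: { count: 1, duration: 12.5 })
# => "build,arch=x86_64,state=succeeded count=1i,duration=12.5"
```

The resulting string would then be handed to whatever code publishes messages to the broker.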
Inside our Ruby on Rails app, we make use of airbrake which sends application exceptions to an errbit error catcher service at https://errbit.opensuse.org
We don't do analytics
We don't trace
There is always at least one person "on-call". As soon as we are alerted, that person takes on incident command and holds every position (hacking on the problem, operating the server, communicating with users) that they have not delegated. They are free to pull in anyone they need and hand out tasks or roles to resolve the incident.
After resolving the incident, we do a root cause analysis and publish a report, based on our Post-Mortem-Template, on https://openbuildservice.org/categories/deployments/
We are using priority labels for issues.
- P1: Urgent - EVERYONE drop everything and fix this
- P2: High - If at all possible, assign this to yourself and fix it ASAP
You can run OBS and all the tools we use in our SRE stack in your development environment. To set up the stack, run:
rake docker:sre:build
This will fetch all images and configure them. Afterward, you can issue any docker compose command you would normally use, as long as you also pass the docker-compose.sre.yml file with -f. For instance, to boot up OBS including the SRE stack, you would use:
docker compose -f docker-compose.sre.yml -f docker-compose.yml up
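Because both compose files have to be passed on every invocation, a small wrapper can save typing. The following is only a sketch: the constant `SRE_COMPOSE_FILES` and the helper `sre_compose_command` are hypothetical names and not part of OBS. It builds the argument list for any docker compose subcommand and does not run anything itself.

```ruby
require 'shellwords'

# Compose files in the same order as the command above.
SRE_COMPOSE_FILES = %w[docker-compose.sre.yml docker-compose.yml].freeze

# Hypothetical helper: returns the argument list for a docker compose
# subcommand that includes the SRE stack.
def sre_compose_command(*args)
  file_flags = SRE_COMPOSE_FILES.flat_map { |file| ['-f', file] }
  ['docker', 'compose', *file_flags, *args]
end

puts Shellwords.join(sre_compose_command('up'))
# prints: docker compose -f docker-compose.sre.yml -f docker-compose.yml up
```

To actually execute a command, the array can be passed to `system(*sre_compose_command('ps'))`.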
Go to the Grafana frontend at http://0.0.0.0:8000, log in (admin/admin), and import the 'influxdb-rails' sample dashboards (Overview, per Request, per Action), or export dashboards from obs-measure and import them here.