Netdata is distributed, real-time performance and health monitoring for systems and applications. It is a highly optimized monitoring agent you install on all your systems and containers.
Netdata provides unparalleled real-time insights into everything happening on the systems it runs on (including web servers, databases, and applications), using highly interactive web dashboards. It can run autonomously, without any third-party components, or it can be integrated with existing monitoring toolchains (Prometheus, Graphite, OpenTSDB, Kafka, Grafana, and more).
Netdata is fast and efficient, designed to permanently run on all systems (physical & virtual servers, containers, IoT devices), without disrupting their core function.
Netdata is free, open-source software, and it currently runs on Linux, FreeBSD, and macOS, as well as on platforms built on them, such as Kubernetes and Docker.
Netdata is not hosted by the Cloud Native Computing Foundation (CNCF), but it is the 3rd most starred open-source project in the CNCF landscape.
People get addicted to Netdata. Once you use it on your systems, there is no going back! You've been warned...
- What does it look like? - Take a quick tour through the dashboard
- Our userbase - Enterprises we help monitor and our userbase
- Quickstart - How to try it now on your systems
- Why Netdata - Why people love Netdata and how it compares with other solutions
- News - The latest news about Netdata
- How Netdata works - A high-level diagram of how Netdata works
- Infographic - Everything about Netdata in a single graphic
- Features - How you'll use Netdata on your systems
- Visualization - Learn about visual anomaly detection
- What does it monitor? - See which apps/services Netdata auto-detects
- Documentation - Read the documentation
- Community - Discuss Netdata with others and get support
- License - Check Netdata's licensing
- Is it any good? - Yes.
- Is it awesome? - Yes.
The following animated GIF shows the top part of a typical Netdata dashboard.
A typical Netdata dashboard, in 1:1 timing. Charts can be panned by dragging them, zoomed in/out with `SHIFT` + mouse wheel, and an area can be selected for zoom-in with `SHIFT` + mouse selection. Netdata is highly interactive, real-time, and optimized to get the work done!
Want to see Netdata live? Check out any of our live demos.
Netdata is used by hundreds of thousands of users all over the world. Check our GitHub watchers list. You will find people working for Amazon, Atos, Baidu, Cisco Systems, Citrix, Deutsche Telekom, DigitalOcean, Elastic, EPAM Systems, Ericsson, Google, Groupon, Hortonworks, HP, Huawei, IBM, Microsoft, NewRelic, Nvidia, Red Hat, SAP, Selectel, TicketMaster, Vimeo, and many more!
We provide Docker images for the most common architectures. These are statistics reported by Docker Hub:
When you install Netdata on multiple systems, they are integrated into one distributed application via a Netdata registry. This is a web browser feature, and it allows us to count the number of unique users and unique Netdata servers installed. The following information comes from the global public Netdata registry we run:
To install Netdata from source on any Linux system (physical, virtual, container, IoT, edge) and keep it up to date with our nightly releases automatically, run the following:
```bash
# make sure you run `bash` for your shell
bash

# install Netdata directly from GitHub source
bash <(curl -Ss https://my-netdata.io/kickstart.sh)
```
To learn more about the pros and cons of using nightly vs. stable releases, see our notice about the two options.
The above command will:
- Install any required packages on your system (it will ask you to confirm before doing so)
- Compile it, install it, and start it.
More installation methods and additional options can be found at the installation page.
To try Netdata in a docker container, run this:
```bash
docker run -d --name=netdata \
  -p 19999:19999 \
  -v /etc/passwd:/host/etc/passwd:ro \
  -v /etc/group:/host/etc/group:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  --cap-add SYS_PTRACE \
  --security-opt apparmor=unconfined \
  netdata/netdata
```
For more information about running Netdata with Docker, check the docker installation page.
Starting with Netdata v1.12, anonymous usage information is collected by default and sent to Google Analytics. To read more about the information collected and how to opt out, check the anonymous statistics page.
Netdata takes quite a different approach to monitoring.
Netdata is a monitoring agent you install on all your systems. It is:
- A metrics collector for system and application metrics (including web servers, databases, containers, and much more),
- A custom database engine to store recent metrics in memory and "spill" historical metrics to disk for efficient long-term storage,
- A super fast, interactive, and modern metrics visualizer optimized for anomaly detection,
- And an alarms notification engine - an advanced watchdog for detecting performance and availability issues
All of the above are packaged together in a very flexible, extremely modular, distributed application.
This is how Netdata compares to other monitoring solutions:
Netdata | others (open-source and commercial) |
---|---|
High resolution metrics (1s granularity) | Low resolution metrics (10s granularity at best) |
Monitors everything, thousands of metrics per node | Monitor just a few metrics |
UI is super fast, optimized for anomaly detection | UI is good for just an abstract view |
Meaningful presentation, to help you understand the metrics | You have to know the metrics before you start |
Install and get results immediately | Long preparation is required to get any useful results |
Use it for troubleshooting performance problems | Use them to get statistics of past performance |
Kills the console for tracing performance issues | The console is always required for troubleshooting |
Requires zero dedicated resources | Require large dedicated resources |
Netdata is open-source, free, super fast, very easy, completely open, extremely efficient, flexible, and easy to integrate with other tools.
It has been designed by system administrators, DevOps engineers, and developers not just to visualize metrics, but also to troubleshoot complex performance problems.
Nov 27th, 2019
- Netdata v1.19.0 released!
Release v1.19.0 contains 2 new collectors, 19 bug fixes, 17 improvements, and 19 documentation updates.
We completed a major rewrite of our web log collector to dramatically improve its flexibility and performance. The new collector, written entirely in Go, can parse and chart logs from Nginx and Apache servers, and combines numerous improvements. Netdata now supports the LTSV log format, creates charts for TLS and cipher usage, and is amazingly fast. In a test using SSD storage, the collector parsed the logs for 200,000 requests in about 200ms, using 30% of a single core.
This Go-based collector also has powerful custom log parsing capabilities, which means we're one step closer to a generic application log parser for Netdata. We're continuing to work on this parser to support more application log formats in the future.
We have a new tutorial on enabling the Go web log collector and using it with Nginx and/or Apache access logs with minimal configuration. Thanks to Wing924 for starting the Go rewrite!
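The LTSV (Labeled Tab-separated Values) format mentioned above stores each log entry as tab-separated `label:value` pairs. The following is a minimal Python sketch of parsing such lines and counting responses per status code, to show why the format is easy to handle; it is not the Go collector's code, and the label names (`time`, `host`, `req`, `status`, `reqtime`) and sample lines are assumptions chosen for illustration.

```python
from collections import Counter


def parse_ltsv(line: str) -> dict:
    """Split one LTSV log line into a {label: value} dict."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        # Split on the first colon only: values such as timestamps contain colons.
        label, sep, value = field.partition(":")
        if sep:
            record[label] = value
    return record


def count_status_codes(lines) -> Counter:
    """Count responses per HTTP status code (assumes a 'status' label)."""
    counts = Counter()
    for line in lines:
        status = parse_ltsv(line).get("status")
        if status is not None:
            counts[status] += 1
    return counts


# Hypothetical access log lines in LTSV format.
sample = [
    "time:2019-11-27T10:00:00+00:00\thost:10.0.0.1\treq:GET / HTTP/1.1\tstatus:200\treqtime:0.002",
    "time:2019-11-27T10:00:01+00:00\thost:10.0.0.2\treq:GET /missing HTTP/1.1\tstatus:404\treqtime:0.001",
    "time:2019-11-27T10:00:02+00:00\thost:10.0.0.1\treq:POST /api HTTP/1.1\tstatus:200\treqtime:0.010",
]
print(count_status_codes(sample))  # Counter({'200': 2, '404': 1})
```

Because every field carries its own label, a parser like this does not depend on field order, which is one reason LTSV suits custom log layouts.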
We introduced more cmocka unit testing to Netdata. In this release, we're testing how Netdata's internal web server processes HTTP requests, the first step toward improving code quality throughout, reducing bugs, and making refactoring easier. We wanted to validate the web server's behavior but needed to build a layer of parametric testing on top of the cmocka test runner. Read all about our process of testing and selecting cmocka in our blog post: Building an agile team's 'safety harness' with cmocka and FOSS.
Netdata's Unbound collector was also completely rewritten in Go to improve how it collects and displays metrics. This new version can collect dozens of metrics, including details on queries, cache, and uptime, and can even show per-thread metrics. See our tutorial on enabling the new collector via Netdata's amazing auto-detection feature.
We fixed an error where invalid spikes appeared on certain charts by improving the incremental counter reset/wraparound detection algorithm.
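Incremental counters (for example, bytes received on an interface) only grow, so a collector charts the difference between consecutive samples. When a new sample is smaller than the previous one, the counter either wrapped around its maximum value or was reset, and treating a reset as a wrap produces exactly the kind of invalid spike described above. The Python sketch below shows one possible heuristic; it is not Netdata's actual algorithm, and the 32/64-bit counter widths and the 10% wrap window are assumptions.

```python
from typing import Optional

# Counter widths this sketch assumes a data source might use.
COUNTER_MAXIMA = (2**32 - 1, 2**64 - 1)
# A decrease counts as a wrap only if the previous value was in the top 10% of the range.
WRAP_WINDOW = 0.10


def counter_delta(prev: int, curr: int) -> Optional[int]:
    """Return the increase between two samples of an incremental counter,
    or None when the counter looks like it was reset."""
    if curr >= prev:
        return curr - prev
    for max_value in COUNTER_MAXIMA:
        # Wraparound is plausible only when the previous sample was close to
        # the maximum value of a counter of this width.
        if max_value * (1 - WRAP_WINDOW) <= prev <= max_value:
            return (max_value - prev) + curr + 1
    # Otherwise the counter restarted (e.g. the monitored service restarted):
    # skip this sample instead of charting a huge, invalid spike.
    return None


samples = [4294967280, 4294967290, 4, 20, 3, 9]
deltas = [counter_delta(a, b) for a, b in zip(samples, samples[1:])]
print(deltas)  # [10, 10, 16, None, 6]
```

The third sample is treated as a 32-bit wraparound because the previous value sat just below 2^32, while the drop from 20 to 3 is treated as a reset and yields no data point.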
Netdata can now send health alarm notifications to IRC channels thanks to Strykar!
And, Netdata can now monitor AM2320 sensors, thanks to hard work from Tom Buck.
See more news and previous releases at our blog or our releases page.
Netdata is a highly efficient, highly modular, metrics management engine. Its lockless design makes it ideal for concurrent operations on the metrics.
This is how it works:
Function | Description | Documentation |
---|---|---|
Collect | Multiple independent data collection workers collect metrics from their sources, using the optimal protocol for each application, and push the metrics to the database. Each data collection worker has lockless write access to the metrics it collects. | collectors |
Store | Metrics are first stored in RAM in a custom database engine that then "spills" historical metrics to disk for efficient long-term metrics storage. | database |
Check | A lockless independent watchdog evaluates health checks on the collected metrics, triggers alarms, maintains a health transaction log, and dispatches alarm notifications. | health |
Stream | A lockless independent worker streams metrics, in full detail and in real-time, to remote Netdata servers as soon as they are collected. | streaming |
Archive | A lockless independent worker down-samples the metrics and pushes them to backend time-series databases. | backends |
Query | Multiple independent workers are attached to the internal web server, servicing API requests, including data queries. | web/api |
The result is a highly efficient, low-latency system, supporting multiple readers and one writer on each metric.
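The one-writer, many-readers rule can be illustrated with a small sketch of the Collect, Store, and Check steps. This is a single-threaded Python illustration of the data flow only; it does not model Netdata's lockless C implementation, and the `Metric` class, its capacity, and the warning threshold are hypothetical.

```python
from collections import deque


class Metric:
    """Hypothetical per-metric store: a fixed-size round-robin buffer.

    Only the collector calls append(); readers (health checks, queries)
    only call latest() or window(), mirroring one writer per metric.
    """

    def __init__(self, name: str, capacity: int):
        self.name = name
        # When the buffer is full, appending drops the oldest value (round-robin).
        self._values = deque(maxlen=capacity)

    def append(self, value: float) -> None:  # writer side
        self._values.append(value)

    def latest(self) -> float:  # reader side
        return self._values[-1]

    def window(self, n: int) -> list:  # reader side
        return list(self._values)[-n:]


def check(metric: Metric, warn_above: float) -> str:
    """A toy health check on the average of the last 3 samples."""
    recent = metric.window(3)
    if not recent:
        return "UNDEFINED"
    average = sum(recent) / len(recent)
    return "WARNING" if average > warn_above else "CLEAR"


cpu = Metric("system.cpu.user", capacity=5)
for sample in [10, 20, 95, 97, 99, 98]:  # collect: one value per second
    cpu.append(sample)

print(cpu.window(5))     # [20, 95, 97, 99, 98]; the oldest sample was dropped
print(check(cpu, 90.0))  # WARNING
```

Keeping writes confined to the collector means readers never modify a metric, which is the property that lets the real engine avoid locks between collection and queries.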
This is a high-level overview of Netdata's feature set and architecture. Click it to interact with it (it has direct links to our documentation).
This is what you should expect from Netdata:
- 1s granularity - The highest possible resolution for all metrics.
- Unlimited metrics - Netdata collects all the available metrics; the more, the better.
- 1% CPU utilization of a single core - It's unbelievably optimized.
- A few MB of RAM - The low-memory round-robin option uses 25MB RAM, and you can resize it.
- Minimal disk I/O - While running, Netdata only writes historical metrics and reads `error` and `access` logs.
- Zero configuration - Netdata auto-detects everything, and can collect up to 10,000 metrics per server out of the box.
- Zero maintenance - You just run it. Netdata does the rest.
- Zero dependencies - Netdata runs a custom web server for its static web files and its web API (though its plugins may require additional libraries, depending on the applications monitored).
- Scales to infinity - You can install it on all your servers, containers, VMs, and IoT devices. Metrics are not centralized by default, so there is no limit.
- Several operating modes - Autonomous host monitoring (the default), headless data collector, forwarding proxy, store and forward proxy, central multi-host monitoring, in all possible configurations. Each node may have different metrics retention policies and run with or without health monitoring.
- Sophisticated alerting - Netdata comes with hundreds of alarms out of the box! It supports dynamic thresholds, hysteresis, alarm templates, multiple role-based notification methods, and more.
- Notifications: alerta.io, amazon sns, discordapp.com, email, flock.com, hangouts, irc, kavenegar.com, messagebird.com, pagerduty.com, prowl, pushbullet.com, pushover.net, rocket.chat, slack.com, smstools3, syslog, telegram.org, twilio.com, web and custom notifications.
- Time-series databases - Netdata can archive its metrics to Graphite, OpenTSDB, Prometheus, AWS Kinesis, MongoDB, and JSON document DBs, at the same or a lower resolution (a lower resolution prevents these servers from being congested by the amount of data collected). Netdata also supports the Prometheus remote write API, which allows storing metrics in Elasticsearch, Gnocchi, InfluxDB, Kafka, PostgreSQL/TimescaleDB, Splunk, VictoriaMetrics, and a lot of other storage providers.
- Stunning interactive dashboards - Our dashboard is mouse-, touchpad-, and touch-screen friendly in 2 themes: `slate` (dark) and `white`.
- Amazingly fast visualization - Even on low-end hardware, the dashboard responds to all queries in less than 1 ms per metric.
- Visual anomaly detection - Our UI/UX emphasizes the relationships between charts so you can better detect anomalies visually.
- Embeddable - Charts can be embedded on your web pages, wikis and blogs. You can even use Atlassian's Confluence as a monitoring dashboard.
- Customizable - You can build custom dashboards using simple HTML. No JavaScript needed!
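The "lower resolution" option for time-series databases can be illustrated by averaging per-second samples into larger buckets before sending them. The Python sketch below formats the result in Graphite's plaintext protocol (`<path> <value> <timestamp>`); it is not the Netdata backend's code, and the metric path `netdata.myhost.system.cpu.user`, the timestamps, and the 10-second interval are assumptions.

```python
def downsample(points, interval):
    """Average (timestamp, value) points into buckets of `interval` seconds.

    Each bucket is labeled with its start time (timestamp rounded down).
    """
    buckets = {}
    for ts, value in points:
        start = ts - ts % interval
        buckets.setdefault(start, []).append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def to_graphite_lines(path, points):
    """Render points in Graphite's plaintext format: '<path> <value> <timestamp>'."""
    return [f"{path} {value:g} {ts}" for ts, value in points]


# 20 hypothetical one-second samples, reduced to two 10-second averages.
per_second = [(1574848800 + i, float(i)) for i in range(20)]
lines = to_graphite_lines("netdata.myhost.system.cpu.user", downsample(per_second, 10))
for line in lines:
    print(line)
# netdata.myhost.system.cpu.user 4.5 1574848800
# netdata.myhost.system.cpu.user 14.5 1574848810
```

Averaging keeps the overall level of the metric while sending a tenth of the data points, at the cost of hiding short spikes inside each bucket.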
To improve clarity on charts, Netdata dashboards present positive values for metrics representing `read`, `input`, `inbound`, `received` and negative values for metrics representing `write`, `output`, `outbound`, `sent`.
Netdata charts showing the bandwidth and packets of a network interface. `received` is positive and `sent` is negative.
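One way to express this sign convention is a negative multiplier on the outbound dimension in the text protocol that Netdata's external plugins (plugins.d) write to standard output. The Python sketch below prints such a chart definition and one data update; the chart id `example.net`, its name, and the counter values are hypothetical, and the line layout follows our reading of the plugins.d format, so check the collectors documentation before relying on it.

```python
def define_chart() -> str:
    """Chart with 'received' drawn above zero and 'sent' drawn below it."""
    return "\n".join([
        # CHART type.id name title units family context charttype
        'CHART example.net net_example "Example interface bandwidth" "kilobits/s" net example.net area',
        # DIMENSION id name algorithm multiplier divisor
        # 'incremental' makes Netdata chart the per-second rate of the raw counter;
        # multiplying by 8 and dividing by 1000 turns bytes/s into kilobits/s.
        "DIMENSION received received incremental 8 1000",
        # A multiplier of -8 converts bytes to bits AND flips the sign.
        "DIMENSION sent sent incremental -8 1000",
    ])


def update(received_bytes: int, sent_bytes: int) -> str:
    """One data collection iteration: raw byte counters for both dimensions."""
    return "\n".join([
        "BEGIN example.net",
        f"SET received = {received_bytes}",
        f"SET sent = {sent_bytes}",
        "END",
    ])


print(define_chart())
print(update(123456789, 98765432))
```

The plugin always reports raw, positive counters; the sign flip lives in the chart definition, so every consumer of the chart sees the same convention.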
Netdata charts automatically zoom vertically, to visualize the variation of each metric within the visible time-frame.
A zero-based `stacked` chart automatically switches to an auto-scaled `area` chart when a single dimension is selected.
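The vertical auto-scaling rule can be sketched as a function that picks the y-axis range from the values visible in the current time-frame: a stacked chart starts at zero and must fit the per-timestamp sum of all dimensions, while a single selected dimension is scaled to its own minimum and maximum. This Python sketch only illustrates the behavior described above; it is not the dashboard's JavaScript, and the `rows` layout and dimension names are assumptions.

```python
def y_range(rows, selected=None):
    """Pick the y-axis range for the visible rows of a chart.

    rows: list of dicts, one per visible timestamp, mapping dimension -> value.
    selected: a single dimension name, or None for the full stacked view.
    """
    if selected is None:
        # Zero-based stacked chart: the axis must fit the sum of all dimensions.
        totals = [sum(row.values()) for row in rows]
        return (min(0, min(totals)), max(0, max(totals)))
    # Single dimension: auto-scale to that dimension's own variation.
    values = [row[selected] for row in rows]
    return (min(values), max(values))


visible = [
    {"user": 12.0, "system": 3.0},
    {"user": 14.0, "system": 4.0},
    {"user": 13.0, "system": 3.5},
]
print(y_range(visible))          # (0, 18.0)
print(y_range(visible, "user"))  # (12.0, 14.0)
```

With the stacked view the `user` variation occupies a small part of a 0 to 18 axis; selecting it alone stretches the same 2-unit variation across the whole chart height.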
Charts on Netdata dashboards are synchronized to each other. There is no master chart. Any chart can be panned or zoomed at any time, and all other charts will follow.
Charts are panned by dragging them with the mouse. Charts can be zoomed in/out with `SHIFT` + mouse wheel while the mouse pointer is over a chart.
The visible time-frame (pan and zoom) is propagated from Netdata server to Netdata server when navigating via the node menu.
To improve visual anomaly detection across charts, the user can highlight a time-frame (by pressing `Alt` + mouse selection) on all charts.
A highlighted time-frame can be set by pressing `Alt` + mouse selection on any chart. Netdata will highlight the same range on all charts.
Highlighted ranges are propagated from Netdata server to Netdata server, when navigating via the node menu.
Netdata data collection is extensible. You can monitor anything you can get a metric for. Our plugin API supports a variety of programming languages to make nearly anything a Netdata plugin: Go, Python, Node.js, Ruby, Java, Bash, Perl, and more!
- For better performance, most system-related plugins (CPU, memory, disks, filesystems, networking, etc) have been written in C.
- For faster development and easier contributions, most application related plugins (databases, web servers, etc) have been written in Go and Python.
- statsd - Netdata is a fully featured statsd server.
- Go expvar - collects metrics exposed by applications written in the Go programming language using the expvar package.
- Spring Boot - monitors running Java Spring Boot applications that expose their metrics with the use of the Spring Boot Actuator included in Spring Boot library.
- uWSGI - collects performance metrics from uWSGI applications.
- CPU Utilization - total and per core CPU usage.
- Interrupts - total and per core CPU interrupts.
- SoftIRQs - total and per core SoftIRQs.
- SoftNet - total and per core SoftIRQs related to network activity.
- CPU Throttling - collects per core CPU throttling.
- CPU Frequency - collects the current CPU frequency.
- CPU Idle - collects the time spent per processor state.
- IdleJitter - measures CPU latency.
- Entropy - random numbers pool, used in cryptography.
- Interprocess Communication - IPC - such as semaphores and semaphore arrays.
- ram - collects info about RAM usage.
- swap - collects info about swap memory usage.
- available memory - collects the amount of RAM available for userspace processes.
- committed memory - collects the amount of RAM committed to userspace processes.
- Page Faults - collects the system page faults (major and minor).
- writeback memory - collects the system dirty memory and writeback activity.
- huge pages - collects the amount of RAM used for huge pages.
- KSM - collects info about Kernel Samepage Merging (memory deduplication).
- NUMA - collects NUMA info on systems that support it.
- slab - collects info about the Linux kernel memory usage.
- block devices - per disk: I/O, operations, backlog, utilization, space, etc.
- BCACHE - detailed performance of SSD caching devices.
- DiskSpace - monitors disk space usage.
- mdstat - software RAID.
- hddtemp - disk temperatures.
- smartd - disk S.M.A.R.T. values.
- device mapper - naming disks.
- Veritas Volume Manager - naming disks.
- megacli - adapter, physical drives and battery stats.
- adaptec_raid - logical and physical devices health metrics.
- ioping - to measure disk read/write latency.
- BTRFS - detailed disk space allocation and usage.
- Ceph - OSD usage, Pool usage, number of objects, etc.
- NFS file servers and clients - NFS v2, v3, v4: I/O, cache, read ahead, RPC calls
- Samba - performance metrics of Samba SMB2 file sharing.
- ZFS - detailed performance and resource usage.
- Network Stack - everything about the networking stack (both IPv4 and IPv6 for all protocols: TCP, UDP, SCTP, UDPLite, ICMP, Multicast, Broadcast, etc), and all network interfaces (per interface: bandwidth, packets, errors, drops).
- Netfilter - everything about the netfilter connection tracker.
- SynProxy - collects performance data about the Linux SYNPROXY (DDoS).
- NFacct - collects accounting data from iptables.
- Network QoS - the only tool that visualizes network `tc` classes in real-time.
- FPing - to measure latency and packet loss between any number of hosts.
- ISC dhcpd - pools utilization, leases, etc.
- AP - collects Linux access point performance data (`hostapd`).
- SNMP - SNMP devices can be monitored too (although you will need to configure these).
- port_check - checks TCP ports for availability and response time.
- OpenVPN - collects status per tunnel.
- LibreSwan - collects metrics per IPSEC tunnel.
- Tor - collects Tor traffic statistics.
- System Processes - running, blocked, forks, active.
- Applications - by grouping the process tree and reporting CPU, memory, disk reads, disk writes, swap, threads, pipes, sockets - per process group.
- systemd - monitors systemd services using CGROUPS.
- Users and User Groups resource usage - by summarizing the process tree per user and group, reporting: CPU, memory, disk reads, disk writes, swap, threads, pipes, sockets.
- logind - collects sessions, users and seats connected.
- Containers - collects resource usage for all kinds of containers, using CGROUPS (systemd-nspawn, lxc, lxd, docker, kubernetes, etc).
- libvirt VMs - collects resource usage for all kinds of VMs, using CGROUPS.
- dockerd - collects docker health metrics.
- Apache and lighttpd - `mod-status` (v2.2, v2.4) and cache log statistics, for multiple servers.
- IPFS - bandwidth, peers.
- LiteSpeed - reads the litespeed rtreport files to collect metrics.
- Nginx - `stub-status`, for multiple servers.
- Nginx+ - connects to multiple nginx_plus servers (local or remote) to collect real-time performance metrics.
- PHP-FPM - multiple instances, each reporting connections, requests, performance, etc.
- Tomcat - accesses, threads, free memory, volume, etc.
- web server `access.log` files - extracting in real-time, web server and proxy performance metrics and applying several health checks, etc.
- HTTP check - checks one or more web servers for HTTP status code and returned content.
- HAproxy - bandwidth, sessions, backends, etc.
- Squid - multiple servers, each showing: clients bandwidth and requests, servers bandwidth and requests.
- Traefik - connects to multiple traefik instances (local or remote) to collect API metrics (response status code, response time, average response time and server uptime).
- Varnish - threads, sessions, hits, objects, backends, etc.
- IPVS - collects metrics from the Linux IPVS load balancer.
- CouchDB - reads/writes, request methods, status codes, tasks, replication, per-db, etc.
- MemCached - multiple servers, each showing: bandwidth, connections, items, etc.
- MongoDB - operations, clients, transactions, cursors, connections, asserts, locks, etc.
- MySQL and mariadb - multiple servers, each showing: bandwidth, queries/s, handlers, locks, issues, tmp operations, connections, binlog metrics, threads, innodb metrics, and more.
- PostgreSQL - multiple servers, each showing: per database statistics (connections, tuples read - written - returned, transactions, locks), backend processes, indexes, tables, write ahead, background writer and more.
- Proxy SQL - collects Proxy SQL backend and frontend performance metrics.
- Redis - multiple servers, each showing: operations, hit rate, memory, keys, clients, slaves.
- RethinkDB - connects to multiple rethinkdb servers (local or remote) to collect real-time metrics.
- beanstalkd - global and per tube monitoring.
- RabbitMQ - performance and health metrics.
- ElasticSearch - search and index performance, latency, timings, cluster statistics, threads statistics, etc.
- bind_rndc - parses `named.stats` dump file to collect real-time performance metrics. All versions of bind after 9.6 are supported.
- dnsdist - performance and health metrics.
- ISC Bind (named) - multiple servers, each showing: clients, requests, queries, updates, failures and several per view metrics. All versions of bind after 9.9.10 are supported.
- NSD - queries, zones, protocols, query types, transfers, etc.
- PowerDNS - queries, answers, cache, latency, etc.
- unbound - performance and resource usage metrics.
- dns_query_time - DNS query time statistics.
- chrony - uses the `chronyc` command to collect chrony statistics (Frequency, Last offset, RMS offset, Residual freq, Root delay, Root dispersion, Skew, System time).
- ntpd - connects to multiple ntpd servers (local or remote) to provide statistics of system variables and, optionally, peer variables.
- Dovecot - POP3/IMAP servers.
- Exim - message queue (emails queued).
- Postfix - message queue (entries, size).
- IPMI - enterprise hardware sensors and events.
- lm-sensors - temperature, voltage, fans, power, humidity, etc.
- Nvidia - collects information for Nvidia GPUs.
- RPi - Raspberry Pi temperature sensors.
- w1sensor - collects data from connected 1-Wire sensors.
- apcupsd - load, charge, battery voltage, temperature, utility metrics, output metrics.
- NUT - load, charge, battery voltage, temperature, utility metrics, output metrics.
- Linux Power Supply - collects metrics reported by power supply drivers on Linux.
- RetroShare - connects to multiple retroshare servers (local or remote) to collect real-time performance metrics.
- Fail2Ban - monitors the fail2ban log file to check all bans for all active jails.
- FreeRadius - uses the `radclient` command to provide freeradius statistics (authentication, accounting, proxy-authentication, proxy-accounting).
- opensips - connects to an opensips server (localhost only) to collect real-time performance metrics.
- SMA webbox - connects to multiple remote SMA webboxes to collect real-time performance metrics of the photovoltaic (solar) power generation.
- Fronius - connects to multiple remote Fronius Symo servers to collect real-time performance metrics of the photovoltaic (solar) power generation.
- StiebelEltron - collects the temperatures and other metrics from your Stiebel Eltron heating system using their Internet Service Gateway (ISG web).
- SpigotMC - monitors Spigot Minecraft server ticks per second and number of online players using the Minecraft remote console.
- BOINC - monitors task states for local and remote BOINC client software using the remote GUI RPC interface. Also provides alarms for a handful of error conditions.
- IceCast - collects the number of listeners for active sources.
- Monit - collects metrics about monit targets (filesystems, applications, networks).
- Puppet - connects to multiple Puppet Server and Puppet DB instances (local or remote) to collect real-time status metrics.
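Because Netdata is a fully featured statsd server (see the statsd entry in the list above), applications can push their own metrics using the plain statsd text protocol (`name:value|type`) over UDP. The following is a minimal Python client sketch; it assumes Netdata's statsd listener is reachable on `localhost:8125`, the conventional statsd port, and the `myapp.*` metric names are hypothetical.

```python
import socket


def statsd_line(name: str, value: float, metric_type: str) -> str:
    """Format one statsd metric: 'name:value|type' (c=counter, g=gauge, ms=timer)."""
    if metric_type not in ("c", "g", "ms"):
        raise ValueError(f"unsupported statsd type: {metric_type}")
    return f"{name}:{value:g}|{metric_type}"


def send(lines, host: str = "localhost", port: int = 8125) -> None:
    """Send newline-separated statsd lines in one UDP datagram (fire-and-forget)."""
    payload = "\n".join(lines).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


metrics = [
    statsd_line("myapp.requests", 1, "c"),
    statsd_line("myapp.queue_depth", 42, "g"),
    statsd_line("myapp.response_time", 12.5, "ms"),
]
print(metrics)  # ['myapp.requests:1|c', 'myapp.queue_depth:42|g', 'myapp.response_time:12.5|ms']
# send(metrics)  # run this on a host where Netdata's statsd listener is enabled
```

UDP keeps the instrumented application independent of the monitoring agent: if Netdata is not running, the datagrams are simply lost and the application carries on.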
You can easily extend Netdata by writing plugins that collect data from any source, using any computer language.
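A minimal external plugin can be sketched in Python: it writes the plugins.d text protocol to standard output, defining a chart once and then sending one value per iteration. The sketch assumes, as we understand the external plugin interface, that Netdata starts the plugin with the update frequency in seconds as its first argument; the chart `example.random`, its name, and the random values are hypothetical, so check the collectors documentation for the exact protocol.

```python
import random
import sys
import time

CHART_ID = "example.random"  # hypothetical chart id (type.id)


def chart_definition() -> str:
    """Lines sent once at startup: the chart and its single dimension."""
    return (
        f'CHART {CHART_ID} random_number "A random number" "value" random {CHART_ID} line\n'
        "DIMENSION number number absolute 1 1\n"
    )


def collection_round(value: int) -> str:
    """Lines sent on every iteration: one value for the 'number' dimension."""
    return f"BEGIN {CHART_ID}\nSET number = {value}\nEND\n"


def main(update_every: int) -> None:
    sys.stdout.write(chart_definition())
    while True:
        sys.stdout.write(collection_round(random.randint(0, 100)))
        # Flush so Netdata receives each update immediately, not when a buffer fills.
        sys.stdout.flush()
        time.sleep(update_every)


# Only start the endless loop when launched with an update frequency,
# which is how this sketch assumes Netdata starts external plugins.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(int(sys.argv[1]))
```

Separating the protocol lines from the loop keeps the output easy to test: `collection_round(42)` returns the exact text Netdata would read for one iteration.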
The Netdata documentation is at https://docs.netdata.cloud, but you can also find each page inside Netdata's repository itself in Markdown (`.md`) files. You can find all our documentation by navigating the repository.
Here is a quick list of notable documents:
Directory | Description |
---|---|
installer | Instructions to install Netdata on your systems. |
docker | Instructions to install Netdata using Docker. |
daemon | Information about the Netdata daemon and its configuration. |
collectors | Information about data collection plugins. |
health | How Netdata's health monitoring works, how to create your own alarms, and how to configure alarm notification methods. |
streaming | How to build hierarchies of Netdata servers, by streaming metrics between them. |
backends | Long-term archiving of metrics to industry-standard time-series databases, like prometheus, graphite, and opentsdb. |
web/api | Learn how to query the Netdata API and the queries it supports. |
web/api/badges | Learn how to generate badges (SVG images) from live data. |
web/gui/custom | Learn how to create custom Netdata dashboards. |
web/gui/confluence | Learn how to create Netdata dashboards on Atlassian's Confluence. |
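As a taste of the web/api documentation, the Python sketch below builds a data query for the last minute of a chart and reshapes the JSON reply into one list per dimension. It assumes a local Netdata on its default port 19999, the `/api/v1/data` endpoint with the `chart`, `after`, `points`, and `format` parameters, and a `json` reply containing `labels` and `data` arrays; verify these against the API documentation for your version. The sample response in the usage note is made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:19999"  # assumption: local Netdata on the default port


def data_url(chart: str, after: int = -60, points: int = 60) -> str:
    """Build a /api/v1/data URL; a negative `after` is relative to now (seconds)."""
    query = urlencode({"chart": chart, "after": after, "points": points, "format": "json"})
    return f"{BASE_URL}/api/v1/data?{query}"


def columns(response: dict) -> dict:
    """Turn the 'labels' + 'data' rows into {label: [values...]}."""
    labels = response["labels"]
    return {label: [row[i] for row in response["data"]] for i, label in enumerate(labels)}


def fetch(chart: str) -> dict:
    """Query a running Netdata and return the chart's columns."""
    with urlopen(data_url(chart), timeout=5) as resp:
        return columns(json.load(resp))


print(data_url("system.cpu"))
# http://localhost:19999/api/v1/data?chart=system.cpu&after=-60&points=60&format=json
```

On a host running Netdata, `fetch("system.cpu")["user"]` would then return the last minute of the `user` dimension, assuming the chart exposes a dimension with that label.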
You can also check all the other directories. Most of them have plenty of documentation.
We welcome contributions. Feel free to join the team!
To report bugs or get help, use GitHub's issues.
You can also find Netdata on:
Netdata is GPLv3+.
Netdata re-distributes other open-source tools and libraries. Please check the third party licenses.
Yes.
When people first hear about a new product, they frequently ask if it is any good. A Hacker News user remarked:
Note to self: Starting immediately, all raganwald projects will have a “Is it any good?” section in the readme, and the answer shall be “yes.”.
So, we follow the tradition...
These people seem to like it.