NNTSC 2.28

---------------------------------------------------------------------------
Copyright (c) 2016, 2017 The University of Waikato, Hamilton, New Zealand.
All rights reserved.

This code has been developed by the University of Waikato WAND research group. For further information please see http://www.wand.net.nz/.
---------------------------------------------------------------------------

NNTSC is a system designed for collecting, storing and examining network time series data. The main feature of NNTSC is that it is flexible and can work with just about any time series data, so you can unify your measurement storage and analysis systems around a single data management API.

NNTSC also provides a client API that allows other programs to connect to the measurement database and either ask for historical data or register interest in upcoming live data.

If you have any questions, comments or suggestions about NNTSC, please email us at [email protected].


Quickstart Instructions for Debian Users
========================================

 1. Install the NNTSC Debian package
 2. Create a Postgresql database for NNTSC:
        createdb nntsc
 3. Create an Influx database for NNTSC:
        influx -execute 'CREATE DATABASE nntsc'
 4. Configure NNTSC (see 'Configuration' section below for more details)
    4.1. Create an RRD list file (if needed).
    4.2. Edit /etc/nntsc/nntsc.conf to point to your new database and RRD list.
 5. Run: build_nntsc_db -C /etc/nntsc/nntsc.conf
 6. Run: /etc/init.d/nntsc start


Installation from Source
========================

NNTSC uses Python setuptools to install both itself and its dependencies. In version 2.3 there was a significant change to the installation process -- some of the NNTSC dependencies are now distributed and installed separately, namely libnntscclient. If you have checked out a copy of NNTSC from our git repository, you will also need to get libnntscclient from the repository as well. Install it by changing into its directory and running 'python setup.py install' prior to attempting to install NNTSC itself.

Once these are installed, simply run the following to install NNTSC:

    python setup.py install

By default, NNTSC installs into your system's dist-packages directory, so you will need root privileges to install it there. You can change the installation location using the --prefix option. Run 'python setup.py --help install' for more details about how you can customise the install process.


Supported Network Measurement Data Sources
==========================================

At present, NNTSC only supports a handful of network measurement methods, but we are working to extend this as time goes on. NNTSC has support for the following data sources:

    AMP -- the Active Measurement Project
    Smokeping RRDs


Glossary
========

Here are a few important terms that we use when talking about NNTSC.

collection:
    A collection is a group of network measurement time series (which we call streams) that use the same measurement approach and result structure. For example, each AMP test would result in a separate collection.

dataparser:
    The Python code that processes incoming network measurements and inserts them into the NNTSC database. Each collection has its own dataparser.

metric:
    A term that describes what the data value for a time series actually represents. For instance, the ping values stored by the NNTSC smokeping collection and the rtt values stored by the AMP ICMP collection both measure latency, so the metric for these is 'latency'. A traceroute path length is measured in hops, so the metric in this case is 'hops'.

module:
    Refers to the input source for a collection. The name 'module' is used because many collections share the same basic input source, e.g. all the AMP test collections deal with the AMP message broker, so the dataparsers for those collections all end up using the same Python module for receiving and processing messages from the shared source.

modsubtype:
    Used to differentiate between collections that share the same input source. Short for 'module subtype', where the subtype can be used to determine which dataparser should be used to process and store the received measurements correctly.

resolution:
    When discussing RRDs, resolution refers to the frequency at which measurements are taken. NNTSC only stores measurements taken at the highest available resolution and will not store RRD data that has been aggregated.

stream:
    A stream is a single network measurement time series within NNTSC. Each stream has a unique ID number and a series of properties that describe how the measurements were taken.
Configuration
=============

Before you start using NNTSC, you'll need to create a config file that defines where your time series data is going to come from and where you want to store it. An example configuration file is included in conf/nntsc.conf, which you can use as a starting point. If you've installed NNTSC from a Debian package, this same file will be installed to /etc/nntsc/nntsc.conf and you should edit that file to add your configuration.

The configuration options use the standard Python ConfigParser format, where options are grouped into sections. The sections currently supported by NNTSC and their options are as follows (a sketch of a complete config file follows these lists):

[database]
Options relating to the Postgresql database where the relational data will be stored. Available options:
    host     - the host where the database is located. Leave blank if the database is on the local machine.
    database - the name of the NNTSC database.
    username - the username to use when connecting to the database. Leave blank to use the name of the user running NNTSC.
    password - the password to use when connecting to the database. Leave blank if no password is required.

[influx]
Options relating to the Influx database where the time series data will (mostly) be stored. Available options:
    useinflux - if 'no', all time series data will be stored in the Postgresql database instead.
    database  - the name of the Influx database.
    username  - the username to use when connecting to the database. Leave blank to use the name of the user running NNTSC.
    password  - the password to use when connecting to the database. Leave blank if no password is required.
    host      - the host where the database is located. Leave blank if the database is on the local machine.
    port      - the port that the Influx daemon is listening on. Use 8086 unless you've configured Influx to use another port.
    keepdata  - the length of time to keep data in the database. For example, '30d' will keep data for 30 days.

[liveexport]
Options relating to the Rabbit queue that funnels newly collected data to the live exporter. Available options:
    queueid  - the name of the message queue.
    port     - the port that the local Rabbit instance is listening on.
    username - the username to use when connecting to Rabbit.
    password - the password to use when connecting to Rabbit.

[modules]
These options are used to enable or disable NNTSC dataparsers. To enable a module, set the appropriate option to "yes". To disable a module, set the option to "no". Available options:
    amp - controls whether NNTSC accepts data from AMP.
    rrd - controls whether NNTSC should scrape data from RRDs.

[amp]
Options relating to connecting to an AMP message queue and receiving AMP data. Available options:
    username   - the username to use when connecting to AMP's message queue.
    password   - the password to use when connecting to AMP's message queue.
    host       - the host where the AMP message broker is running.
    port       - the port to connect to on the AMP message broker host.
    ssl        - if True, encrypt all messages using SSL. If False, use plain-text.
    queue      - the name of the message queue to read AMP data from.
    commitfreq - the number of AMP messages to process before committing.

[rrd]
Options relating to scraping data from existing RRD files. See "RRD Scraping" below for more details on how to configure RRD scraping. Available options:
    rrdlist - the full path to a file describing the RRD files to scrape.
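As an illustration only, a minimal config for a host that stores its time series data in Influx and collects AMP data might look like the sketch below. The blank values, the queue name 'nntsclive' and the Rabbit credentials are placeholders, not defaults shipped with NNTSC -- conf/nntsc.conf remains the authoritative example.

    [database]
    host =
    database = nntsc
    username =
    password =

    [influx]
    useinflux = yes
    database = nntsc
    username =
    password =
    host =
    port = 8086
    keepdata = 30d

    [liveexport]
    queueid = nntsclive
    port = 5672
    username = guest
    password = guest

    [modules]
    amp = yes
    rrd = no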
RRD Scraping
============

NNTSC includes the ability to harvest measurements from the RRD files created by other measurement tools. This allows users to deploy NNTSC alongside their existing monitoring setup. At present, NNTSC only supports scraping the RRD files produced by Smokeping.

In the Smokeping case, the Smokeping tool runs as per usual, writing its results to an RRD file. NNTSC will then read the most recent measurements from the RRD file and store them in its own database for later analysis. Unlike the RRD, though, NNTSC will keep the measurements in their original format for as long as they remain in the database, rather than aggregating them once they reach a certain age.

The RRD list file is used to tell NNTSC which RRD files it should read and what is being measured by each file. A documented example RRD list file is provided in conf/rrd.examples, which should be sufficient to help you write your own RRD list file.

When you run the script to build your NNTSC database, the RRD list file will be parsed and appropriate streams will be created for each RRD that is described in the file. Once NNTSC itself is running, the specified RRDs will be periodically checked for new datapoints. Should you decide to add new RRDs to your RRD list, you must re-run the database build script for NNTSC to recognise them.


Building the Database
=====================

Before you can run NNTSC itself, you will need to create and initialise the database that will be used to store the time series data that you are collecting. NNTSC uses a Postgresql database for its storage.

First, decide on a name for your database -- in the following examples, I am going to use 'nntsc' as the database name, but you can choose anything you like.

Next, create a Postgresql database and ensure that it is owned by the user that you will ultimately be connecting to the database as. In the simplest case, I will be connecting using my own username, so I can create the nntsc database using:

    createdb nntsc

If you're using Influx to store the time series data, you'll also need to create an Influx database. The easiest way to do this is via the InfluxQL shell:

    influx -execute 'CREATE DATABASE nntsc'

The next step is to run the build_nntsc_db script that should have been installed on your system when you installed NNTSC. This script will create all of the required tables for storing the time series data. In the case of some input sources, e.g. RRDs, it will also create 'streams' for each of the time series that will be collected.

Make sure the database section of your config file now refers to your new database and then simply run the script as follows:

    build_nntsc_db -C <your config file>

If you update your configuration, e.g. you wish to enable a data source that you had earlier disabled or you have added new RRDs that you want to collect data for, you can re-run build_nntsc_db and it will add new tables and streams as needed without overwriting any existing ones.

If you really do want to start over fresh, you can pass a -F flag to build_nntsc_db. Be warned: this will drop ALL of your existing tables and any time series data and streams that are stored within them. Don't do this unless you are really sure you want to lose all that data!

If you explore your new Postgresql database with the psql tool, you should see a 'collections' table and a number of tables beginning with 'stream_' and 'data_', one per collection.
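For example, assuming the database is called 'nntsc' and your user can connect to it without a password, you could inspect the tables and the registered collections like this (the exact column layout of the collections table is not described here, so SELECT * is used):

    psql nntsc -c '\dt'
    psql nntsc -c 'SELECT * FROM collections'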
Starting NNTSC
==============

Once you've created and initialised your database, you can start NNTSC in several ways.

Firstly, if you have installed NNTSC via a Debian package, there will be an init script installed in /etc/init.d/ that you can use to start and stop a NNTSC daemon. This is by far the best available option. Using the script, you can start NNTSC by running the following:

    /etc/init.d/nntsc start

The init script also accepts 'stop' and 'restart' as commands, so it can be used to easily stop a running instance of NNTSC. The init script will use /etc/nntsc/nntsc.conf as the NNTSC configuration file, so either make sure that file contains your desired config or edit the init script to point to your own config file.

Alternatively, you may manually start an NNTSC daemon by running the following:

    nntsc -b -C <your config file> [-p port] [-P pidfile]

Finally, you may run NNTSC in the foreground by omitting the '-b' option from the manual command given above. In this case, we strongly recommend running NNTSC within screen or some other persistent terminal environment.

By default, the NNTSC exporter will listen on port 61234 for incoming client connections. This can be changed using the '-p' option.

Note that in all cases, the same config file must be used by both NNTSC and the build_nntsc_db script.

If run as a daemon, NNTSC will write any error or log messages to /tmp/nntsc.log; otherwise it will report them directly to standard error.


Querying the NNTSC Database
===========================

While running, NNTSC also listens on TCP port 61234 for client connections. Connected clients may query NNTSC to access any data currently stored in the time series database or register an interest in live data as it arrives from the various input sources.

There is a Python API for writing a client that can connect to and query NNTSC, without the programmer needing to know the underlying communication protocol. The code for this API is located in the libnntscclient repository -- it's not very well documented at the moment, but hopefully we can improve that in future releases.
To use the client API, just add the following imports to your Python program:

    from libnntscclient.protocol import *
    from libnntscclient.nntscclient import NNTSCClient

There are three types of messages that a client can send to the NNTSC collector:

    NNTSC_REQUEST   -- requests information about the current collections or streams
    NNTSC_SUBSCRIBE -- requests measurement data for a set of streams
    NNTSC_AGGREGATE -- requests aggregated measurement data for a set of streams

NNTSC_REQUEST messages must also specify a request type:

    NNTSC_REQ_COLLECTION -- request a list of all available collections
    NNTSC_REQ_STREAMS    -- request a list of streams for a given collection
    NNTSC_REQ_SCHEMA     -- request the list of columns in the stream and data tables for a given collection

NNTSC_SUBSCRIBE messages can be used to get all of the available measurements for some set of streams over a given time period. If the time period includes historical data, any available historical data will be immediately returned in response. If the time period ends in the future, new measurements will be exported to the client as they arrive until the end time is passed. If the end time is set to 0, then live data will be exported until the client disconnects.

NNTSC_AGGREGATE messages work much the same as NNTSC_SUBSCRIBE messages, except that the measurements are aggregated into bins of a given size. NNTSC_AGGREGATE can only be used with historical data -- if the end time is in the future, you will only get data up to the current time. You must also provide an aggregator function and a list of columns from the data table that you want to apply the aggregation to. Supported aggregation functions are: max, min, sum, avg and count. They should be fairly self-explanatory as to how they will aggregate the data within a bin.
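To tie the message types together, the sketch below shows the general shape of a client that connects to the exporter and asks for the list of collections. It is a sketch only: the NNTSCClient constructor and the method names used here (send_request, receive_message, disconnect) are assumptions for illustration, not the documented API -- check the libnntscclient source for the real signatures. The port number and the NNTSC_REQ_COLLECTION constant come from this README and libnntscclient.protocol.

    # Minimal, hypothetical NNTSC client sketch (method names are assumptions).
    import socket

    from libnntscclient.protocol import *
    from libnntscclient.nntscclient import NNTSCClient

    # Connect to a running NNTSC exporter on its default port.
    sock = socket.create_connection(("localhost", 61234))
    client = NNTSCClient(sock)                # assumption: client wraps a connected socket

    # NNTSC_REQUEST / NNTSC_REQ_COLLECTION: ask for all available collections.
    client.send_request(NNTSC_REQ_COLLECTION) # hypothetical method name

    # Read and print responses until the collector has nothing more to send.
    while True:
        msg = client.receive_message()        # hypothetical method name
        if msg is None:
            break
        print(msg)

    client.disconnect()                       # hypothetical method name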