Getting Started

REST API Structure

NOTE: The structure of the REST API may be subject to changes! Please consult the CHANGELOG for relevant information.

There are two main components in this API:

  • First, the management and deployment/provisioning component, called the Overlord (Monitoring Management API). It is responsible for the deployment and management of the Monitoring Core components (ElasticSearch, Logstash Server and Kibana), as well as for the management and deployment of the auxiliary components: Collectd and Logstash-forwarder.
  • Second, the interface used by other applications to query the data warehouse represented by ElasticSearch, called the Observer. It is responsible for returning the monitoring metrics in various formats (CSV, JSON, simple output).

NOTE: Future versions will include authentication for the Overlord resources. However, in the DICE context this is not a requirement.

Overlord (Monitoring Management API)

The Overlord is composed of two major components:

  • Monitoring Core represented by: ElasticSearch, LogstashServer and Kibana
  • Monitoring Auxiliary represented by: Collectd, Logstash-Forwarder

Monitoring Core

GET /v1/log

Returns the log of D-Mon. It contains information about the last requests and the IPs from which they originated, as well as status information from various subcomponents.

The D-Mon internal logging system lists three types of messages: INFO for debug-level information, WARNING for handled exceptions, and ERROR for caught errors.

GET /v1/overlord

Returns information regarding the current version of the Monitoring Platform.

GET /v1/overlord/framework

Returns the currently supported frameworks.

{
	"Supported Frameworks":["list_of_frameoworks"]
}

GET /v1/overlord/framework/{fwork}

Returns the metrics configuration file for big data technologies. The response will have the file mime-type encoded. For HDFS, YARN and Spark it is set to 'text/x-java-properties', while for Storm it is 'text/yaml'.

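As an illustration, a minimal Python (requests) sketch for fetching and saving such a configuration file; the base URL, the 'yarn' framework name and the output file name are placeholders, not defined by this API:

# Minimal sketch; adjust BASE and the framework name to your deployment.
import requests

BASE = "http://dmon-host:5001"  # hypothetical dmon-controller endpoint

resp = requests.get(f"{BASE}/v1/overlord/framework/yarn")
resp.raise_for_status()

# The Content-Type tells us which kind of configuration file was returned
# ('text/x-java-properties' for HDFS/YARN/Spark, 'text/yaml' for Storm).
print(resp.headers.get("Content-Type"))
with open("metrics.properties", "wb") as f:
    f.write(resp.content)
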
PUT /v1/overlord/application/{appID}

Registers an application with D-Mon and creates a unique tag for the monitored data. The tag is defined by appID. Each appID will be added as a tag to all performance metrics for the data intensive application it identifies.

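For illustration, a minimal Python (requests) sketch of registering an application; the base URL and the "myApp" identifier are placeholders chosen by the caller:

# Minimal sketch, assuming a hypothetical dmon-controller endpoint.
import requests

BASE = "http://dmon-host:5001"
resp = requests.put(f"{BASE}/v1/overlord/application/myApp")
resp.raise_for_status()
# From this point on, metrics from the monitored application are tagged with "myApp".
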
POST /v1/overlord/core

Deploys all monitoring core components, provided that host values have been preset for them. If not, it deploys all components locally with default settings.

NOTE: Currently the '-l' flag of the start script dmon-start.sh does the same as the latter option.

GET /v1/overlord/core/database

Returns the current internal state of D-Mon in the form of an SQLite database file. The response has the application/x-sqlite3 mimetype.

PUT /v1/overlord/core/database

Submits a new version of the internal database to D-Mon, replacing the current state. The old state is backed up before the changes are applied. The database should be formatted as an SQLite3 database file and sent using the application/x-sqlite3 mimetype.

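A minimal Python (requests) sketch of such an upload; the base URL and the local file name are placeholders:

# Minimal sketch: pushing a previously exported state database back to D-Mon.
import requests

BASE = "http://dmon-host:5001"
with open("dmon-state.db", "rb") as f:
    resp = requests.put(
        f"{BASE}/v1/overlord/core/database",
        data=f.read(),
        headers={"Content-Type": "application/x-sqlite3"},
    )
resp.raise_for_status()
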
GET /v1/overlord/core/status

Returns the current status of the Monitoring platform.

{
  "ElasticSearch": {
    "Status": "<HTTP_CODE>",
    "Name": "<NAME>",
    "ClusterName": "<CLUSTER_NAME>",
    "version": {
      "number": "<ES_VERSION>",
      "BuildHash": "<HASH>",
      "BuildTimestamp": "<TIMESTAMP>",
      "BuildSnapshot": "<BOOL>",
      "LuceneVersion": "<LC_VERSION>"
    }
  },
  "Logstash": {
    "Status": "<HTTP_CODE>",
    "Version": "<VNUMBER>"
  },
  "Kibana": {
    "Status": "<HTTP_CODE>",
    "Version": "<VNUMBER>"
  }
}

NOTE: This only works for local deployments. It returns status information for the local ElasticSearch, Logstash server and Kibana instances.


GET /v1/overlord/detect/storm

Returns information regarding the currently registered Storm cluster. It will also list all available topologies.

POST /v1/overlord/detect/storm

Tries to detect whether the currently registered nodes have a valid Storm deployment. It first checks whether any node has a Storm endpoint and port set; if this step fails, it scans all registered nodes. Once an endpoint is found, the first topology is selected for monitoring and all configurations necessary for monitoring Storm are set automatically.

GET /v1/overlord/storm/logs

Returns the currently available Storm logs.

{
  "StormLogs": [
    "workerlogs_2016-12-07-15:05:10.tar",
    "workerlogs_2016-12-07-15:25:12.tar",
    "workerlogs_2016-12-07-16:00:09.tar",
    "workerlogs_2016-12-07-16:04:00.tar",
    "workerlogs_2016-12-07-16:08:01.tar",
    "workerlogs_2016-12-07-16:50:04.tar",
    "workerlogs_2016-12-08-16:22:18.tar",
    "workerlogs_2016-12-08-16:28:51.tar",
    "workerlogs_2016-12-08-17:29:27.tar",
    "workerlogs_2016-12-08-17:30:48.tar",
    "workerlogs_2016-12-08-17:32:53.tar",
    "workerlogs_2016-12-08-17:35:11.tar"
  ]
}

There are only worker logs. Each log file listed in the response is an archive containing the worker logs from all monitored Storm nodes.

POST /v1/overlord/storm/logs

Starts a background process which will fetch worker logs from all registered Storm nodes. This can be a long-running process, as each worker produces about 100MB per log and there can be more than one worker instance per node.

GET /v1/overlord/storm/logs/active

Checks if there are any running log fetch processes.

GET /v1/overlord/storm/logs/{workerlogs}

Returns the worker log files specified by workerlogs.


GET /v1/overlord/detect/yarn

Returns the currently registered YARN job history servers.

POST /v1/overlord/detect/yarn

Attempts to detect a YARN history server instance from the pool of monitored nodes.

PUT /v1/overlord/detect/yarn

Used to define the endpoint for a known YARN history server.

{
  "NodeIP": "<node_IP>",
  "NodePort": "<port>",
  "Polling": 30
}

Note: The Logstash server instance has to be restarted in order to collect YARN history server metrics.

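For illustration, a minimal Python (requests) sketch of registering a known history server; the base URL, IP, port and polling interval are example values:

# Minimal sketch for registering a known YARN history server endpoint.
import requests

BASE = "http://dmon-host:5001"
payload = {"NodeIP": "10.0.0.15", "NodePort": "19888", "Polling": 30}
resp = requests.put(f"{BASE}/v1/overlord/detect/yarn", json=payload)
resp.raise_for_status()
# Remember: the Logstash server must be restarted before YARN metrics are collected.
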
GET /v1/overlord/history/yarn

Returns all of the jobs that have ever run on the registered YARN deployment.

GET /v1/overlord/history/yarn/jobs

Returns a list of YARN jobs.

GET /v1/overlord/history/yarn/jobs/tasks

Returns all jobs and their associated tasks.


GET /v1/overlord/mongo

Returns the current registered MongoDB instance information.

{
  "MongoDBs": "admin",
  "MongoHost": "127.0.0.1",
  "MongoPort": "27017",
  "Password": true,
  "User": true
}

PUT /v1/overlord/mongo

Registers a MongoDB instance to be monitored.

{
  "MongoDBs": "<DB_Name>",
  "MongoHost": "<IP>",
  "MongoPort": "<port>",
  "Password": "<password>",
  "User": "<user_name"
}

NOTE: The registration process will only take effect after the Logstash server instance has been restarted!

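A minimal Python (requests) sketch of such a registration; the base URL, host, database name and credentials are all example values:

# Minimal sketch for registering a MongoDB instance to be monitored.
import requests

BASE = "http://dmon-host:5001"
payload = {
    "MongoDBs": "admin",
    "MongoHost": "10.0.0.21",
    "MongoPort": "27017",
    "User": "monitor",
    "Password": "secret",
}
resp = requests.put(f"{BASE}/v1/overlord/mongo", json=payload)
resp.raise_for_status()
# The registration takes effect only after the Logstash server is restarted.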

GET /v1/overlord/nodes

Returns the current monitored nodes list.

{
    "Nodes":[
      {"<NodeFQDN1>":"NodeIP1"},
      {"<NodeFQDN2>":"NodeIP2"},
      {"<NodeFQDNn>":"NodeIPn"}
      ]
  }

PUT /v1/overlord/nodes

Adds the given nodes to the monitored node pool. Nodes are represented as a list of dictionaries, so one or many nodes can be registered in the same request, and each node can be assigned a different user name and password.

Input:

{
    "Nodes":[
      
        {
          "NodeName":"<NodeFQDN1>",
          "NodeIP":"<IP>",
          "key":"<keyName|null>",
          "username":"<uname|null>",
          "password":"<pass|null>"
      },
        {
          "NodeName":"<NodeFQDNn>",
          "NodeIP":"<IP>",
          "key":"<keyName|null>",
          "username":"<uname|null>",
          "password":"<pass|null>"
        }
    ]
}

NOTE: Only username and password authentication is currently supported. A facility for public/private key authentication exists but is still undergoing testing.

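For illustration, a minimal Python (requests) sketch registering two nodes in one request; the base URL, host names and credentials are placeholders:

# Minimal sketch for adding nodes to the monitored pool.
import requests

BASE = "http://dmon-host:5001"
payload = {
    "Nodes": [
        {"NodeName": "worker1.example.internal", "NodeIP": "10.0.0.11",
         "key": None, "username": "ubuntu", "password": "changeme"},
        {"NodeName": "worker2.example.internal", "NodeIP": "10.0.0.12",
         "key": None, "username": "ubuntu", "password": "changeme"},
    ]
}
resp = requests.put(f"{BASE}/v1/overlord/nodes", json=payload)
resp.raise_for_status()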

POST /v1/overlord/nodes

Bootstraps all non-monitored nodes: installs, configures and starts collectd and logstash-forwarder on them. This feature is not recommended for testing; using the separate commands is preferred in order to detect network failures.

NOTE: A single JSON document can completely populate and set up dmon-controller. It can then be used to save and share internal state by sending the JSON between controller instances.


GET /v1/overlord/nodes/roles

Returns the roles currently held by each computational node.

{
  "Nodes": [
    {
      "dice.cdh5.mng.internal": [
        "storm",
        "spark"
      ]
    },
    {
      "dice.cdh5.w1.internal": [
        "unknown"
      ]
    },
    {
      "dice.cdh5.w2.internal": [
        "yarn",
        "spark",
        "storm"
      ]
    },
    {
      "dice.cdh5.w3.internal": [
        "unknown"
      ]
    }
  ]
}

If a node has an unknown service installed, or its roles are not specified, the type is set to unknown.

PUT /v1/overlord/nodes/roles

Modifies the roles of each node.

Input:

{
  "Nodes": [
    {
      "NodeName": "<nodeFQDN>",
      "Roles": [
        "yarn"
      ]
    }
  ]
}

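A minimal Python (requests) sketch of assigning roles; the node name and roles are examples:

# Minimal sketch for modifying the roles of a node.
import requests

BASE = "http://dmon-host:5001"
payload = {"Nodes": [{"NodeName": "worker1.example.internal", "Roles": ["yarn", "spark"]}]}
resp = requests.put(f"{BASE}/v1/overlord/nodes/roles", json=payload)
resp.raise_for_status()
# Follow up with POST /v1/overlord/nodes/roles to generate and upload
# the metrics configuration files for the new roles.
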
POST /v1/overlord/nodes/roles

Generates metrics configuration files for each role assigned to a node and uploads them to the required directory. It returns a list of all nodes to which a configuration of a certain type (i.e. yarn, spark, storm etc) has been uploaded.

{
	"Status":{
		"yarn":["list_of_yarn_nodes"],
		"spark":["list_of_spark_nodes"],
		"storm":["list_of_storm_nodes"],
		"unknown":["list_of_unknown_nodes"]
		}
}

NOTE: The directory structure is based on the vanilla and Cloudera distributions of HDFS, YARN and Spark. Custom installations are not yet supported. As YARN and HDFS share the same metrics system, their tags (i.e. hdfs and yarn) are interchangeable in the context of D-Mon.

GET /v1/overlord/nodes/{nodeFQDN}

Returns information about a particular monitored node identified by nodeFQDN.

Response:

{
      "NodeName":"nodeFQDN",
      "Status":"<online|offline>",
      "IP":"<NodeIP>",
      "OS":"<Operating_Systen>",
      "key":"<keyName|null>",
      "username":"<uname|null>",
      "password":"<pass|null>",
      "Roles":"[listofroles]"
}

FUTURE Version: A more fine-grained node status will be implemented; currently it is boolean (online/offline). The last three elements are not yet implemented and are scheduled for future versions.


PUT /v1/overlord/nodes/{NodeFQDN}

Changes the current information of a given node. The node FQDN may not change from one version to another.

Input:

{
  "NodeName":"<nodeFQDN>",
  "IP":"<NodeIP>",
  "OS":"<Operating_Systen>",
  "Key":"<keyName|null>",
  "Username":"<uname|null>",
  "Password":"<pass|null>",
  "LogstashInstance": "<ip_logstash>"
}

POST /v1/overlord/nodes/{NodeFQDN}

Bootstraps the specified node.

NOTE: Possible duplication with ../aux/.. branch. DEPRECATED.


DELETE /v1/overlord/nodes/{nodeFQDN}

Stops all auxiliary monitoring components associated with a particular node.

NOTE: This does not delete nodes or configurations; it only stops collectd and logstash-forwarder on the selected nodes. DEPRECATED.

PUT /v1/overlord/nodes/{nodeFQDN}/roles

Defines the roles each node has inside the cluster.

Input:

{
	"Roles":"[list_of_roles]"
}

POST /v1/overlord/nodes/{nodeFQDN}/roles

Redeploys metrics configuration for a specific node based on the roles assigned to it.

FUTURE WORK: This feature will be developed for future versions.


DELETE /v1/overlord/nodes/{nodeFQDN}/purge

This resource deletes the auxiliary tools from a given node and also removes all settings from D-Mon. This process is irreversible.


GET /v1/overlord/core/es

Returns a list of the current hosts comprising the ES cluster core components. The first registered host is set as the default master node; all subsequent nodes are set as workers. If the status is 'detached', the ES core instance is running as a daemon.

{
  "ES Instances": [
    {
      "DataNode": true,
      "ESClusterName": "diceMonit",
      "ESCoreDebug": "0",
      "ESCoreHeap": "3g",
      "FieldDataCacheExpire": "6h",
      "FieldDataCacheFilterExpires": "6h",
      "FieldDataCacheFilterSize": "20%",
      "FieldDataCacheSize": "20%",
      "HostFQDN": "dice.cdh5.dmon.internal",
      "IP": "127.0.0.1",
      "IndexBufferSize": "30%",
      "MasterNode": true,
      "MinIndexBufferSize": "96mb",
      "MinShardIndexBufferSize": "12mb",
      "NodeName": "esCoreMaster",
      "NodePort": 9200,
      "NumOfReplicas": 1,
      "NumOfShards": 5,
      "OS": "ubuntu",
      "PID": 2531,
      "Status": "Running"
    }
  ]
}

POST /v1/overlord/core/es

Generates and applies the new configuration options for the ES Core components.

NOTE: This version of the resource is deprecated; use the /v2 version.

POST /v2/overlord/core/es

Generates and applies the new configuration options for the ES Core components. It uses an init script for startup, increasing performance and reliability, and can be detached from the dmon-controller instance.

NOTE: If the configuration is unchanged, ES Core will not be restarted! Deploying the monitoring platform on different hosts than ElasticSearch is only possible if the FQDN or IP is provided.

FUTURE Work: This process needs more streamlining. It is recommended to use only local deployments for this version.


GET /v1/overlord/core/es/config

Returns the current configuration file for ElasticSearch in the form of a YAML file.

NOTE: The first registered ElasticSearch information will be set by default to be the master node.


PUT /v1/overlord/core/es/config

Changes the current configuration options for the ElasticSearch instance defined by its FQDN and IP.

Input:

{
  "DataNode": true,
  "ESClusterName": "string",
  "ESCoreDebug": 1,
  "ESCoreHeap": "4g",
  "FieldDataCacheExpires": "6h",
  "FieldDataCacheFilterExpires": "6h",
  "FieldDataCacheFilterSize": "20%",
  "FieldDataCacheSize": "20%",
  "HostFQDN": "string",
  "IP": "string",
  "IndexBufferSize": "30%",
  "MasterNode": true,
  "MinIndexBufferSize": "96mb",
  "MinShardIndexBufferSize": "12mb",
  "NodeName": "string",
  "NodePort": 9200,
  "NumOfReplicas": 0,
  "NumOfShards": 1,
  "OS": "unknown"
}

NOTE: The new configuration will not be generated at this step. Currently only ESClusterName, HostFQDN, IP, NodeName, NodePort are required. This will be changed in future versions.

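For illustration, a minimal Python (requests) sketch that sets only the currently required fields; the base URL and all values are examples:

# Minimal sketch for updating the ES core configuration.
import requests

BASE = "http://dmon-host:5001"
payload = {
    "ESClusterName": "diceMonit",
    "HostFQDN": "dmon.example.internal",
    "IP": "10.0.0.5",
    "NodeName": "esCoreMaster",
    "NodePort": 9200,
}
resp = requests.put(f"{BASE}/v1/overlord/core/es/config", json=payload)
resp.raise_for_status()
# The configuration is only generated and applied later, e.g. by POST /v2/overlord/core/es.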

GET /v1/overlord/core/es/status/<intComp>/property/<intProp>

Returns diagnostic data about the master ElasticSearch instance.

DELETE /v1/overlord/core/es/<hostFQDN>

Stops the ElasticSearch (es) instance on a given host and removes all configuration data from DMON.

GET /v1/overlord/core/es/cluster/health

Returns the health and current state of the D-Mon ES cluster.

GET /v1/overlord/core/es/cluster/settings

Returns the current ES core service settings. Some of these can be set at runtime, while others only at startup.

GET /v1/overlord/core/es/cluster/state

Returns the current state of all replicas, indices and shards.

GET /v1/overlord/core/es/node/master/info

Returns information about the current master node of the D-Mon cluster.

GET /v1/overlord/core/es/node/master/state

Returns the current state of the D-Mon cluster master node.


POST /v1/overlord/core/es/<hostFQDN>/start

Starts the ES instance on the host identified by hostFQDN. It uses the last known good ES configuration.

POST /v1/overlord/core/es/<hostFQDN>/stop

Stops the ES instance on the host identified by hostFQDN.

POST /v1/overlord/core/halt

Stops all core components on every node.

GET /v1/overlord/core/es/index/{indexName}

Returns the current status of the desired index identified by indexName.

GET /v1/overlord/core/es/<hostFQDN>/status

Returns the current status (Running, Stopped, Unknown) and PID of the ES instance on the host identified by hostFQDN.


GET /v1/overlord/core/ls

Returns the current status of all Logstash server instances registered with D-Mon. If the status is 'detached', the LS core instance is running as a daemon.

Response:

{
  "LS Instances": [
    {
      "ESClusterName": "diceMonit",
      "HostFQDN": "dice.cdh5.dmon.internal",
      "IP": "109.231.121.210",
      "LPort": 5000,
      "LSCoreHeap": "512m",
      "LSCoreSparkEndpoint": "None",
      "LSCoreSparkPort": "None",
      "LSCoreStormEndpoint": "None",
      "LSCoreStormPort": "None",
      "LSCoreStormTopology": "None",
      "OS": "ubuntu",
      "Status": "Running",
      "udpPort": 25680
    }
  ]
}

POST /v1/overlord/core/ls

Starts the logstash server based on the configuration information. During this step the configuration file is first generated.

NOTE: This resource is deprecated; use /v2 instead.


POST /v2/overlord/core/ls

Starts the Logstash server based on the configuration information. During this step the configuration file is first generated. It uses an init script for startup and enables watchdog support in order to increase performance and reliability.

FUTURE Work: Better support for distributed deployment of logstash core service instances.


DELETE /v1/overlord/core/ls/<hostFQDN>

Stops the logstash server instance on a given host and removes all configuration data from DMON.

GET /v1/overlord/core/ls/config

Returns the current configuration file of Logstash Server.


PUT /v1/overlord/ls/config

Changes the current configuration of Logstash Server.

Input:

{
  "ESClusterName": "diceMonit",
  "HostFQDN": "string",
  "IP": "string",
  "Index": "logstash",
  "LPort": 5000,
  "LSCoreHeap": "512m",
  "LSCoreSparkEndpoint": "None",
  "LSCoreSparkPort": "None",
  "LSCoreStormEndpoint": "None",
  "LSCoreStormPort": "None",
  "LSCoreStormTopology": "None",
  "LSCoreWorkers": "4",
  "OS": "string",
  "udpPort": 25826
}

NOTE: LS instances are bound to their FQDN, which means it cannot change. Future Work: Currently only local deployment of the Logstash server core service is supported.

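A minimal Python (requests) sketch of updating the Logstash server configuration; the base URL and all values are examples:

# Minimal sketch for changing the Logstash server configuration.
import requests

BASE = "http://dmon-host:5001"
payload = {
    "ESClusterName": "diceMonit",
    "HostFQDN": "dmon.example.internal",
    "IP": "10.0.0.5",
    "LPort": 5000,
    "LSCoreHeap": "512m",
    "LSCoreWorkers": "4",
    "udpPort": 25826,
}
resp = requests.put(f"{BASE}/v1/overlord/ls/config", json=payload)
resp.raise_for_status()
# Apply the new configuration by (re)starting the server, e.g. POST /v2/overlord/core/ls.
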
GET /v1/overlord/core/ls/<hostFQDN>/status

Returns the status of the Logstash server running on the host identified by hostFQDN.

POST /v1/overlord/core/ls/<hostFQDN>/start

Starts the Logstash server instance on the host identified by hostFQDN. It will use the last known good configuration.

POST /v1/overlord/core/ls/<hostFQDN>/stop

Stops the logstash server instance on the host identified by hostFQDN.


GET /v1/overlord/core/ls/credentials

Returns the current credentials for logstash server core service.

Response:

{
  "Credentials": [
  	{
  		"Certificate":"<certificate name>",
  		"Key":"<key name>",
  		"LS Host":"<host fqdn>"
  	}
  ]
}

NOTE: The Logstash server and the logstash-forwarder need a private/public key in order to establish secure communications. During local deployment ('-l' flag) a default public/private key pair is created.


GET /v1/overlord/core/ls/cert/{certName}

Returns the hosts using a specified certificate. The certificate is identified by its certName.

Response:

{
	"Host":"[listofhosts]"
}

Note: By default all nodes use the default certificate created during D-Mon initialization.


PUT /v1/overlord/core/ls/cert/{certName}/{hostFQDN}

Uploads a certificate with the name given by certName and associates it with the given host identified by hostFQDN.

NOTE: The submitted certificate must use the application/x-pem-file Content-Type.

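For illustration, a minimal Python (requests) sketch of uploading a certificate; the base URL, file name, certificate name and host FQDN are placeholders:

# Minimal sketch for uploading a certificate and binding it to a host.
import requests

BASE = "http://dmon-host:5001"
with open("logstash-forwarder.crt", "rb") as f:
    resp = requests.put(
        f"{BASE}/v1/overlord/core/ls/cert/myCert/worker1.example.internal",
        data=f.read(),
        headers={"Content-Type": "application/x-pem-file"},
    )
resp.raise_for_status()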

GET /v1/overlord/core/ls/key/{keyName}

Returns the host associated with the given key identified by the keyName parameter.

Response:

{
	"Host":"<LS host name>",
	"Key":"<key name>"
}

PUT /v1/overlord/core/ls/key/{keyName}/{hostFQDN}

Uploads a private key with the name given by keyName and associates it with the given host identified by hostFQDN.

NOTE: The submitted private key must use the application/x-pem-file Content-Type.


GET /v1/overlord/core/kb

Returns information for all Kibana instances.

{
	"KB Instances":[{
		"HostFQDN":"<FQDN>",
		"IP":"<host_ip>",
		"OS":"<os_type>",
		"KBPort":"<kibana_port>",
		"PID":"<kibana_pid>",
		"KBStatus":"<Running|Stopped|Unknown>"
	}
	]
}

POST /v1/overlord/core/kb

Generates the configuration file and starts or restarts a Kibana session.

NOTE: Currently supports only one instance. No distributed deployment.

GET /v1/overlord/core/kb/config

Returns the current configuration file for Kibana. Uses the mime-type 'text/yaml'.

PUT /v1/overlord/core/kb/config

Changes the current configuration for Kibana.

Input:

{
	"HostFQDN":"<FQDN>",
	"IP":"<host_ip>",
	"OS":"<os_type>",
	"KBPort":"<kibana_port>"
}

GET /v1/overlord/core/kb/visualizations

Returns the current Kibana core service visualizations registered in DMON.

POST /v1/overlord/core/kb/visualizations

Generates default visualizations and registers them inside DMON. Visualization generation is based on the roles of each monitored node.


Monitoring auxiliary

GET /v1/overlord/aux

Returns basic information about auxiliary components.

FUTURE Work: Information will basically be a kind of Readme.


GET /v1/overlord/aux/agent

Returns the current deployment status of dmon-agents.

{
  "Agents": [
    {
      "Agent": false,
      "NodeFQDN": "dice.cdh5.mng.internal"
    },
    {
      "Agent": false,
      "NodeFQDN": "dice.cdh5.w1.internal"
    },
    {
      "Agent": false,
      "NodeFQDN": "dice.cdh5.w2.internal"
    },
    {
      "Agent": false,
      "NodeFQDN": "dice.cdh5.w3.internal"
    }
  ]
}

POST /v1/overlord/aux/agent

Bootstraps the installation of dmon-agent services on nodes that are not marked as already active. It only works if all nodes have the same authentication.

GET /v1/overlord/aux/deploy

Returns the monitoring component status of all nodes. Similar to the v2 version of this resource.

	{
		"NodeFQDN":"<nodeFQDN>",
		"NodeIP":"<nodeIP>",
		"Monitored":"<boolean>",
		"Collectd":"<status>",
		"LSF":"<status>",
		"LSInstance": "<ip_logstash>"
	}

NOTE: Marked as DEPRECATED. Will be deleted in future versions.


POST /v1/overlord/aux/deploy

Deploys all auxiliary monitoring components on registered nodes and configures them.

NOTE: There are three statuses associated with each auxiliary component.

  • None -> There is no aux component on the registered node
  • Running -> There is the aux component on the registered node and it is currently running
  • Stopped -> There is the aux component on the registered node and it is currently stopped

If the status is None, this resource will install and configure the monitoring components. However, if the status is Running, nothing will be done. Services with status Stopped will be restarted.

All nodes can be restarted independently of their current state using the --redeploy-all parameter.

NOTE: Marked as DEPRECATED. Will be deleted in future versions. Use v2 version of the same resource for parallel implementation of this resource.


POST /v1/overlord/aux/deploy/{collectd|logstashfw}/{NodeName}

Deploys either collectd or logstash-forwarder to the specified node. In order to reload the configuration file, the --redeploy parameter has to be set. If the current node status is None, the selected component (collectd or lsf) will be installed.

FUTURE Work: Currently configurations of both collectd and logstash-forwarder are global and can't be changed on a node by node basis.


GET /v1/overlord/aux/interval

Returns the current polling time interval for all tools. This is a global setting. Future versions may implement a node-by-node interval setting.

Output:

{
	"Spark": "5",
	"Storm": "60",
	"System": "15",
	"YARN": "15"
}

PUT /v1/overlord/aux/interval

Changes the polling interval settings for all monitored systems.

Input:

{
	"Spark": "5",
	"Storm": "60",
	"System": "15",
	"YARN": "15"
}

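A minimal Python (requests) sketch of changing the global polling intervals; the base URL and interval values are examples:

# Minimal sketch for updating the polling intervals (values in seconds).
import requests

BASE = "http://dmon-host:5001"
payload = {"Spark": "5", "Storm": "60", "System": "15", "YARN": "15"}
resp = requests.put(f"{BASE}/v1/overlord/aux/interval", json=payload)
resp.raise_for_status()
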
GET /v1/overlord/aux/{collectd|logstashfw}/config

Returns the current collectd or logstashforwarder configuration file.

PUT /v1/overlord/aux/{collectd|logstashfw}/config

Changes the configuration/status of collectd or logstashforwarder and restarts all auxiliary components.


POST /v1/overlord/aux/{auxComp}/start

Starts the specified auxiliary component on all nodes.

NOTE: This resource is DEPRECATED. Use v2 instead.

POST /v1/overlord/aux/{auxComp}/stop

Stops the specified auxiliary components on all nodes.

NOTE: This resource is DEPRECATED. Use v2 instead.


POST /v1/overlord/aux/{auxComp}/{nodeFQDN}/start

Starts the specified auxiliary component on a specific node.

NOTE: This resource is DEPRECATED. Use v2 instead.

POST /v1/overlord/aux/{auxComp}/{nodeFQDN}/stop

Stops the specified auxiliary component on a specific node.

NOTE: This resource is DEPRECATED. Use v2 instead.

Note: Some resources have been redesigned with parallel processing in mind. These use greenlets (gevent) to parallelize the first version of the resources as much as possible. These parallel resources are marked with ../v2/... All other functionality and return values are the same.

For the sake of brevity these resources will not be detailed. Only additional functionality will be documented.


POST /v2/overlord/aux/deploy

Sets up the dmon-agent based on the roles registered for each node.

POST /v2/overlord/aux/{auxComp}/{nodeFQDN}/configure

Configures dmon-agent auxComp on node nodeFQDN.

POST /v2/overlord/aux/{auxComp}/configure

Configures dmon-agent auxComp on all nodes.

GET /v2/overlord/aux/deploy/check

Polls dmon-agents from the current monitored cluster.

{
  "Failed": [],
  "Message": "Nodes updated!",
  "Status": "Update"
}

If nodes don't respond they are added to the Failed list together with the appropriate HTTP error code.

GET /v2/overlord/aux/status

Returns the current status of all nodes and auxiliary components:

Outputs:

{
  "Aux Status": [
    {
      "Collectd": "Running",
      "LSF": "Running",
      "Monitored": true,
      "NodeFQDN": "dice.cdh5.mng.internal",
      "NodeIP": "109.231.121.135",
      "Status": true
    },
    {
      "Collectd": "Running",
      "LSF": "Running",
      "Monitored": true,
      "NodeFQDN": "dice.cdh5.w1.internal",
      "NodeIP": "109.231.121.194",
      "Status": true
    },
    {
      "Collectd": "Running",
      "LSF": "Running",
      "Monitored": true,
      "NodeFQDN": "dice.cdh5.w2.internal",
      "NodeIP": "109.231.121.134",
      "Status": true
    },
    {
      "Collectd": "Running",
      "LSF": "Running",
      "Monitored": true,
      "NodeFQDN": "dice.cdh5.w3.internal",
      "NodeIP": "109.231.121.156",
      "Status": true
    }
  ]
}

POST /v2/overlord/aux/{auxComp}/start

Starts auxComp on all nodes using parallel calls to the dmon-agent.

POST /v2/overlord/aux/{auxComp}/stop

Stops auxComp on all nodes using parallel calls to the dmon-agent.

POST /v2/overlord/aux/{auxComp}/{nodeFQDN}/start

Starts auxComp on node nodeFQDN using parallel calls to the dmon-agent.

POST /v2/overlord/aux/{auxComp}/{nodeFQDN}/stop

Stops auxComp on node nodeFQDN using parallel calls to the dmon-agent.

Observer

GET /v1/overlord/application

This returns all of the application versions currently registered with D-Mon.

{
  "appID-1": {
    "start": "2016-11-23 11:37:02.800271",
    "status": "STOPPED",
    "stop": "2016-12-14 09:29:05.523252",
    "ver": "3"
  },
  "appID-2": {
    "start": "2016-12-14 09:29:05.523252",
    "status": "ACTIVE",
    "stop": null,
    "ver": "1"
  }
}

NOTE: Each application ID can have several versions.

GET /v1/observer/applications/{appID}

Returns information on a particular YARN application identified by {appID}. The information will not contain monitoring data, only a general overview, similar to the YARN History Server.

NOTE: Scheduled for a future release (after M18).


GET /v1/observer/nodes

Returns the current monitored nodes list. The listing is limited to the node FQDN and current node IP.

NOTE: Some cloud providers assign the IP dynamically at VM startup. Because of this D-Mon treats the FQDN as a form of UUID. In future versions this might change, the FQDN being replaced/augmented with a hash.

Response:

{
    "Nodes":[
      {"<NodeFQDN1>":"NodeIP1"},
      {"<NodeFQDN2>":"NodeIP2"},
      {"<NodeFQDNn>":"NodeIPn"}
      ]
  }

GET /v1/observer/nodes/{nodeFQDN}

Returns information about a particular monitored node. Information is limited to non-confidential details; no authentication credentials are returned. The Status field is true if the dmon-agent has already been deployed, while Monitored is true if it has also been started. LSInstance indicates which Logstash instance the node is assigned to.

Response:

{
    "<nodeFQDN>":{
      "Status":"<boolean>",
      "IP":"<NodeIP>",
      "Monitored":"<boolean>",
      "OS":"<Operating_System>",
      "LSInstance": "<ip_of_logstash>"
    }
}

GET /v1/observer/nodes/{nodeFQDN}/roles

Returns roles of the node identified by 'nodeFQDN'.

Response:

{
	"Roles":["yarn","spark"]
}

NOTE: Roles are returned as a list. Some elements in fact represent more than one service; for example, 'yarn' represents both 'HDFS' and 'YARN'.


POST /v1/observer/query/{csv/json/plain}

Returns the required metrics in CSV, JSON or plain format.

Input:

{
  "DMON":{
    "query":{
      "size":"<SIZEinINT>",
      "ordering":"<asc|desc>",
      "queryString":"<query>",
      "tstart":"<startDate>",
      "tstop":"<stopDate>"
    }
  }
}

Output depends on the option selected by the user: csv, json or plain.

NOTE: The filter metrics must be in the form of a list. Also, filtering currently works only for CSV and plain output. Future versions will include the ability to export metrics as RDF+XML in concordance with the OSLC Performance Monitoring 2.0 standard. It is important to note that only system metrics will be exported in this format, not Big Data framework specific metrics.

From version 0.1.3 it is possible to omit the tstop parameter and instead define a time window based on the current system time:

{
  "DMON":{
    "query":{
      "size":"<SIZEinINT>",
      "ordering":"<asc|desc>",
      "queryString":"<query>",
      "tstart":"now-30s"
    }
  }
}

where s stands for seconds, m for minutes and h for hours.

From version 0.2.0 it is possible to specify a custom index to be used in the query. The index definition supports the * wildcard character.

{
  "DMON":{
    "query":{
      "size":"<SIZEinINT>",
      "index":"logstash-*",
      "ordering":"<asc|desc>",
      "queryString":"<query>",
      "tstart":"now-30s"
    }
  }
}

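For illustration, a minimal Python (requests) sketch of fetching the last 15 minutes of metrics as CSV; the base URL, sizes and the query string below are only examples (see the Query Examples page for real queries):

# Minimal sketch for querying the Observer and saving the result as CSV.
import requests

BASE = "http://dmon-host:5001"
payload = {
    "DMON": {
        "query": {
            "size": "1000",
            "ordering": "desc",
            "queryString": 'hostname:"worker1.example.internal"',  # illustrative query only
            "tstart": "now-15m",
        }
    }
}
resp = requests.post(f"{BASE}/v1/observer/query/csv", json=payload)
resp.raise_for_status()
with open("metrics.csv", "w") as f:
    f.write(resp.text)
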
POST /v2/observer/query/{csv/json}

This resource has predefined queries for all supported technologies. The query payload is of the form:

{
  "DMON": {
    "aggregation": "system",
    "fname": "output",
    "index": "logstash-*",
    "interval": "10s",
    "size": 0,
    "tstart": "now-1d",
    "tstop": "now"
  }
}

The aggregation term can have the following values: system, yarn, spark, storm. This resource is based on dataframes and has much better performance for large data queries; it should be used instead of the /v1 query resource for large queries.

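For illustration, a minimal Python (requests) sketch of an aggregated system-metrics query over the last day; the base URL is a placeholder and the field values mirror the example payload above:

# Minimal sketch for the predefined aggregation query (v2) saved as CSV.
import requests

BASE = "http://dmon-host:5001"
payload = {
    "DMON": {
        "aggregation": "system",
        "fname": "output",
        "index": "logstash-*",
        "interval": "10s",
        "size": 0,
        "tstart": "now-1d",
        "tstop": "now",
    }
}
resp = requests.post(f"{BASE}/v2/observer/query/csv", json=payload)
resp.raise_for_status()
with open("system-metrics.csv", "w") as f:
    f.write(resp.text)
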
For query examples for different technologies, please see the Query Examples page.