Add support for Index Statistics metrics #10784

Merged
merged 39 commits into from
Dec 15, 2021
Commits
23bbee5
added couchbase 7 with index stats endpoint as a test environment
steveny91 Nov 30, 2021
6720269
Initial metric parsing for index stats
steveny91 Nov 30, 2021
74aa629
updated config model and spec to include the new parameter needed to …
steveny91 Nov 30, 2021
9f1b27a
styling fix
steveny91 Nov 30, 2021
cc1e841
more descriptive service check message
steveny91 Dec 1, 2021
6fc9918
integration test for index stats metric collection
steveny91 Dec 2, 2021
941b6c9
slight logic refactor
steveny91 Dec 2, 2021
9de3ec3
added units test and refactor metric parsing logic
steveny91 Dec 2, 2021
2b391b9
smaller sample bucket for testing
steveny91 Dec 2, 2021
7d6d1cd
add new metrics to metadata.csv
steveny91 Dec 2, 2021
125977e
add comments for documentations and touch ups on tests
steveny91 Dec 2, 2021
a16bb57
fix typo
steveny91 Dec 2, 2021
19dd63a
lower timeout on tests
steveny91 Dec 2, 2021
a44da4a
attempt at fixing flaky test
steveny91 Dec 2, 2021
9d4effc
fix metric descriptions and config param description
steveny91 Dec 2, 2021
45c47ec
style fix
steveny91 Dec 2, 2021
e15b07a
removal of log line and minor description change
steveny91 Dec 3, 2021
60c0343
incorporated changes from reviews
steveny91 Dec 3, 2021
bc0e452
switch unit test to use parameterize version
steveny91 Dec 3, 2021
f0c199a
change tests and small refactor to tag parsing logic
steveny91 Dec 3, 2021
a8b6d10
change few metrics to count metrics
steveny91 Dec 3, 2021
7bcbb36
styling
steveny91 Dec 3, 2021
c0c6fe7
add one unit test for tag extraction
steveny91 Dec 3, 2021
9e4ba42
styling
steveny91 Dec 3, 2021
5d296a3
add missing unit for metric
steveny91 Dec 6, 2021
b97b0dd
small nits and metadata edits
steveny91 Dec 6, 2021
9a66f16
styling
steveny91 Dec 6, 2021
11765d3
small tweak in test
steveny91 Dec 6, 2021
ec04882
warn log for index stats requirement not met
steveny91 Dec 7, 2021
3966704
typo in comments
steveny91 Dec 7, 2021
34f39de
grammatical fix and error handling for version
steveny91 Dec 8, 2021
b6931f0
add more error handling
steveny91 Dec 8, 2021
1afb2fd
remove unused import
steveny91 Dec 8, 2021
e1001b0
logic error fix
steveny91 Dec 8, 2021
25196a5
Better version check logic in test
steveny91 Dec 10, 2021
79c5ed7
small refactor
steveny91 Dec 14, 2021
69a5829
refactor
steveny91 Dec 14, 2021
edc24f1
pull remote changes
steveny91 Dec 14, 2021
48ff4d3
slight change to log message
steveny91 Dec 15, 2021
7 changes: 7 additions & 0 deletions couchbase/assets/configuration/spec.yaml
@@ -28,6 +28,13 @@ files:
value:
example: http://localhost:4986
type: string
- name: index_stats_url
description: |
The URL to get Index Statistics, available since Couchbase 7.0.
See https://docs.couchbase.com/server/current/rest-api/rest-index-stats.html
value:
example: http://localhost:9102
type: string
- template: instances/http
- template: instances/default
- template: logs
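
How the configured `index_stats_url` turns into a request URL may be worth spelling out: the check joins it with the path `/api/v1/stats` via `urljoin`. A minimal sketch using the Python 3 standard library (the PR itself imports `urljoin` through `six.moves`); `build_index_stats_url` is a hypothetical helper name used only for illustration:

```python
from urllib.parse import urljoin

# Path used by the check for the Index Statistics endpoint (from couchbase_consts.py).
INDEX_STATS_METRICS_PATH = '/api/v1/stats'


def build_index_stats_url(index_stats_url):
    """Hypothetical helper: join the configured base URL with the stats path."""
    return urljoin(index_stats_url, INDEX_STATS_METRICS_PATH)


print(build_index_stats_url('http://localhost:9102'))
print(build_index_stats_url('http://localhost:9102/some/prefix/'))
```

Because the path starts with `/`, `urljoin` replaces any path already present in the base URL, so both calls print `http://localhost:9102/api/v1/stats`. A base URL served behind a path prefix would therefore lose that prefix.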
@@ -64,6 +64,10 @@ def instance_headers(field, value):
return get_default_field_value(field, value)


def instance_index_stats_url(field, value):
return 'http://localhost:9102'


def instance_kerberos_auth(field, value):
return 'disabled'

@@ -45,6 +45,7 @@ class Config:
empty_default_hostname: Optional[bool]
extra_headers: Optional[Mapping[str, Any]]
headers: Optional[Mapping[str, Any]]
index_stats_url: Optional[str]
kerberos_auth: Optional[str]
kerberos_cache: Optional[str]
kerberos_delegate: Optional[bool]
89 changes: 86 additions & 3 deletions couchbase/datadog_checks/couchbase/couchbase.py
@@ -10,12 +10,17 @@

import requests
from six import string_types
from six.moves.urllib.parse import urljoin

from datadog_checks.base import AgentCheck, ConfigurationError
from datadog_checks.couchbase.couchbase_consts import (
BUCKET_STATS,
COUCHBASE_STATS_PATH,
COUCHBASE_VITALS_PATH,
INDEX_STATS_COUNT_METRICS,
INDEX_STATS_METRICS_PATH,
INDEX_STATS_SERVICE_CHECK_NAME,
INDEXER_STATE_MAP,
NODE_CLUSTER_SERVICE_CHECK_NAME,
NODE_HEALTH_SERVICE_CHECK_NAME,
NODE_HEALTH_TRANSLATION,
@@ -43,13 +48,15 @@ def __init__(self, name, init_config, instances):
super(Couchbase, self).__init__(name, init_config, instances)

self._sync_gateway_url = self.instance.get('sync_gateway_url', None)
self._index_stats_url = self.instance.get('index_stats_url')
self._server = self.instance.get('server', None)
if self._server is None:
raise ConfigurationError("The server must be specified")
self._tags = list(set(self.instance.get('tags', [])))
self._tags.append('instance:{}'.format(self._server))

self._previous_status = None
self._version = None

def _create_metrics(self, data):
# Get storage metrics
@@ -183,6 +190,12 @@ def check(self, _):
self._create_metrics(data)
if self._sync_gateway_url:
self._collect_sync_gateway_metrics()
try:
# Error handling in case Couchbase changes their versioning format
if self._index_stats_url and self._version and int(self._version.split(".")[0]) >= 7:
self._collect_index_stats_metrics()
except Exception as e:
self.log.debug(str(e))

def _collect_version(self, data):
nodes = data['stats']['nodes']
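
The version gate added to `check` can be read in isolation. A minimal sketch, assuming a hypothetical helper named `supports_index_stats` and made-up version strings shaped like `7.0.2+6703`; it is not the check's actual code, which performs the same comparison inline inside a broad `try`/`except`:

```python
def supports_index_stats(version, minimum_major=7):
    """Return True when the major version is at least `minimum_major`.

    Hypothetical helper for illustration; returns False for missing or
    unparseable versions instead of raising.
    """
    if not version:
        return False
    try:
        major = int(version.split('.')[0])
    except ValueError:
        # Unexpected version format, e.g. a non-numeric prefix.
        return False
    return major >= minimum_major


print(supports_index_stats('7.0.2+6703'))
print(supports_index_stats('6.6.3+9808'))
print(supports_index_stats(None))
```

Catching only `ValueError` keeps the gate narrow: a malformed version disables index stats collection, while unrelated errors still surface.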
@@ -198,9 +211,9 @@ def _collect_version(self, data):
build_separator = version.rindex('-')
version = list(version)
version[build_separator] = '+'
-version = ''.join(version)
+self._version = ''.join(version)

-self.set_metadata('version', version)
+self.set_metadata('version', self._version)

def get_data(self):
# The dictionary to be returned.
@@ -318,7 +331,7 @@ def _collect_sync_gateway_metrics(self):
try:
data = self._get_stats(url).get('syncgateway', {})
except requests.exceptions.RequestException as e:
-msg = "Error accessing the Sync Gateway monitoring endpoint %s: %s," % url, str(e)
+msg = "Error accessing the Sync Gateway monitoring endpoint %s: %s," % (url, str(e))
self.log.debug(msg)
self.service_check(SG_SERVICE_CHECK_NAME, AgentCheck.CRITICAL, msg, self._tags)
return
@@ -402,3 +415,73 @@ def extract_seconds_value(self, value):
unit = 'us'

return float(val) / TO_SECONDS[unit]

def _collect_index_stats_metrics(self):
url = urljoin(self._index_stats_url, INDEX_STATS_METRICS_PATH)
try:
data = self._get_stats(url)
except requests.exceptions.RequestException as e:
msg = "Error accessing the Index Statistics endpoint: %s: %s" % (url, str(e))
self.log.warning(msg)
self.service_check(INDEX_STATS_SERVICE_CHECK_NAME, AgentCheck.CRITICAL, tags=self._tags, message=msg)
return

self.service_check(INDEX_STATS_SERVICE_CHECK_NAME, AgentCheck.OK, self._tags)

for keyspace in data:
if keyspace == "indexer":
# The indexer object provides metrics about the index node
for mname, mval in data.get(keyspace).items():
self._submit_index_node_metrics(mname, mval)
else:
index_tags = self._extract_index_tags(keyspace) + self._tags
for mname, mval in data.get(keyspace).items():
self._submit_per_index_metrics(mname, mval, index_tags)

def _extract_index_tags(self, keyspace):
# Index Keyspaces can come in different formats:
# partition, bucket:index_name, bucket:collection:index_name, bucket:scope:collection:index_name
# Variations missing the scope or collection refer to the default scope and default collection, respectively
# https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/createprimaryindex.html#keyspace-ref
tag_arr = keyspace.split(":")
if len(tag_arr) == 2:
bucket, index_name = tag_arr
scope, collection = ["default", "default"]
elif len(tag_arr) == 3:
bucket, collection, index_name = tag_arr
scope = 'default'
elif len(tag_arr) == 4:
bucket, scope, collection, index_name = tag_arr
else:
# Catch-all in case the keyspace has either no separators or 4 or more separators (':')
# There is a documented example of partition-num being a possible keyspace:
# https://docs.couchbase.com/server/current/rest-api/rest-index-stats.html#_get_index_stats
# But we shouldn't encounter this since we don't query the index api with the needed params
# (Version 1.19.0+ of the Couchbase check)
formatted_tags = []
self.log.debug("Unable to extract tags from keyspace: %s", keyspace)
return formatted_tags

formatted_tags = [
'bucket:{}'.format(bucket),
'scope:{}'.format(scope),
'collection:{}'.format(collection),
'index_name:{}'.format(index_name),
]
return formatted_tags

def _submit_index_node_metrics(self, mname, mval):
namespace = 'couchbase.indexer'
f_mname = '.'.join([namespace, mname])
if mname == "indexer_state":
self.gauge(f_mname, INDEXER_STATE_MAP[mval], self._tags)
else:
self.gauge(f_mname, mval, self._tags)

def _submit_per_index_metrics(self, mname, mval, tags):
namespace = 'couchbase.index'
f_mname = '.'.join([namespace, mname])
if mname in INDEX_STATS_COUNT_METRICS:
self.monotonic_count(f_mname, mval, tags)
else:
self.gauge(f_mname, mval, tags)
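
The keyspace-to-tags mapping in `_extract_index_tags` can be exercised without the Agent. The following is a standalone sketch of the same branching, not the check's method itself; the function name `extract_index_tags` and the sample keyspaces other than `cb_bucket:gamesim_primary` are illustrative assumptions:

```python
def extract_index_tags(keyspace):
    """Standalone sketch of the keyspace-to-tags mapping.

    Missing scope/collection parts fall back to 'default', as in the diff.
    Returns an empty list for formats that cannot be parsed.
    """
    parts = keyspace.split(':')
    if len(parts) == 2:
        bucket, index_name = parts
        scope = collection = 'default'
    elif len(parts) == 3:
        bucket, collection, index_name = parts
        scope = 'default'
    elif len(parts) == 4:
        bucket, scope, collection, index_name = parts
    else:
        # No separator, or more parts than bucket:scope:collection:index_name.
        return []
    return [
        'bucket:{}'.format(bucket),
        'scope:{}'.format(scope),
        'collection:{}'.format(collection),
        'index_name:{}'.format(index_name),
    ]


for keyspace in ('cb_bucket:gamesim_primary', 'b:s:c:idx', 'partition_only'):
    print(keyspace, '->', extract_index_tags(keyspace))
```

Returning an empty list for unparseable keyspaces means the metrics are still submitted, just with the instance-level tags only.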
17 changes: 17 additions & 0 deletions couchbase/datadog_checks/couchbase/couchbase_consts.py
@@ -6,12 +6,14 @@
COUCHBASE_STATS_PATH = '/pools/default'
COUCHBASE_VITALS_PATH = '/admin/vitals'
SG_METRICS_PATH = '/_expvar'
INDEX_STATS_METRICS_PATH = '/api/v1/stats'

# Service Checks
SERVICE_CHECK_NAME = 'couchbase.can_connect'
SG_SERVICE_CHECK_NAME = 'couchbase.sync_gateway.can_connect'
NODE_CLUSTER_SERVICE_CHECK_NAME = 'couchbase.by_node.cluster_membership'
NODE_HEALTH_SERVICE_CHECK_NAME = 'couchbase.by_node.health'
INDEX_STATS_SERVICE_CHECK_NAME = 'couchbase.index_stats.can_connect'

NODE_MEMBERSHIP_TRANSLATION = {
'active': AgentCheck.OK,
@@ -323,3 +325,18 @@
"import_partitions",
"import_processing_time",
]

INDEX_STATS_COUNT_METRICS = [
"cache_hits",
"cache_misses",
"items_count",
"num_docs_indexed",
"num_items_flushed",
"num_requests",
"num_rows_returned",
"num_scan_errors",
"num_scan_timeouts",
"scan_bytes_read",
]

INDEXER_STATE_MAP = {'Active': 0, 'Pause': 1, 'Warmup': 2}
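
To see how these two constants drive metric submission, here is a self-contained sketch that classifies a raw stat without the Agent. `classify_metric` is a hypothetical name, the count list is abbreviated, and returning `None` for an unknown indexer state is an assumption of this sketch (the check in the diff indexes `INDEXER_STATE_MAP` directly, which would raise `KeyError` on a state not in the map):

```python
INDEX_STATS_COUNT_METRICS = ['cache_hits', 'cache_misses', 'num_requests']  # abbreviated
INDEXER_STATE_MAP = {'Active': 0, 'Pause': 1, 'Warmup': 2}


def classify_metric(keyspace, name, value):
    """Return (metric_name, submission_type, value) for one raw stat.

    Hypothetical helper: mirrors the namespaces and types used in the diff.
    """
    if keyspace == 'indexer':
        if name == 'indexer_state':
            # Assumption: unknown states map to None rather than raising KeyError.
            value = INDEXER_STATE_MAP.get(value)
        return 'couchbase.indexer.' + name, 'gauge', value
    kind = 'monotonic_count' if name in INDEX_STATS_COUNT_METRICS else 'gauge'
    return 'couchbase.index.' + name, kind, value


print(classify_metric('indexer', 'indexer_state', 'Warmup'))
print(classify_metric('cb_bucket:gamesim_primary', 'cache_hits', 12))
print(classify_metric('cb_bucket:gamesim_primary', 'disk_size', 4096))
```

Node-level stats under the `indexer` key are always gauges; per-index stats are counts only when listed in `INDEX_STATS_COUNT_METRICS`.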
6 changes: 6 additions & 0 deletions couchbase/datadog_checks/couchbase/data/conf.yaml.example
@@ -62,6 +62,12 @@ instances:
#
# sync_gateway_url: http://localhost:4986

## @param index_stats_url - string - optional - default: http://localhost:9102
## The URL to get Index Statistics, available since Couchbase 7.0.
## See https://docs.couchbase.com/server/current/rest-api/rest-index-stats.html
#
# index_stats_url: http://localhost:9102

## @param proxy - mapping - optional
## This overrides the `proxy` setting in `init_config`.
##
39 changes: 35 additions & 4 deletions couchbase/metadata.csv
@@ -10,7 +10,7 @@ couchbase.ram.quota_total,gauge,,byte,,RAM quota,0,couchbase,ram quota
couchbase.ram.used_by_data,gauge,,byte,,RAM used for data,0,couchbase,ram data used
couchbase.by_bucket.avg_bg_wait_time,gauge,,microsecond,,Average background wait time,-1,couchbase,avg bg wait
couchbase.by_bucket.avg_disk_commit_time,gauge,,second,,Average disk commit time,-1,couchbase,avg commit time
-couchbase.by_bucket.avg_disk_update_time,gauge,,microsecond,,Average disk update time ,-1,couchbase,avg update time
+couchbase.by_bucket.avg_disk_update_time,gauge,,microsecond,,"Average disk update time ",-1,couchbase,avg update time
couchbase.by_bucket.bg_wait_total,gauge,,byte,,Bytes read,0,couchbase,read
couchbase.by_bucket.bytes_read,gauge,,byte,,Bytes read,0,couchbase,read
couchbase.by_bucket.bytes_written,gauge,,byte,,Bytes written,0,couchbase,written
@@ -33,7 +33,7 @@ couchbase.by_bucket.couch_views_fragmentation,gauge,,percent,,View fragmentation
couchbase.by_bucket.couch_views_ops,gauge,,operation,,View operations,0,couchbase,view ops
couchbase.by_bucket.cpu_idle_ms,gauge,,millisecond,,CPU idle milliseconds,0,couchbase,cpu idle ms
couchbase.by_bucket.cpu_utilization_rate,gauge,,percent,,CPU utilization percentage,0,couchbase,cpu util
-couchbase.by_bucket.curr_connections,gauge,,connection,,Current bucket connections,0,couchbase,conns
+couchbase.by_bucket.curr_connections,gauge,,connection,,The current number of bucket connections,0,couchbase,conns
couchbase.by_bucket.curr_items_tot,gauge,,item,,Total number of items,0,couchbase,total items
couchbase.by_bucket.curr_items,gauge,,item,,Number of active items in memory,0,couchbase,mem items
couchbase.by_bucket.decr_hits,gauge,,hit,,Decrement hits,1,couchbase,decr hits
@@ -132,7 +132,7 @@ couchbase.by_bucket.replication_docs_rep_queue,gauge,,item,,,0,couchbase,repl do
couchbase.by_bucket.replication_meta_latency_aggr,gauge,,second,,,0,couchbase,repl meta latency aggr
couchbase.by_bucket.rest_requests,gauge,,request,second,Number of HTTP requests,0,couchbase,rest requests
couchbase.by_bucket.swap_total,gauge,,byte,,Total amount of swap available,0,couchbase,swap total
-couchbase.by_bucket.swap_used,gauge,,byte,,Amount of swap used ,0,couchbase,swap used
+couchbase.by_bucket.swap_used,gauge,,byte,,"Amount of swap used ",0,couchbase,swap used
couchbase.by_bucket.vb_active_eject,gauge,,item,second,Number of items per second being ejected to disk from active vBuckets,0,couchbase,vb active eject
couchbase.by_bucket.vb_active_itm_memory,gauge,,item,,Amount of active user data cached in RAM in this bucket,0,couchbase,vb active item mem
couchbase.by_bucket.vb_active_meta_data_memory,gauge,,item,,Amount of active item metadata consuming RAM in this bucket,0,couchbase,vb active meta mem
@@ -313,4 +313,35 @@ couchbase.sync_gateway.shared_bucket_import.import_high_seq,gauge,,,,The highest
couchbase.sync_gateway.shared_bucket_import.import_partitions,count,,,,The total number of import partitions.,0,couchbase,
couchbase.sync_gateway.shared_bucket_import.import_processing_time,count,,,,The total time taken to process a document import.,0,couchbase,
couchbase.sync_gateway.system_memory_total,count,,byte,,The total memory available on the system in bytes.,0,couchbase,
-couchbase.sync_gateway.warn_count,count,,byte,,The total number of warnings logged.,0,couchbase,
+couchbase.sync_gateway.warn_count,count,,byte,,The total number of warnings logged.,0,couchbase,
couchbase.indexer.indexer_state,gauge,,,,"The current state of the Index service on this node (0 = Active, 1 = Pause, 2 = Warmup)",0,couchbase,
couchbase.indexer.memory_quota,gauge,,byte,,The memory quota assigned to the Index service on this node by user configuration,0,couchbase,
couchbase.indexer.memory_total_storage,gauge,,byte,,The total size allocated in the indexer across all indexes. This also accounts for memory fragmentation,0,couchbase,
couchbase.indexer.memory_used,gauge,,byte,,The amount of memory used by the Index service on this node,0,couchbase,
couchbase.indexer.total_indexer_gc_pause_ns,gauge,,nanosecond,,The total time the indexer has spent in GC pause since the last startup,0,couchbase,
couchbase.index.avg_drain_rate,gauge,,item,second,The average number of items flushed from memory to disk storage per second,0,couchbase,
couchbase.index.avg_item_size,gauge,,byte,,The average size of the keys,0,couchbase,
couchbase.index.avg_scan_latency,gauge,,nanosecond,,The average time to serve a scan request,0,couchbase,
couchbase.index.cache_hit_percent,gauge,,percent,,The percentage of memory accesses that were served from the managed cache,0,couchbase,
couchbase.index.cache_hits,count,,,,The number of accesses to this index data from RAM,0,couchbase,
couchbase.index.cache_misses,count,,,,The number of accesses to this index data from disk,0,couchbase,
couchbase.index.data_size,gauge,,byte,,The size of indexable data that is maintained for the index or partition,0,couchbase,
couchbase.index.disk_size,gauge,,byte,,The total disk file size consumed by the index or partition,0,couchbase,
couchbase.index.frag_percent,gauge,,percent,,The percentage fragmentation of the index,0,couchbase,
couchbase.index.initial_build_progress,gauge,,percent,,"The percentage of the initial build progress for the index. When the initial build is completed, the value is 100. For an index partition, the value is listed as 0",0,couchbase,
couchbase.index.items_count,count,,item,,The number of items currently indexed,0,couchbase,
couchbase.index.last_known_scan_time,gauge,,nanosecond,,"Timestamp of the last scan request received for this index (Unix timestamp in nanoseconds). This may be useful for determining whether this index is currently unused. Note: This statistic is persisted to disk every 15 minutes, so it is preserved when the indexer restarts",0,couchbase,
couchbase.index.num_docs_indexed,count,,document,,The number of documents indexed by the indexer since last startup,0,couchbase,
couchbase.index.num_docs_pending,gauge,,document,,The number of documents pending to be indexed,0,couchbase,
couchbase.index.num_docs_queued,gauge,,document,,The number of documents queued to be indexed,0,couchbase,
couchbase.index.num_items_flushed,count,,item,,The number of items flushed from memory to disk storage,0,couchbase,
couchbase.index.num_pending_requests,gauge,,request,,The number of requests received but not yet served by the indexer,0,couchbase,
couchbase.index.num_requests,count,,request,,The number of requests served by the indexer since last startup,0,couchbase,
couchbase.index.num_rows_returned,count,,row,,The total number of rows returned so far by the indexer,0,couchbase,
couchbase.index.num_scan_errors,count,,request,,The number of requests that failed due to errors other than timeout,0,couchbase,
couchbase.index.num_scan_timeouts,count,,request,,"The number of requests that timed out, either waiting for snapshots or during scan in progress",0,couchbase,
couchbase.index.recs_in_mem,gauge,,record,,"For standard index storage, this is the number of records in this index that are stored in memory. For memory-optimized index storage, this is the same as items_count",0,couchbase,
couchbase.index.recs_on_disk,gauge,,record,,"For standard index storage, this is the number of records in this index that are stored on disk. For memory-optimized index storage, this is 0",0,couchbase,
couchbase.index.resident_percent,gauge,,percent,,The percentage of data held in memory,0,couchbase,
couchbase.index.scan_bytes_read,count,,byte,,The number of bytes read by a scan since last startup,0,couchbase,
couchbase.index.total_scan_duration,gauge,,nanosecond,,The total time spent by the indexer in scanning rows since last startup,0,couchbase,
52 changes: 51 additions & 1 deletion couchbase/tests/common.py
@@ -12,20 +12,30 @@
PORT = '8091'
QUERY_PORT = '8093'
SG_PORT = '4985'
INDEX_STATS_PORT = '9102'

# Tags and common bucket name
CUSTOM_TAGS = ['optional:tag1']
CHECK_TAGS = CUSTOM_TAGS + ['instance:http://{}:{}'.format(HOST, PORT)]
BUCKET_NAME = 'cb_bucket'
INDEX_STATS_TAGS = CHECK_TAGS + [
'bucket:cb_bucket',
'collection:default',
'index_name:gamesim_primary',
'scope:default',
]

URL = 'http://{}:{}'.format(HOST, PORT)
QUERY_URL = 'http://{}:{}'.format(HOST, QUERY_PORT)
SG_URL = 'http://{}:{}'.format(HOST, SG_PORT)
INDEX_STATS_URL = 'http://{}:{}'.format(HOST, INDEX_STATS_PORT)
CB_CONTAINER_NAME = 'couchbase-standalone'
USER = 'Administrator'
PASSWORD = 'password'

-DEFAULT_INSTANCE = {'server': URL, 'user': USER, 'password': PASSWORD, 'timeout': 0.5, 'tags': CUSTOM_TAGS}
+COUCHBASE_MAJOR_VERSION = int(os.getenv('COUCHBASE_VERSION').split(".")[0])
+
+DEFAULT_INSTANCE = {'server': URL, 'user': USER, 'password': PASSWORD, 'timeout': 1, 'tags': CUSTOM_TAGS}

SYNC_GATEWAY_METRICS = [
"couchbase.sync_gateway.admin_net_bytes_recv",
@@ -132,3 +142,43 @@
"couchbase.sync_gateway.system_memory_total",
"couchbase.sync_gateway.warn_count",
]

INDEX_STATS_INDEXER_METRICS = [
'couchbase.indexer.indexer_state',
'couchbase.indexer.memory_quota',
'couchbase.indexer.memory_total_storage',
'couchbase.indexer.memory_used',
'couchbase.indexer.total_indexer_gc_pause_ns',
]

INDEX_STATS_GAUGE_METRICS = [
'couchbase.index.avg_drain_rate',
'couchbase.index.avg_item_size',
'couchbase.index.avg_scan_latency',
'couchbase.index.cache_hit_percent',
'couchbase.index.data_size',
'couchbase.index.disk_size',
'couchbase.index.frag_percent',
'couchbase.index.initial_build_progress',
'couchbase.index.last_known_scan_time',
'couchbase.index.num_docs_pending',
'couchbase.index.num_docs_queued',
'couchbase.index.num_pending_requests',
'couchbase.index.recs_in_mem',
'couchbase.index.recs_on_disk',
'couchbase.index.resident_percent',
'couchbase.index.total_scan_duration',
]

INDEX_STATS_COUNT_METRICS = [
'couchbase.index.cache_hits',
'couchbase.index.cache_misses',
'couchbase.index.items_count',
'couchbase.index.num_docs_indexed',
'couchbase.index.num_items_flushed',
'couchbase.index.num_requests',
'couchbase.index.num_rows_returned',
'couchbase.index.num_scan_errors',
'couchbase.index.num_scan_timeouts',
'couchbase.index.scan_bytes_read',
]
1 change: 1 addition & 0 deletions couchbase/tests/compose/standalone.compose
@@ -5,6 +5,7 @@ services:
image: "couchbase/server:${COUCHBASE_VERSION}"
ports:
- 8091-8094:8091-8094
- 9102:9102
container_name: ${CB_CONTAINER_NAME}
couchbase-sync-gateway:
container_name: couchbase-sync-gateway