Metric Glossary

vnnv01 edited this page Mar 6, 2018 · 1 revision

pstl_build_info

Label(s):

  • git_branch:
  • git_revision:
  • jvm_version:
  • scala_version:
  • pstl_version:

pstl_job_up

Indicates whether the job identified by job_id is up or down. Reports 1 if the job is online, or 0 if the job is offline. A job_id that has been offline for an extended period of time may no longer be reported in the pstl_job_up time series at all, so mission-critical dashboards and alerts should check that the expected job_id is present in the series as well as checking its value. Prometheus users may find the absent() function useful for this.

Label(s):

  • job_id: the unique identifier of the job

Example(s):

# jobA is online
pstl_job_up{job_id="jobA",} 1.0
# jobB is offline
pstl_job_up{job_id="jobB",} 0.0
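The advice to check both the value and the presence of a series can be sketched in Python. This is only an illustration and not part of pstl: the names `parse_samples` and `job_status` are hypothetical, and the parser assumes the scraped text looks like the examples above (one sample per line, labels in braces).

```python
import re

# Matches lines such as: pstl_job_up{job_id="jobA",} 1.0
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Parse exposition-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels")))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def job_status(samples, job_id):
    """Return "up", "down", or "absent" for the given job_id (hypothetical helper)."""
    for name, labels, value in samples:
        if name == "pstl_job_up" and labels.get("job_id") == job_id:
            return "up" if value == 1.0 else "down"
    # No series for this job_id: treat it separately from an explicit 0
    return "absent"
```

An alert built this way would typically fire on both "down" and "absent", since a long-offline job may disappear from the series rather than report 0.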

pstl_query_batch_total

Indicates the total number of micro-batches processed by this query.

Label(s):

  • job_id: the name of this job
  • query_id: the name of this query

Example(s):

# query bar in job foo has processed 2 micro-batches
pstl_query_batch_total{job_id="foo",query_id="bar",} 2.0
# query baz in job foo has processed 0 micro-batches
pstl_query_batch_total{job_id="foo",query_id="baz",} 0.0

pstl_query_batch_current

Indicates the most recent micro-batch processed by this query.

Label(s):

  • job_id: the name of this job
  • query_id: the name of this query

Example(s):

# query bar in job foo most recently processed micro-batch 1
pstl_query_batch_current{job_id="foo",query_id="bar",} 1.0
# query baz in job foo most recently processed micro-batch 0
pstl_query_batch_current{job_id="foo",query_id="baz",} 0.0

pstl_query_duration_seconds_total

Indicates the amount of time, in seconds, that each stage (identified by stage_id) took for each micro-batch processed by this query. The stages are:

  • triggerExecution is the total amount of time spent processing this micro-batch
  • queryPlanning is the total amount of time spent generating a new query plan for this micro-batch
  • getOffset is the total amount of time source(s) spent determining if they had new data available for processing
  • getBatch is the total amount of time source(s) spent producing a data frame representing newly available data for processing
  • walCommit is the total amount of time spent writing this micro-batch's durable state to the write-ahead log
  • addBatch is the total amount of time spent by the sink processing this micro-batch

TODO: explain triggerExecution vs addBatch

Label(s):

  • job_id: the name of this job
  • query_id: the name of this query
  • stage_id: the name of this stage

Example(s):

pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="triggerExecution",} 5.212
pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="queryPlanning",} 0.112
pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="getOffset",} 0.10900000000000001
pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="getBatch",} 0.274
pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="walCommit",} 0.078
pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="addBatch",} 4.616
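One way to read these values is as each stage's share of triggerExecution. The Python sketch below uses the example values above and assumes the other stages are sub-steps of triggerExecution; that relationship is an assumption here, not something this page states. The names `EXAMPLE_DURATIONS` and `stage_shares` are hypothetical.

```python
# Values copied from the example above (seconds per stage_id)
EXAMPLE_DURATIONS = {
    "triggerExecution": 5.212,
    "queryPlanning": 0.112,
    "getOffset": 0.10900000000000001,
    "getBatch": 0.274,
    "walCommit": 0.078,
    "addBatch": 4.616,
}


def stage_shares(durations, total_stage="triggerExecution"):
    """Return each stage's fraction of the total stage's time.

    Assumes (not confirmed by this page) that every other stage is
    contained within `total_stage`.
    """
    total = durations[total_stage]
    if total <= 0:
        return {}
    shares = {
        stage: seconds / total
        for stage, seconds in durations.items()
        if stage != total_stage
    }
    # Time inside the total stage that no listed sub-stage accounts for
    shares["unaccounted"] = max(0.0, 1.0 - sum(shares.values()))
    return shares
```

With the example values, addBatch accounts for the large majority of triggerExecution, which suggests the sink is where most of the time goes for that query.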

pstl_query_input_rows_total

Indicates the total number of rows processed by this query over time. This metric is not durable: if the JVM hosting this streaming query dies, the total number of rows processed resets to 0 in the new JVM.

Label(s):

  • job_id: the job this query belongs to
  • query_id: the name of this query

Example(s):

# query bar in job foo has processed a total of 500 rows over time
pstl_query_input_rows_total{job_id="foo",query_id="bar",} 500.0
# query baz in job foo has processed a total of 0 rows over time
pstl_query_input_rows_total{job_id="foo",query_id="baz",} 0.0
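Because the counter restarts from 0 in a new JVM, anything that computes growth from raw samples needs to tolerate resets. The Python sketch below shows one common approach: treat any drop in value as a reset and count the post-reset value as new rows. The function name `increase_with_resets` is hypothetical, and the sketch ignores rows processed between the last sample before a reset and the reset itself.

```python
def increase_with_resets(values):
    """Total increase across ordered counter samples, treating drops as resets to 0."""
    total = 0.0
    previous = None
    for value in values:
        if previous is None:
            pass  # first sample only sets the baseline
        elif value >= previous:
            total += value - previous
        else:
            # Counter reset (e.g. a new JVM): everything in `value` is new
            total += value
        previous = value
    return total
```

For example, `increase_with_resets([100, 300, 500, 20, 120])` counts 400 rows before the reset and 120 after it. Prometheus users generally get comparable reset handling from functions such as rate() and increase().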

pstl_query_input_rows_current

Indicates the number of rows provided to this query to process in the most recent micro-batch. If a micro-batch is currently executing, this metric reports the number of rows provided to this query in the previous micro-batch.

TODO: explain pstl_query_input_rows_current vs pstl_query_processed_rows_current better

Label(s):

  • job_id: the job this query belongs to
  • query_id: the name of this query

Example(s):

# query bar in job foo was most recently given 10 rows for processing
pstl_query_input_rows_current{job_id="foo",query_id="bar",} 10.0

pstl_query_processed_rows_current

Indicates the number of rows processed by this query in the most recent micro-batch. If a micro-batch is currently executing, this metric reports the number of rows processed in the previous micro-batch. This metric is an aggregated rate, so taking further derivatives of it will likely yield unexpected results.

Label(s):

  • job_id: the job this query belongs to
  • query_id: the name of this query

Example(s):

# query bar in job foo last processed 512 rows
pstl_query_processed_rows_current{job_id="foo",query_id="bar",} 512.0
# query baz in job foo last processed 0 rows
pstl_query_processed_rows_current{job_id="foo",query_id="baz",} 0.0

pstl_query_restarts_total

Indicates the number of times a streaming query has been restarted. This metric is not durable: if the JVM hosting this streaming query dies and the query restarts in a new JVM, the total number of query restarts resets to 0.

Label(s):

  • job_id: the job this query belongs to
  • query_id: the name of this query

Example(s):

# query bar in job foo has never been restarted within this jvm
pstl_query_restarts_total{job_id="foo",query_id="bar",} 0.0
# query baz in job foo has been restarted 8 times within this jvm
pstl_query_restarts_total{job_id="foo",query_id="baz",} 8.0

pstl_query_restart_seconds

Indicates the number of seconds remaining in the exponential backoff delay before the next attempt to restart this streaming query. Internally, this metric counts down roughly every 100 milliseconds to provide near real-time feedback to monitoring infrastructure.

Label(s):

  • job_id: the unique identifier of the job this query belongs to
  • query_id: the unique identifier of this streaming query

Example(s):

# query bar in job foo will restart in 100 seconds
pstl_query_restart_seconds{job_id="foo",query_id="bar",} 100.0
...
# query bar in job foo will restart in 5 seconds
pstl_query_restart_seconds{job_id="foo",query_id="bar",} 5.0
...
# query bar in job foo is no longer queued for restart
pstl_query_restart_seconds{job_id="foo",query_id="bar",} 0.0

pstl_query_up

Indicates whether the query identified by query_id within the job identified by job_id is up or down. Reports 1 if the query is online, or 0 if the query is offline. Mission-critical dashboards and alerts should check that the expected query_id is present in the pstl_query_up time series as well as checking its value. Prometheus users may find the absent() function useful for this.

Label(s):

  • job_id: the unique identifier of the job this query belongs to
  • query_id: the unique identifier of this streaming query

Example(s):

# query bar in job foo is online
pstl_query_up{job_id="foo",query_id="bar",} 1.0
# query baz in job foo is offline
pstl_query_up{job_id="foo",query_id="baz",} 0.0

pstl_vertica_kafka_producer

Label(s):

  • job_id:
  • sink_id:
  • vertica_url:
  • vertica_table:

pstl_vertica_copy_duration

Label(s):

  • job_id:
  • sink_id:
  • vertica_url:
  • vertica_table:
  • stage_id:

pstl_vertica_copy_total

Label(s):

  • job_id:
  • sink_id:
  • vertica_url:
  • vertica_table:

pstl_vertica_copy_rows_total

Label(s):

  • job_id:
  • sink_id:
  • vertica_url:
  • vertica_table: