Metric Glossary
Label(s):
- git_branch
- git_revision
- jvm_version
- scala_version
- pstl_version
Indicates whether the job identified by job_id is up or down. Reports 1 if the job is online, or 0 if the job is offline. Mission-critical dashboards and alerts should also check for the existence of a particular job_id within the pstl_job_up time series where needed, because a job_id that has been offline for an extended period of time may no longer be reported in the pstl_job_up time series at all. If you are a Prometheus user, the absent() function may be of interest.
Labels:
- job_id: the unique identifier of the job
Example(s):
# jobA is online
pstl_job_up{job_id="jobA",} 1.0
# jobB is offline
pstl_job_up{job_id="jobB",} 0.0
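The existence check recommended above can also be done outside Prometheus. The following minimal Python sketch assumes you already have the scraped text and a list of job_ids you expect to be running (the expected_jobs argument and the jobC value are hypothetical). It classifies each job as up, down, or missing. The parser is deliberately simplified: it accepts the trailing comma seen in these examples but does not handle escaped quotes inside label values. Prometheus users would normally rely on absent() instead.

```python
import re

# Matches one labelled sample line in the text format shown above, e.g.
#   pstl_job_up{job_id="jobA",} 1.0
# The label block may end with a trailing comma, as in these examples.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Return (metric_name, labels_dict, value) for one labelled sample line."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a labelled sample line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels")))
    return m.group("name"), labels, float(m.group("value"))


def job_status(lines, expected_jobs):
    """Classify each expected job as 'up', 'down', or 'missing'.

    'missing' covers a long-offline job_id that is no longer reported,
    which a check on the value alone would not catch.
    """
    reported = {}
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        name, labels, value = parse_sample(line)
        if name == "pstl_job_up":
            reported[labels["job_id"]] = value
    status = {}
    for job in expected_jobs:
        if job not in reported:
            status[job] = "missing"
        elif reported[job] == 1.0:
            status[job] = "up"
        else:
            status[job] = "down"
    return status


scrape = """
# jobA is online
pstl_job_up{job_id="jobA",} 1.0
# jobB is offline
pstl_job_up{job_id="jobB",} 0.0
""".splitlines()

print(job_status(scrape, ["jobA", "jobB", "jobC"]))
# {'jobA': 'up', 'jobB': 'down', 'jobC': 'missing'}
```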
Indicates the total number of micro-batches processed by this query.
Label(s):
- job_id: the name of this job
- query_id: the name of this query
Example(s):
# query bar in job foo has processed 2 micro-batches
pstl_query_batch_total{job_id="foo",query_id="bar",} 2.0
# query baz in job foo has processed 0 micro-batches
pstl_query_batch_total{job_id="foo",query_id="baz",} 0.0
Indicates the most recent micro-batch processed by this query.
Label(s):
- job_id: the name of this job
- query_id: the name of this query
Example(s):
# query bar in job foo most recently processed micro-batch 1
pstl_query_batch_current{job_id="foo",query_id="bar",} 1.0
# query baz in job foo most recently processed micro-batch 0
pstl_query_batch_current{job_id="foo",query_id="baz",} 0.0
Indicates the amount of time each unique stage_id took for each micro-batch processed by this query.
- triggerExecution is the total amount of time spent processing this micro-batch
- queryPlanning is the total amount of time spent generating a new query plan for this micro-batch
- getOffset is the total amount of time source(s) spent determining whether they had new data available for processing
- getBatch is the total amount of time source(s) spent producing a DataFrame representing newly available data for processing
- walCommit is the total amount of time spent writing this micro-batch's durable state to the write-ahead log
- addBatch is the total amount of time the sink spent processing this micro-batch
TODO: explain triggerExecution vs addBatch
Label(s):
- job_id: the name of this job
- query_id: the name of this query
- stage_id: the name of this stage
Example(s):
pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="triggerExecution",} 5.212
pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="queryPlanning",} 0.112
pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="getOffset",} 0.10900000000000001
pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="getBatch",} 0.274
pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="walCommit",} 0.078
pstl_query_duration_seconds_total{job_id="foo",query_id="bar",stage_id="addBatch",} 4.616
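To see how the stages relate, the example values above can be expressed as shares of triggerExecution. This Python sketch assumes, as the stage descriptions suggest, that the other stages run inside triggerExecution and do not overlap; the relationship between triggerExecution and addBatch is still marked as a TODO above, so treat the unattributed remainder as an estimate. Because every stage is a ratio against the same denominator, the shares read the same whether the values are per micro-batch or accumulated.

```python
# Example values copied from pstl_query_duration_seconds_total above (job foo, query bar).
durations = {
    "triggerExecution": 5.212,
    "queryPlanning": 0.112,
    "getOffset": 0.10900000000000001,
    "getBatch": 0.274,
    "walCommit": 0.078,
    "addBatch": 4.616,
}


def stage_breakdown(durations):
    """Return each sub-stage's share of triggerExecution and the unattributed remainder.

    Assumption: all other stages nest inside triggerExecution without overlapping.
    """
    total = durations["triggerExecution"]
    sub_stages = {s: secs for s, secs in durations.items() if s != "triggerExecution"}
    shares = {s: secs / total for s, secs in sub_stages.items()}
    # Time inside triggerExecution not covered by any listed stage.
    other = total - sum(sub_stages.values())
    return shares, other


shares, other = stage_breakdown(durations)
for stage, share in shares.items():
    print(f"{stage}: {share:.1%}")
print(f"unattributed: {other:.3f} s")
```

For these example values, addBatch accounts for roughly 89% of triggerExecution, and about 0.023 seconds is not attributed to any listed stage.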
Indicates the total number of rows processed by this query over time. This metric is not durable; that is, if the JVM hosting this streaming query dies, the total number of rows processed will reset to 0 in a new JVM.
Label(s):
- job_id: the job this query belongs to
- query_id: the name of this query
Example(s):
# query bar in job foo has processed a total of 500 rows over time
pstl_query_restarts_total{job_id="foo",query_id="bar",} 500.0
# query baz in job foo has processed a total of 0 rows over time
pstl_query_restarts_total{job_id="foo",query_id="baz",} 0.0
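Because the rows-processed total resets to 0 when the query moves to a new JVM, a plain difference between the first and last scrape can undercount or even go negative. The following Python sketch applies the usual counter-reset rule (the same idea behind Prometheus' increase() and rate() functions): a drop in value is treated as a reset, and the post-reset value is counted as new work. The scrape values are hypothetical. Rows processed between the last scrape before the JVM died and the reset itself cannot be recovered this way.

```python
def increase_with_resets(samples):
    """Total increase of a non-durable counter across successive scrapes.

    A drop in value is treated as a counter reset (e.g. the JVM hosting the
    query was replaced), so the post-reset value is counted from zero.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # Counter reset: everything since the reset counts as new increase.
            total += curr
    return total


# Hypothetical scrapes: 500 rows, then a JVM restart, then 120 more rows.
print(increase_with_resets([0.0, 200.0, 500.0, 40.0, 120.0]))  # 620.0
```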
Indicates the number of rows provided to this query to process in the most recent micro-batch. If a micro-batch is currently executing, this metric reports the number of rows provided to this query in the previous micro-batch.
TODO: explain pstl_query_input_rows_current vs pstl_query_processed_rows_current better
Example(s):
# query bar in job foo was most recently given 10 rows for processing
pstl_query_input_rows_current{job_id="foo",query_id="bar",} 10.0
Indicates the number of rows processed by this query in the most recent micro-batch. If a micro-batch is currently executing, this metric reports the previous micro-batch's number of processed rows. This metric is already an aggregated rate, so taking further derivatives of it will likely yield unexpected results.
Labels:
- job_id: the job this query belongs to
- query_id: the name of this query
Example(s):
# query bar in job foo last processed 512 rows
pstl_query_processed_rows_current{job_id="foo",query_id="bar",} 512.0
# query baz in job foo last processed 0 rows
pstl_query_processed_rows_current{job_id="foo",query_id="baz",} 0.0
Indicates the number of times a streaming query has been restarted. The value of pstl_query_restarts_total is non-durable; that is, if the JVM hosting this streaming query dies, the total number of query restarts will reset to 0 when the streaming query restarts in a new JVM.
Labels:
- job_id: the job this query belongs to
- query_id: the name of this query
Example(s):
# query bar in job foo has never been restarted within this jvm
pstl_query_restarts_total{job_id="foo",query_id="bar",} 0.0
# query baz in job foo has been restarted 8 times within this jvm
pstl_query_restarts_total{job_id="foo",query_id="baz",} 8.0
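Since pstl_query_restarts_total is non-durable, a scraper watching it over time can distinguish in-JVM query restarts (the value grows) from the hosting JVM being replaced (the value drops). This Python sketch labels each step between successive scrapes under that assumption; the event names and sample values are hypothetical, and a JVM replacement that happens to land on the same value as the previous scrape would go unnoticed.

```python
def classify_restarts(samples):
    """Label each step between successive scrapes of a non-durable restart counter.

    Assumption: an increase means in-JVM query restarts, a decrease means a new JVM.
    """
    events = []
    for prev, curr in zip(samples, samples[1:]):
        if curr > prev:
            # Counter grew: the query was restarted (curr - prev) times in this JVM.
            events.append(("query_restarted", int(curr - prev)))
        elif curr < prev:
            # Counter dropped: assume a new JVM; curr restarts happened since then.
            events.append(("jvm_replaced", int(curr)))
        else:
            events.append(("steady", 0))
    return events


print(classify_restarts([0.0, 0.0, 3.0, 8.0, 1.0]))
# [('steady', 0), ('query_restarted', 3), ('query_restarted', 5), ('jvm_replaced', 1)]
```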
Indicates the number of seconds remaining in the exponential backoff delay before a restart of this streaming query is attempted. Internally, this metric counts down roughly every 100 milliseconds to provide near real-time feedback to monitoring infrastructure.
Labels:
- job_id: the unique identifier of the job this query belongs to
- query_id: the unique identifier of this streaming query
Example(s):
# query bar in job foo will restart in 100 seconds
pstl_query_restart_seconds{job_id="foo",query_id="bar",} 100.0
...
# query bar in job foo will restart in 5 seconds
pstl_query_restart_seconds{job_id="foo",query_id="bar",} 5.0
...
# query bar in job foo is no longer queued for restart
pstl_query_restart_seconds{job_id="foo",query_id="bar",} 0.0
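The countdown behavior described above can be sketched as follows. The actual backoff base, multiplier, and cap are not documented here, so backoff_delay and its base_seconds and max_seconds parameters are hypothetical. The sketch only shows how an exponentially growing delay maps onto a gauge that is refreshed about every 100 milliseconds and ends at 0 once the query is no longer queued for restart.

```python
def backoff_delay(attempt, base_seconds=1.0, max_seconds=300.0):
    """Hypothetical exponential backoff: base_seconds * 2**attempt, capped at max_seconds."""
    return min(base_seconds * (2 ** attempt), max_seconds)


def countdown(delay_seconds, tick_seconds=0.1):
    """Values a restart-countdown gauge would report at each ~100 ms tick, ending at 0."""
    ticks = round(delay_seconds / tick_seconds)
    # Derive each value from an integer tick count to avoid accumulating float error.
    return [round((ticks - i) * tick_seconds, 3) for i in range(ticks + 1)]


print(backoff_delay(3))  # 8.0
print(countdown(0.3))    # [0.3, 0.2, 0.1, 0.0]
```

Computing each value from the remaining tick count, rather than repeatedly subtracting 0.1, keeps the reported values clean over long delays.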
Indicates whether the query identified by query_id within the job identified by job_id is up or down. Reports 1 if the query is online, or 0 if the query is offline. Mission-critical dashboards and alerts should also check for the existence of a particular query_id within the pstl_query_up time series where needed. If you are a Prometheus user, the absent() function may be of interest.
Labels:
- job_id: the unique identifier of the job this query belongs to
- query_id: the unique identifier of this streaming query
Example(s):
# query bar in job foo is online
pstl_query_up{job_id="foo",query_id="bar",} 1.0
# query baz in job foo is offline
pstl_query_up{job_id="foo",query_id="baz",} 0.0
Label(s):
- job_id
- sink_id
- vertica_url
- vertica_table
Label(s):
- job_id
- sink_id
- vertica_url
- vertica_table
- stage_id
Label(s):
- job_id
- sink_id
- vertica_url
- vertica_table
Label(s):
- job_id
- sink_id
- vertica_url
- vertica_table