This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

ansible: add & update metric panels for syncer and misc #803

Merged: 12 commits, Jul 21, 2020

Conversation

lance6716
Collaborator

@lance6716 lance6716 commented Jul 15, 2020

What problem does this PR solve?

Add Grafana panels for #772.

What is changed and how it works?

Modify the Grafana dashboard JSON.

  1. multi-selection on source and instance
    image

     However, histogram_quantile and rate don't seem to work when multiple sources are selected.

  2. add panels for worker opErrCounter
    image

  3. change some metrics for worker

  • replace the $instance filter with $source_id:
    dm_worker_task_state
  4. metrics for syncer
  • replace the $instance filter with $source_id:
    dm_syncer_remaining_time, dm_syncer_binlog_file, dm_syncer_binlog_transform_cost_count, dm_syncer_skip_binlog_duration_count, dm_syncer_read_binlog_duration_bucket, dm_syncer_binlog_transform_cost_bucket, dm_syncer_dispatch_binlog_duration_bucket, dm_syncer_binlog_event_size_bucket, dm_syncer_queue_size, dm_syncer_added_jobs_total, dm_syncer_finished_jobs_total, dm_syncer_add_job_duration_bucket, dm_syncer_conflict_detect_duration_bucket, dm_syncer_skip_binlog_duration_bucket, dm_syncer_unsynced_table_number, dm_syncer_shard_lock_resolving
  • keep $instance:
    • dm_syncer_replication_lag: reported by heartbeat, which has no source_id currently
    • dm_relay_binlog_file{instance="$instance", node="relay"} - ON(instance, job) dm_syncer_binlog_file{instance="$instance", task="$task", node="syncer"}: relay doesn't have source_id currently
    • dm_syncer_txn_duration_time_bucket, dm_syncer_stmt_duration_time_bucket: used by DBConn in many places (checkpoint, onlineddl, ...), so it seems more appropriate for them to represent an instance's state
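
For illustration only, here is a rough Python sketch of how the filter replacement listed above could be applied to a dashboard JSON. It is not the script used for this PR (the JSON may well have been edited by hand). The helper names `rewrite_expr`, `rewrite_dashboard` and `quantile_expr` are hypothetical, and the `=~` matcher is an assumption based on how Grafana expands multi-value variables for Prometheus. `quantile_expr` shows one common way to aggregate histogram buckets by `le` plus `source_id`; whether that resolves the multi-source issue noted in item 1 has not been checked here.

```python
import re

# Metrics the description says should keep the $instance filter.
# Assumption: matching by metric name anywhere in the expression is enough.
KEEP_INSTANCE = (
    "dm_syncer_replication_lag",
    "dm_relay_binlog_file",
    "dm_syncer_txn_duration_time_bucket",
    "dm_syncer_stmt_duration_time_bucket",
)

# Matches instance="$instance" as well as instance=~"$instance".
INSTANCE_FILTER = re.compile(r'\binstance\s*=~?\s*"\$instance"')


def rewrite_expr(expr: str) -> str:
    """Swap the $instance filter for a $source_id filter unless the
    expression uses a metric that must stay per-instance."""
    if any(name in expr for name in KEEP_INSTANCE):
        return expr
    # Assumption: a regex matcher is used so multi-selection works.
    return INSTANCE_FILTER.sub('source_id=~"$source_id"', expr)


def rewrite_dashboard(node) -> None:
    """Walk a parsed dashboard (dicts and lists) and rewrite every
    string-valued "expr" field in place."""
    if isinstance(node, dict):
        if isinstance(node.get("expr"), str):
            node["expr"] = rewrite_expr(node["expr"])
        for value in node.values():
            rewrite_dashboard(value)
    elif isinstance(node, list):
        for item in node:
            rewrite_dashboard(item)


def quantile_expr(metric: str, q: float = 0.9, window: str = "1m") -> str:
    """Build a histogram_quantile query that sums bucket rates by
    le and source_id, so each selected source keeps its own series."""
    selector = f'{metric}{{source_id=~"$source_id", task="$task"}}'
    return (
        f"histogram_quantile({q}, "
        f"sum(rate({selector}[{window}])) by (le, source_id))"
    )


if __name__ == "__main__":
    dashboard = {
        "panels": [
            {"targets": [{"expr": 'dm_syncer_queue_size{instance="$instance"}'}]},
            {"targets": [{"expr": 'dm_syncer_replication_lag{instance="$instance"}'}]},
        ]
    }
    rewrite_dashboard(dashboard)
    for panel in dashboard["panels"]:
        print(panel["targets"][0]["expr"])
    print(quantile_expr("dm_syncer_read_binlog_duration_bucket"))
```

The relay-minus-syncer expression is left untouched because it mentions dm_relay_binlog_file, which matches the "keep $instance" list above.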

Check List

Tests

  • Manual test (see panels in browser)

Code changes

Side effects

Related changes

@lance6716 lance6716 added priority/normal Minor change, requires approval from ≥1 primary reviewer status/PTAL This PR is ready for review. Add this label back after committing new changes type/feature New feature labels Jul 15, 2020
@lance6716 lance6716 added type/dm-ansible and removed type/feature New feature labels Jul 15, 2020
@codecov

codecov bot commented Jul 15, 2020

Codecov Report

Merging #803 into master will increase coverage by 0.3925%.
The diff coverage is 39.3162%.

@@               Coverage Diff                @@
##             master       #803        +/-   ##
================================================
+ Coverage   57.0981%   57.4907%   +0.3925%     
================================================
  Files           205        211         +6     
  Lines         21104      23489      +2385     
================================================
+ Hits          12050      13504      +1454     
- Misses         7890       8731       +841     
- Partials       1164       1254        +90     

@lance6716 lance6716 added status/WIP This PR is still work in progress and removed status/PTAL This PR is ready for review. Add this label back after committing new changes labels Jul 16, 2020
@lance6716 lance6716 changed the title ansible: add metric panels ansible: add & update metric panels for syncer Jul 17, 2020
@lance6716 lance6716 changed the title ansible: add & update metric panels for syncer ansible: add & update metric panels for syncer and misc Jul 17, 2020
@lance6716 lance6716 added status/PTAL This PR is ready for review. Add this label back after committing new changes and removed status/WIP This PR is still work in progress labels Jul 17, 2020
Collaborator

@GMHDBJD GMHDBJD left a comment

LGTM

@GMHDBJD GMHDBJD added status/LGT1 One reviewer already commented LGTM and removed status/PTAL This PR is ready for review. Add this label back after committing new changes labels Jul 20, 2020
@csuzhangxc
Member

add panels for worker opErrCounter

How about re-order panels for opErrCounter? like:

  1. before any operate error
  2. source bound error
  3. start error
  4. pause error
  5. resume error
  6. auto resume error
  7. update error
  8. stop error

@lance6716
Collaborator Author

lance6716 commented Jul 21, 2020

add panels for worker opErrCounter

How about re-order panels for opErrCounter? like:

  1. before any operate error
  2. source bound error
  3. start error
  4. pause error
  5. resume error
  6. auto resume error
  7. update error
  8. stop error

The visual result is updated in the description of this PR, #803 (comment). Some of the errors shown in the panels were generated by repeatedly calling resume-task and pause-task.

@lance6716 lance6716 added the status/WIP This PR is still work in progress label Jul 21, 2020
@lance6716 lance6716 removed the status/WIP This PR is still work in progress label Jul 21, 2020
Member

@csuzhangxc csuzhangxc left a comment

LGTM

@csuzhangxc csuzhangxc added status/LGT2 Two reviewers already commented LGTM, ready for merge and removed status/LGT1 One reviewer already commented LGTM labels Jul 21, 2020
@csuzhangxc csuzhangxc added this to the v2.0.0 RC milestone Jul 21, 2020
@lance6716
Collaborator Author

/run-all-tests

@lance6716 lance6716 merged commit b145bf5 into pingcap:master Jul 21, 2020
@lance6716 lance6716 deleted the add-panels-1 branch July 21, 2020 05:03