blackbox: fix the issue of TLS handshake error in tls cluster #1443

Merged 5 commits on Jun 24, 2021

@9547 9547 commented Jun 22, 2021

What problem does this PR solve?

I deployed a TLS-enabled cluster and found that PD and TiKV emit many TLS-related error and warning logs:

PD

  [2021/06/21 09:48:42.009 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.1.101:33196] [server-name=] [error=EOF]
  [2021/06/21 09:49:12.009 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.1.101:34068] [server-name=] [error=EOF]
  [2021/06/21 09:49:42.009 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.1.101:34968] [server-name=] [error=EOF]
  [2021/06/21 09:50:12.009 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.1.101:35840] [server-name=] [error=EOF]
  [2021/06/21 09:50:42.009 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.1.101:36766] [server-name=] [error=EOF]
  [2021/06/21 09:51:12.009 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.1.101:37704] [server-name=] [error=EOF]

TiKV

  [2021/06/21 09:49:31.165 +00:00] [ERROR] [mod.rs:856] ["Status server error: TLS handshake error"]
  [2021/06/21 09:50:01.167 +00:00] [ERROR] [mod.rs:856] ["Status server error: TLS handshake error"]
  [2021/06/21 09:50:31.165 +00:00] [ERROR] [mod.rs:856] ["Status server error: TLS handshake error"]
  [2021/06/21 09:51:01.165 +00:00] [ERROR] [mod.rs:856] ["Status server error: TLS handshake error"]
  [2021/06/21 09:51:31.165 +00:00] [ERROR] [mod.rs:856] ["Status server error: TLS handshake error"]
  [2021/06/21 09:52:01.165 +00:00] [ERROR] [mod.rs:856] ["Status server error: TLS handshake error"]

These error messages puzzled me for a while. After some digging, I found that they are caused by blackbox_exporter and Prometheus:

  - job_name: "tidb_port_probe"
    scrape_interval: 30s
    metrics_path: /probe
    params:
      module: [tcp_connect]

  tcp_connect:
    prober: tcp
In Prometheus, we infer port liveness through the port probe jobs. By default, these jobs connect over plain TCP using the tcp_connect module, so when TLS is enabled on the cluster the probe connects without presenting a certificate, which produces the logs above. This PR therefore introduces a tls_connect module, which is used to probe TLS-enabled ports.
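As a rough illustration of the idea, a TLS-aware probe module in blackbox_exporter could look like the sketch below. It relies on the `tls` and `tls_config` options of the TCP prober; the certificate paths are hypothetical placeholders, and the template actually shipped in this PR may differ.

```yaml
# blackbox.yml (sketch, not the exact template from this PR)
modules:
  tcp_connect:
    prober: tcp            # plain TCP connect, as in the existing config
  tls_connect:
    prober: tcp
    tcp:
      tls: true            # perform a TLS handshake after the TCP connect
      tls_config:
        # Hypothetical paths; substitute the cluster's real certificate files.
        ca_file: /path/to/ca.crt
        cert_file: /path/to/client.crt
        key_file: /path/to/client.pem
```

Under this assumption, a probe job for a TLS-enabled cluster would pass `module: [tls_connect]` in its `params` instead of `tcp_connect`.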

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change
  • Has persistent data change

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release notes:

NONE

@ti-chi-bot added the size/L and size/XL labels and removed the size/L label on Jun 22, 2021
codecov-commenter commented Jun 22, 2021

Codecov Report

Merging #1443 (4e34065) into master (5d1942a) will increase coverage by 3.45%.
The diff coverage is 34.02%.

❗ Current head 4e34065 differs from pull request most recent head 58cb47d. Consider uploading reports for the commit 58cb47d to get more accurate results

@@            Coverage Diff             @@
##           master    #1443      +/-   ##
==========================================
+ Coverage   25.13%   28.58%   +3.45%     
==========================================
  Files         264      245      -19     
  Lines       20826    20533     -293     
==========================================
+ Hits         5235     5870     +635     
+ Misses      14788    13555    -1233     
- Partials      803     1108     +305     
Flag        Coverage Δ
dm          24.87% <34.02%> (?)
integrate   28.58% <34.02%> (+13.16%) ⬆️
playground  13.66% <0.00%> (?)
tiup        ?
unittest    ?

Flags with carried forward coverage won't be shown.

Impacted Files                           Coverage Δ
pkg/cluster/task/tls.go                  0.00% <0.00%> (ø)
pkg/cluster/manager/deploy.go            59.45% <14.28%> (+59.45%) ⬆️
pkg/cluster/task/builder.go              57.48% <20.00%> (+57.48%) ⬆️
pkg/cluster/manager/builder.go           60.58% <33.33%> (+60.58%) ⬆️
pkg/cluster/template/config/blackbox.go  57.89% <63.63%> (+57.89%) ⬆️
pkg/cluster/task/monitored_config.go     55.71% <85.71%> (+55.71%) ⬆️
pkg/repository/store/store.go            0.00% <0.00%> (-100.00%) ⬇️
components/dm/ansible/worker.go          0.00% <0.00%> (-100.00%) ⬇️
pkg/repository/utils/hash.go             0.00% <0.00%> (-81.82%) ⬇️
pkg/meta/err.go                          0.00% <0.00%> (-81.25%) ⬇️
... and 237 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5d1942a...58cb47d.

@ti-chi-bot

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • AstroProfundis

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 23, 2021
@9547 force-pushed the fix/blackbox-tls-connect branch from 47b8c7c to 58cb47d on June 23, 2021 15:28
@AstroProfundis

/merge

@ti-chi-bot

This pull request has been accepted and is ready to merge.

Commit hash: 58cb47d

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jun 24, 2021
@ti-chi-bot ti-chi-bot merged commit 48cd6ae into pingcap:master Jun 24, 2021