ldap: add timeout and retry-backoff for ldap #51927

Merged: 1 commit merged into master from the add-timeout-for-ldap branch on Mar 20, 2024

@YangKeao (Member) commented Mar 20, 2024

What problem does this PR solve?

Issue Number: close #51883

This PR is a smaller version of #51912. We'll eventually get #51912 merged, but we need a smaller PR that focuses on the timeout mechanism, which is much simpler than refactoring the locks.

If the LDAP connection is lost after the initial handshake, the LDAP goroutine and the pending function call hang forever.

What changed and how does it work?

This PR makes two changes:

  1. Add a timeout for LDAP dialing and requests, so the lock is held for at most several seconds.
  2. Add an interval between retries, to avoid exhausting the quota provided by the LDAP service.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Run docker run --network host -it yangkeao/ldap-sasl-example:d2b324 /bin/bash to get an environment with an LDAP server, then start a TiDB server on port 4000.

Set up the variables, create the user, and prepare the CA certificate:

set global authentication_ldap_sasl_server_host='127.0.0.1';
set global authentication_ldap_sasl_bind_root_dn='cn=admin,dc=example,dc=org';
set global authentication_ldap_sasl_bind_root_pwd='123456';
set global authentication_ldap_sasl_ca_path='/tmp/ca.crt';
set global authentication_ldap_simple_server_host='127.0.0.1';
set global authentication_ldap_simple_bind_root_dn='cn=admin,dc=example,dc=org';
set global authentication_ldap_simple_bind_root_pwd='123456';
set global authentication_ldap_simple_ca_path='/tmp/ca.crt';
create user yangkeao IDENTIFIED WITH authentication_ldap_simple as 'cn=yangkeao+uid=yangkeao,dc=example,dc=org';
Then copy the CA certificate to /tmp/ca.crt from a shell:

sudo cp /proc/$(pidof mysqld)/root/etc/ssl/certs/example.crt /tmp/ca.crt

Then you can log in to TiDB as the yangkeao user:

LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1 mysql -h 127.0.0.1 -u yangkeao -P 4000 -p123456

This works with authentication_ldap_simple_tls either enabled or disabled.

Use the following iptables command to drop all packets sent to the LDAP server (port 389):

sudo iptables -A INPUT -p tcp --dport 389 -j DROP

Logging in without TLS then times out after 10 seconds:

root@home:/# time LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1 mysql -h 127.0.0.1 -u yangkeao -P 4000 -p123456
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'yangkeao'@'127.0.0.1' (using password: YES)

real    0m10.023s
user    0m0.000s
sys     0m0.009s

Logging in with TLS times out after 20 seconds:

root@home:/# time LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1 mysql -h 127.0.0.1 -u yangkeao -P 4000 -p123456
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'yangkeao'@'127.0.0.1' (using password: YES)

real    0m20.023s
user    0m0.005s
sys     0m0.004s

This PR also fixes some minor issues, such as rebuilding the connection pool after the connection-related variables are reset, so that the pool never holds connections created with the old settings.
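A hypothetical sketch of the pool fix: changing a connection-related variable replaces the whole pool, so idle connections opened with the old settings can no longer be handed out. fakeConn, connPool, ldapAuth, SetServerHost and poolSize are illustrative names, not TiDB's types; the host check in Put is an extra safeguard assumed here, not something the PR describes.

```go
package main

import "sync"

// fakeConn is a placeholder for a pooled LDAP connection; it only records
// which server it was opened against.
type fakeConn struct{ host string }

// connPool is a hypothetical fixed-size pool bound to a single host.
type connPool struct {
	host string
	idle chan *fakeConn
}

func newConnPool(host string, size int) *connPool {
	return &connPool{host: host, idle: make(chan *fakeConn, size)}
}

// Get reuses an idle connection if one is available, otherwise "dials" a new one.
func (p *connPool) Get() *fakeConn {
	select {
	case c := <-p.idle:
		return c
	default:
		return &fakeConn{host: p.host}
	}
}

// Put returns a connection to the pool. nil (a discarded connection), a
// connection to a different host, or an overflow is simply dropped.
func (p *connPool) Put(c *fakeConn) {
	if c == nil || c.host != p.host {
		return
	}
	select {
	case p.idle <- c:
	default:
	}
}

const poolSize = 4 // hypothetical pool capacity

// ldapAuth is a hypothetical holder of the connection-related settings.
type ldapAuth struct {
	mu   sync.RWMutex
	host string
	pool *connPool
}

// SetServerHost updates a connection-related setting and rebuilds the pool,
// so idle connections created for the old host are never handed out again.
func (a *ldapAuth) SetServerHost(host string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.host = host
	a.pool = newConnPool(host, poolSize)
}

func (a *ldapAuth) getConn() *fakeConn {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return a.pool.Get()
}

func (a *ldapAuth) putConn(c *fakeConn) {
	a.mu.RLock()
	defer a.mu.RUnlock()
	a.pool.Put(c)
}
```

Without the rebuild in SetServerHost, the first getConn after switching hosts would return the idle connection to the old server.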

Release note

None

@YangKeao YangKeao requested review from bb7133 and CbcWestwolf March 20, 2024 05:00
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 20, 2024

codecov bot commented Mar 20, 2024

Codecov Report

Merging #51927 (a8007e4) into master (f4e366e) will decrease coverage by 16.1248%.
Report is 3 commits behind head on master.
The diff coverage is 17.8571%.

Additional details and impacted files
@@                Coverage Diff                @@
##             master     #51927         +/-   ##
=================================================
- Coverage   70.7589%   54.6341%   -16.1248%     
=================================================
  Files          1477       1583        +106     
  Lines        438569     605035     +166466     
=================================================
+ Hits         310327     330556      +20229     
- Misses       108827     251508     +142681     
- Partials      19415      22971       +3556     
Flag          Coverage Δ
integration   35.0538% <17.8571%> (?)

Flags with carried forward coverage won't be shown.

Components    Coverage Δ
dumpling      53.9957% <ø> (ø)
parser        ∅ <ø> (∅)
br            50.8429% <ø> (+4.9782%) ⬆️

@YangKeao YangKeao force-pushed the add-timeout-for-ldap branch from 52d0f54 to a8007e4 March 20, 2024 05:24
@bb7133 (Member) left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 20, 2024
// fail to bind to anonymous user, just release this connection and try to get a new one
impl.ldapConnectionPool.Put(nil)

retryCount++
if retryCount >= getConnectionMaxRetry {
	return nil, errors.Wrap(err, "fail to bind to anonymous user")
}
// Be careful that it's still holding the lock of the system variables, so it's not good to sleep here.
// TODO: refactor the `RWLock` to avoid the problem of holding the lock.
A member commented:

You can replace sync.RWMutex with syncutil.RWMutex, which can detect deadlocks in tests.

@YangKeao (Member Author) replied Mar 20, 2024

The current issue is not a deadlock: there is only one lock and it is not acquired recursively, so a deadlock is impossible here.

The problem is that a pending write lock (taken by rebuildSysVarCache) blocks all new read locks, which I didn't realize before 🤦 😢. That makes this issue much more serious.
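This behavior is documented for Go's sync.RWMutex: a blocked Lock call excludes new readers until the earlier read locks are released. A small sketch that reproduces it, with roles named after this issue as an assumption: one reader stuck on a dead LDAP server, a writer such as rebuildSysVarCache waiting in Lock, and the main goroutine, which can then no longer take a read lock. It uses TryRLock (Go 1.18+) so the demonstration itself cannot hang.

```go
package main

import (
	"sync"
	"time"
)

// pendingWriterBlocksReaders reports whether a new reader is refused while
// one reader holds the lock and a writer is waiting in Lock.
func pendingWriterBlocksReaders() bool {
	var mu sync.RWMutex
	mu.RLock() // reader stuck on a dead LDAP upstream

	writerDone := make(chan struct{})
	go func() {
		mu.Lock() // e.g. a cache rebuild; waits for the reader above
		mu.Unlock()
		close(writerDone)
	}()

	// Poll until the writer has registered as pending; from then on TryRLock
	// fails, which is exactly where a plain RLock would block.
	blocked := false
	for i := 0; i < 1000 && !blocked; i++ {
		if mu.TryRLock() {
			mu.RUnlock() // writer not pending yet; release and try again
			time.Sleep(time.Millisecond)
		} else {
			blocked = true
		}
	}

	mu.RUnlock() // the stuck reader finally finishes...
	<-writerDone // ...and only then can the writer proceed
	return blocked
}
```

This is why bounding each LDAP call with a timeout matters even though only readers talk to LDAP: one slow reader plus one pending writer stalls every other reader.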


ti-chi-bot bot commented Mar 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bb7133, CbcWestwolf

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 20, 2024

ti-chi-bot bot commented Mar 20, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-03-20 05:25:09.23909834 +0000 UTC m=+1440736.261344729: ☑️ agreed by bb7133.
  • 2024-03-20 06:18:00.393262012 +0000 UTC m=+1443907.415508400: ☑️ agreed by CbcWestwolf.

@ti-chi-bot ti-chi-bot bot merged commit d940619 into pingcap:master Mar 20, 2024
23 checks passed
@YangKeao (Member Author)

/cherry-pick release-7.1-20240320-v7.1.3

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Mar 20, 2024
@ti-chi-bot (Member)

@YangKeao: new pull request created to branch release-7.1-20240320-v7.1.3: #51941.

In response to this:

/cherry-pick release-7.1-20240320-v7.1.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@YangKeao YangKeao added the needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. label Mar 20, 2024
@ti-chi-bot (Member)

In response to a cherrypick label: new pull request created to branch release-7.1: #51943.

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Mar 20, 2024
ti-chi-bot bot pushed a commit that referenced this pull request Mar 20, 2024
@YangKeao YangKeao added the needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. label Apr 12, 2024
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Apr 12, 2024
@ti-chi-bot (Member)

In response to a cherrypick label: new pull request created to branch release-7.5: #52551.

Labels
approved lgtm needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A dead LDAP upstream can block the authentication and show global variables.
5 participants