
[fix] Fix PartitionedProducerImpl::closeAsync to close sub-producers properly #125

Merged: 10 commits into apache:main, Nov 28, 2022

Conversation

erobot (Contributor) commented Nov 25, 2022

Motivation

PartitionedProducerImpl does not close its sub-producers properly when any sub-producer creation fails. Continuing to retry creating the producer will eventually reach the maximum producer limit. This appears to be a regression caused by #54.

When a sub-producer creation fails, state_ is set to Failed, but PartitionedProducerImpl::closeAsync only performs cleanup when state_ == Ready, so the sub-producers are not closed when state_ == Failed (see https://github.com/apache/pulsar-client-cpp/blob/f0268ecd29a6d0030b7d07379ec609884b4c14ff/lib/PartitionedProducerImpl.cc#L273-L276):

State expectedState = Ready;
if (!state_.compare_exchange_strong(expectedState, Closing)) {
return;
}

Modifications

Close sub-producers when state != Closed.
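
For illustration, a minimal sketch of the relaxed guard in closeAsync, based on the snippet quoted in the Motivation (this is only a sketch that ignores the atomic fine points; the exact diff in this PR may differ):

// Sketch only: bail out of closeAsync only when the producer is already
// Closed, so the cleanup below also runs when state_ == Failed.
State expected = state_.load();
if (expected == Closed) {
    return;  // nothing left to clean up
}
state_ = Closing;
// ... then call closeAsync() on every sub-producer, regardless of
// whether the previous state was Ready or Failed ...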

Verifying this change

test code:

#include <pulsar/Client.h>
#include <chrono>
#include <iostream>
#include <thread>

using namespace pulsar;

int main() {
    Client client("pulsar://localhost:6650");
    Producer producer;
    // Retry until the producer is created successfully.
    while (true) {
        Result result = client.createProducer("persistent://public/default/test", producer);
        if (result != ResultOk) {
            std::cerr << "Error creating producer: " << result << std::endl;
            std::this_thread::sleep_for(std::chrono::seconds(1));
            continue;
        }
        break;
    }
    return 0;
}

test cmd:

pulsar-admin namespaces set-max-producers-per-topic -p 10 public/default
pulsar-admin topics create-partitioned-topic -n 2 public/default/test
pulsar-perf produce -r 1 -n 10 public/default/test-partition-0

test procedure:

  1. Set the namespace max producer limit to 10.
  2. Create partitioned topic public/default/test with 2 partitions.
  3. Use pulsar-perf to create 10 producers on test-partition-0.
  4. Run the test code to create a producer for public/default/test; it will fail because test-partition-0 already has 10 producers.
    • Without the fix, test-partition-1 will also end up with 10 producers because the leaked sub-producers are never closed.
    • With the fix, test-partition-1 will have no more than 1 producer. (A sketch of a possible regression test follows below.)
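
For reference, a minimal sketch of what such a regression test could look like. This is not the test actually added in this PR; the test name is hypothetical, and it assumes a broker prepared by steps 1-3 above, reusing the topic names from those steps:

#include <pulsar/Client.h>
#include <gtest/gtest.h>

// Hypothetical regression test; assumes steps 1-3 above have been run, so
// test-partition-0 is at the 10-producer limit and test-partition-1 is empty.
TEST(ProducerTest, testNoSubProducerLeakOnPartialFailure) {
    pulsar::Client client("pulsar://localhost:6650");
    pulsar::Producer producer;

    // Each attempt must fail because test-partition-0 has no quota left.
    // Without the fix, every failed attempt leaks one sub-producer on
    // test-partition-1.
    for (int i = 0; i < 20; ++i) {
        ASSERT_NE(pulsar::ResultOk,
                  client.createProducer("persistent://public/default/test", producer));
    }

    // With the fix, test-partition-1 still has quota, so creating a producer
    // directly on it succeeds; without the fix, the leaked sub-producers
    // above would have exhausted its quota as well.
    ASSERT_EQ(pulsar::ResultOk,
              client.createProducer("persistent://public/default/test-partition-1", producer));

    client.close();
}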

Documentation

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    bug-fix only

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

shibd (Member) commented Nov 25, 2022

@erobot Hi, can you add a unit test based on the steps in the "Verifying this change" section?

BewareMyPower (Contributor) left a comment

Could you also add a unit test to avoid the regression?

erobot (Contributor, Author) commented Nov 25, 2022

OK, I will fix the code style and add a unit test later.

BewareMyPower (Contributor) commented

without the fix, test-partition-1 will have 10 producers because the sub-producers are not closed
with the fix, test-partition-1 will have no more than 1 producer

The difference is not clear from the output.

with-fix.log

without-fix.log

Maybe a unit test would be better to tell the difference.

BewareMyPower added this to the 3.1.0 milestone Nov 26, 2022
Review comment on lib/PartitionedProducerImpl.cc (outdated, resolved)
BewareMyPower (Contributor) commented Nov 28, 2022

Just dug deeper into this issue. I think there is no problem with the current master.

If one internal producer fails to be created, the partitioned producer is not added into the ClientImpl or ClientConnection; see https://github.com/apache/pulsar-client-cpp/blob/96d870527d2a078645c4866e9d75b3f3b97193ba/lib/ClientImpl.cc#L202-204

Then the PartitionedProducerImpl will be destroyed, and shutdown() will be called in the destructor for the cleanup work (cancelling the timer). Regarding the ProducerImpl objects, since they also failed to be created, there is no need to call closeAsync on them.

  1. These producers are never reachable, so the destructor can handle the cleanup work well.
  2. Since their creation also failed, the broker side never registered these producers, so closing them makes no difference.
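
To illustrate the lifecycle argument above, a rough sketch of the fully-failed path (the name handleProducerCreated, its signature, and the producers_ member are assumptions loosely based on the linked ClientImpl code, not verified):

// Rough sketch of the ClientImpl callback that runs when producer
// creation completes (names and signature are assumptions):
void ClientImpl::handleProducerCreated(Result result, ProducerImplBasePtr producer,
                                       CreateProducerCallback callback) {
    if (result == ResultOk) {
        producers_.push_back(producer);  // only successful producers are tracked
        callback(result, Producer(producer));
    } else {
        // The failed producer is never stored; once the last shared_ptr is
        // dropped, ~PartitionedProducerImpl() runs and shutdown() cancels
        // the partition-update timer.
        callback(result, Producer());
    }
}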

erobot (Contributor, Author) commented Nov 28, 2022

@BewareMyPower I added a test to reproduce the problem.

BewareMyPower (Contributor) commented

I got it. The issue occurs when some sub-producers are created successfully while others fail: the successful ones are registered on the broker but never closed, because closeAsync returns early once state_ is Failed.

BewareMyPower (Contributor) left a comment

typo

Review comments on tests/ProducerTest.cc (3 threads, outdated, resolved)
erobot requested review from BewareMyPower and RobertIndie and then removed the requests, Nov 28, 2022 06:12
erobot requested a review from RobertIndie, Nov 28, 2022 07:11
RobertIndie merged commit 85b1b53 into apache:main on Nov 28, 2022
RobertIndie pushed a commit to RobertIndie/pulsar-client-cpp that referenced this pull request Nov 28, 2022
…properly (apache#125)