-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#1247 Fix race condition in HttpProtocol while having multiple proxy settings #1250
Conversation
e3b31e8
to
d5c445f
Compare
This is a substantial change that could have an impact on how things work in SC. It needs careful reviewing, annoyingly I don't have much time to do that at the moment but I hope I will be able to soon. Reviews from other committers would be welcome |
My gut reaction is that adding even more shared state to the classes is not a good way to go. The more I look at this the more I think there is a major design flaw. The HttpProtocol classes should not be reused for HTTP calls that have different configs. There should be separate HttpProtocol class instances created for each different config. In #1247, there are 2 proxies described, HTTP calls for those 2 proxies should not be made with the same HTTPProtocol instance. |
The solutions applied in okhttp and httpclient classes seem very different. It would greatly simply review to have 2 different PRs that can be reviewed independently. |
In HttpProtocol implementation, the client builder was singleton and may be accessed and modified by different threads at same time. The result is that a wrong proxy will be used or a wrong proxy auth will be configured. To fix it, create a local builder insteand of having a field-level builder. Fixes apache#1247
Sure, I'll separate the httpclient in a separated PR.
Actually, those newly added fields are not shared state. They are just the http options retrieved from the conf. The easiest and most readable way is to just have a local field conf, and we always build the http client from the conf. But that would introduce unnecessary map.get operations as the conf should keep unchanged when it was loaded. So I decided to have these option fields to be loaded during the What do you think?
I think it's might be a common case to round-robin a large proxy pool (may have thousands or more ips). So it makes sense to me to have a single HttpProtocol, and select a proxy from ProxyManager whenever it was called. If we'd like to have multiple HttpProtocol instances for each proxy, it means that we may round-robin the HttpProtocol in ProtocolFactory. And I'm not sure if it's easy to implement it. |
d5c445f
to
bf49afb
Compare
Pushed the update to remove the change of httpclient HttpProtocol. |
Actually, I am not using proxies for my setup, so I am unable to test it. Would it be possible to add unit tests (leveraging test containers) to spin up a proxy-based setup to demonstrate the correctness of this change? |
I like this approach. We already have the ability to have more than 1 instance of each protocol, ideally we would be able to configure them differently |
Closing due to a lack of activity. Feel free to reopen if necessary |
In HttpProtocol implementation, the client builder was singleton and may be accessed and modified by different threads at same time. The result is that a wrong proxy will be used or a wrong proxy auth will be configured.
To fix it, create a local builder insteand of having a field-level builder.
Fixes #1247
Thank you for contributing to Apache StormCrawler.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a issue associated with this PR? Is it referenced in the commit message?
Does your PR title start with
#XXXX
whereXXXX
is the issue number you are trying to resolve?Has your PR been rebased against the latest commit within the target branch (typically main)?
Is your initial contribution a single, squashed commit?
Is the code properly formatted with
mvn git-code-format:format-code -Dgcf.globPattern="**/*" -Dskip.format.code=false
?For code changes:
mvn clean verify
?Note:
Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.