Support for HTTP proxies #26
Comments
To answer my question, the way to support an HTTP proxy is:
$http_context = stream_context_create([
    'http' => [
        'method' => 'GET',
        'user_agent' => 'MyAgent/1.0 +url',
        'proxy' => 'my-proxy.example.com:80',
        'header' => [],
    ]
]);
$htmldoc = hQuery::fromFile( $scrape_url, false, $http_context );
See: GitHub: nfreear/school-closure ..
Do you want to add an
Thanks, Nick |
Hi @nfreear, thanks for the info! I should probably add more info to the README to make things clear for new users. |
Hello! |
Try this:
$auth = base64_encode('LOGIN:PASSWORD');
$http_context = stream_context_create([
    'http' => [
        'method' => 'GET',
        'user_agent' => 'MyAgent/1.0 +url',
        'proxy' => 'my-proxy.example.com:80',
        'request_fulluri' => true,
        'header' => "Proxy-Authorization: Basic $auth",
    ]
]);
$htmldoc = hQuery::fromFile( $scrape_url, false, $http_context ); |
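The two stream-context snippets above differ only in the authentication header, so they can be folded into one small helper. The following is a minimal sketch, not part of hQuery: the function name `make_proxy_context()` and its parameters are hypothetical, and the proxy address is written with the `tcp://` prefix used in the PHP manual's examples.

```php
<?php
// Hypothetical helper (not part of hQuery): builds a stream context for
// HTTP requests sent through a proxy, with optional Basic authentication.
function make_proxy_context(string $proxy, ?string $login = null, ?string $password = null)
{
    $http = [
        'method'          => 'GET',
        'user_agent'      => 'MyAgent/1.0 +url',
        'proxy'           => $proxy,  // e.g. 'tcp://my-proxy.example.com:80'
        'request_fulluri' => true,    // send the absolute URL in the request line
        'header'          => [],
    ];

    if ($login !== null) {
        // Same header as in the comment above, added only when credentials are given.
        $auth = base64_encode($login . ':' . (string) $password);
        $http['header'][] = "Proxy-Authorization: Basic $auth";
    }

    return stream_context_create(['http' => $http]);
}

$http_context = make_proxy_context('tcp://my-proxy.example.com:80', 'LOGIN', 'PASSWORD');

// The context is then passed on exactly as in the comment above:
// $htmldoc = hQuery::fromFile($scrape_url, false, $http_context);
```

Calling the helper without a login produces the unauthenticated context from the first comment, with `request_fulluri` switched on in both cases.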
How can I use a proxy with fromURL? thanks |
Hi Jochen, I think at the moment you can't use a proxy with What is your use-case? Ta, Nick |
Hey Nick, If I use
Jochen |
The main focus of this library is parsing of big HTML documents, not fetching them. After you've managed to set up the HTTP request with any PSR-7 compliant library, you can either feed the
Here is a theoretical example (I did not test it!):
composer require php-http/guzzle6-adapter php-http/message php-http/discovery
use duzun\hQuery;
use Http\Discovery\MessageFactoryDiscovery;
use Http\Adapter\Guzzle6\Client as GuzzleAdapter;
$config = [
    'timeout' => 7,
    'proxy' => [
        'http' => 'tcp://localhost:8125', // Use this proxy with "http"
        'https' => 'tcp://localhost:9124', // Use this proxy with "https"
        'no' => ['.mit.edu', 'foo.com'] // Don't use a proxy with these
    ]
];
$client = GuzzleAdapter::createWithConfig($config);
$messageFactory = MessageFactoryDiscovery::find();
$request = $messageFactory->createRequest(
    'GET',
    'http://example.com/someDoc.html',
    ['Accept' => 'text/html,application/xhtml+xml;q=0.9,*/*;q=0.8']
);
$response = $client->sendRequest($request);
$doc = hQuery::fromHTML($response, $request->getUri());
Though, what you are trying to do sounds a little too aggressive. Bots should be nice and not overwhelm servers with too many requests. Try to be nice and don't forget to add |
Hey duzun, But that line: Can you help me out again? thanks |
Hey @gmmedia, The error you are getting is weird, because I've added an example of using guzzle6-adapter with You could either figure out what the issue is, or just switch to the second option: extract the body from the response object as a string and use that with It would be nice to know the details of your issue, though. |
Hello @duzun! That error came because I did not include hQuery with that line: It is working now, but the new code is not following a 302 redirect. I also miss the caching function now. How can I implement it again? thanks |
If you don't like this approach (HTTPlug), just use any other library for fetching and caching HTML documents. Maybe Guzzle directly.
|
Do you have an idea why the redirects are not working? |
Yes, I've mentioned it above:
You have to use RedirectPlugin with guzzle6-adapter. Here is an alternative example, using just the Guzzle client (no HTTPlug): guzzle example. |
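The linked Guzzle example is not reproduced in this thread, so what follows is only a rough sketch of the "Guzzle directly" approach, not the maintainer's code. It assumes `guzzlehttp/guzzle` and hQuery are installed through Composer and that `vendor/autoload.php` has already been loaded. The helper names `proxy_client_options()` and `fetch_document()` are hypothetical. `timeout`, `proxy`, `allow_redirects` and `headers` are standard Guzzle request options, and the body is passed to `hQuery::fromHTML()` as a string together with the URL, in the same way the example earlier in the thread calls it.

```php
<?php
use duzun\hQuery;
use GuzzleHttp\Client;

// Hypothetical helper: Guzzle options for fetching pages through a proxy.
function proxy_client_options(string $httpProxy, string $httpsProxy, array $noProxy = []): array
{
    return [
        'timeout'         => 7,
        'allow_redirects' => ['max' => 5],  // follow 301/302 responses, at most 5 hops
        'proxy'           => [
            'http'  => $httpProxy,          // proxy for "http" URLs
            'https' => $httpsProxy,         // proxy for "https" URLs
            'no'    => $noProxy,            // hosts that bypass the proxy
        ],
        'headers' => ['Accept' => 'text/html,application/xhtml+xml;q=0.9,*/*;q=0.8'],
    ];
}

// Hypothetical helper: fetch $url with Guzzle and hand the body to hQuery.
// Needs the Composer autoloader; nothing is requested until it is called.
function fetch_document(string $url, array $options)
{
    $client   = new Client($options);
    $response = $client->request('GET', $url);

    // hQuery gets the HTML as a plain string plus the URL it was requested from.
    return hQuery::fromHTML((string) $response->getBody(), $url);
}

$options = proxy_client_options('tcp://localhost:8125', 'tcp://localhost:9124', ['.mit.edu', 'foo.com']);

// $doc = fetch_document('http://example.com/someDoc.html', $options);
```

Guzzle follows redirects by default, so `allow_redirects` is spelled out here only to make the behaviour visible. The URL handed to `fromHTML()` is the one originally requested, not the final one after any redirect. Caching is not covered by this sketch.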
I think the Guzzle client is the easier version for me. Thank you, I will try that now. Why don't you load hQuery from your vendor directory with autoload.php? |
Also, |
I am asking because every other class works fine for me with Composer and autoload. |
You are right, I just forgot to add the Try to Thank you for catching this issue! :-) |
Hi @duzun,
Great work on the library. Are you open to adding support for HTTP proxies to it?
I've written a test PHP Gist which demos what's involved (currently without proxy authentication).
Alternatively, is there a way to use hQuery::fromFile() with stream_context_create()? I'll investigate, and update this ticket ;).
Thanks,
Nick