Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error 28 depending on time elapsed between multi_curl_exec() execution #4632

Closed
pascalsiegel opened this issue Nov 23, 2019 · 10 comments
Closed

Comments

@pascalsiegel
Copy link

Hello,

I am not sure this is a bug or just how it is supposed to work. I was expecting to be able to use multi curl and add a handle, execute it, and then wait 10 seconds, and then read the result.

I seems that "reading the response" needs to be done frequently (<0.1 seconds seems a minimum to be safe).

The issue I observe is that if the curl requested has fully executed before the second read or the connect timeout the response is "lost" and there is a error code 28.

I have tried different sleep times before the loop and in the loop and came to some kind of conclusions.

Is this a bug or simply how it works and the curl_multi_exec() needs to be executed very often to be sure it is executed before the curl has returned or in a minimum time after curl returned?

Please see code with my observation and best guesses below,
PS: apologies if this is not the right format/ way to post an issue (it's my first)

//--------------------
// Test Code Stats here
// some related reports:
// #619
// aws/aws-sdk-php#924

//PHP version: PHP 7.3.9-1+ubuntu16.04.1+deb.sury.org+1 (cli) (built: Sep 2 2019 12:54:04) ( NTS )
// cUrl version (php -r "print_r(curl_version());")
// [version_number] => 470784
// [age] => 3
// [features] => 968605
// [ssl_version_number] => 0
// [version] => 7.47.0
// [host] => x86_64-pc-linux-gnu
// [ssl_version] => OpenSSL/1.0.2g
// [libz_version] => 1.2.8

$execs = [];
$reads = [];
$getInfo = [];

$mch = curl_multi_init();
$csh = curl_init();

// setup curl and add the request
$curlOpts = [
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_TIMEOUT => 15,
CURLOPT_CONNECTTIMEOUT => 5,
CURLOPT_VERBOSE => true,
CURLOPT_URL => 'https://www.php.net', //'www.tf1.fr', //
];

curl_setopt_array($csh, $curlOpts);
$addChRetCode = curl_multi_add_handle($mch, $csh);

// execute curl
$time = number_format(microtime(true), 5, '.', '');
$status = curl_multi_exec($mch, $active);
$execs[$time] = ' Status: '.$status.' Active: '.$active;

//-------------------------------------------------------------------
// Observation:
// if no curl_multi_exec() is executed for a long time the timeout erro number 28 is triggered
// - something is timing out and it is like the reponse is lost?
// - this seems to happen when sleeping longuer than the total time it takes to fully execute the request
// - meaning something times out if the curl returned before the end of the sleep, and the response is lost
//
// There are different error messages for error 28. It seems to depend on the time elapse between the 1st, 2nd and 3rd curl_multi_exec()
// A) Error Num: 28 --> Message: Connection timed out after 6005 milliseconds
// B) Error Num: 28 --> Message: Operation timed out after 0 milliseconds with 0 out of 0 bytes received
// C) Error Num: 28 --> Message: Resolving timed out after 5001 milliseconds
// ==> this seems to be depending on what time the 2nd and 3rd curl_multi_exec() is execute in relation to the CURLOPT_CONNECTTIMEOUT
// -> message B): 1st exec ... 10 secs 2nd exec ->
// -> message A) or C): 1st exec ... (less than CURLOPT_CONNECTTIMEOUT) ... 2nd exec (time<5secs) .... (time>5secs) -> 3rd exec
//
// Conclusion/ best guess:
// - if a curl request returns before the second curl_multi_exec() the response is lost and the error 28 is triggered
// - curl_multi_exec() must be executed very often (<x millisecons?) to ensure curl multi is working?
// - there is some timout/ minimum or maximum time the curl returned is kept, after an error 28 is triggered
print 'sleep some seconds at (error 28 if longuer than total_time):'. microtime(true).PHP_EOL;
sleep(3);

//------------------------------------------
do {
$status = curl_multi_exec($mch, $active);
$time = number_format(microtime(true), 5, '.', '');
$execs[$time] = ' Status: '.$status.' Active: '.$active;

//-------------
// commenting the does not make any difference
// this comment/ issue does not seem to apply since the behavior seems the same when commenting 
// https://www.php.net/manual/en/function.curl-multi-select.php#123869
//if ($active) {
//    curl_multi_select($mch, 0.1);
//}

// read the returned curl
$info = curl_multi_info_read($mch);
if (false !== $info) {
    $reads[$time] = $info;
    $getInfo = curl_getinfo($csh);
    $errorMsg = 'Error Num: '. curl_errno($csh).' --> Message: '.curl_error($csh).PHP_EOL;
    break;
}
else {
    $reads[$time] = 'Info = false';
}
//------------------------------------------------------------------------
// when this is too long (1 or 2 seconds, or more than total time(?)) the timeout error 28 is also triggered
// Seems to be a similar behavior as when sleeping N seconds before entering the loop (seconds curl_multi_exec)
// This seems a bit random but it starts failing when the request finished before the end of the usleep.
// sleep 0.5 seconds  : => OK
// sleep 1 second  : Error Num: 28 --> Message: Operation timed out after 11434 milliseconds with 92383 out of 446394 bytes received
// sleep 1.5 second: Error Num: 28 --> Message: Operation timed out after 0 milliseconds with 0 out of 0 bytes received
// sleep 1 second  : Error Num: 28 --> Message: Operation timed out after 0 milliseconds with 0 out of 0 bytes received
usleep(1500000);

} while ($active && $status == CURLM_OK);

//------------------------------------------
// result
print_r($execs);
print_r($reads);
print PHP_EOL.'Get Info in loop and after loop: '.PHP_EOL.$errorMsg.PHP_EOL;
print_r($getInfo);
print_r(curl_getinfo($csh));

//close the handles
curl_multi_remove_handle($mch, $csh);
curl_multi_close($mch);

// End of test code
//------------------

Best,
Pascal

@bagder
Copy link
Member

bagder commented Nov 23, 2019

First, you're using the CURL/PHP binding and not the actual curl or libcurl directly so you should rather file bugs on this module to the PHP team. This is not the home of the CURL/PHP binding.

But... curl_multi_exec() (which is the PHP version that more or less directly maps to the underlying curl_multi_perform() function) performs only as much "network operations" as it can in one go without getting blocked. It should be called again when the curl timeout triggers or when there's network activity on one of the sockets that curl uses.

So if you call it first, then sleep for a while, there will most likely be network activity happening that isn't acted upon because your code sleeps instead of asking curl to do that. If you then at the same time reduces the maximum allowed time you of course risk that you sleep all into the end time you've asked curl to give up on.

You're asking for this. This is not a curl bug. Not even a CURL/PHP bug.

@bagder bagder closed this as completed Nov 23, 2019
@pascalsiegel
Copy link
Author

pascalsiegel commented Nov 25, 2019

Thanks for getting back to me despite it being more a PHP related question.

I did some studying and read your documentation and the PHP code that is mapping the multi curl functions (they are indeed pretty close). Also it seems clearly not a bug but simply how it is working.

I summarized my understanding on how it is working below. Could you please let me know if it is correct?

Could you also clarify what is meant by a #pending internal timeout# in the documentation concerning multi curl, I am so far assuming that timeout means the CURLOPT_xx settings, but wonder if they are others I should be aware of?
On page: https://curl.haxx.se/libcurl/c/curl_multi_wait.html, in “if the multi handle has a #pending internal timeout#
On page: https://curl.haxx.se/libcurl/c/curl_multi_perform.html , in if the multi handle has a #pending internal timeout#

So, my understanding is:
The problem I encounter is the time (TOW) spent outside the curl_multi_wait() while curl transfers are still running (curl_multi_wait() = curl_multi_select() in PHP for version, libcurl >= 7.28.0).

Assuming the TOW time is 50ms and a curl transfer finishes or times out during these 50ms then I get the error code 28 (timeout being CURLOPT_TIMEOUT, CURLOPT_CONNECTTIMEOUT, others?).
The error 28 is because the script was not in the curl_multi_wait() function (blocking mode). Meaning not waiting for a timeout or a curl easy to finish, whatever happens first.
Any time not spent in the curl_multi_wait() is a risk of error 28 even if the transfer ended correctly with a code 0.
This means that while at least one curl easy is running, doing anything from a curl_multi_info_read(), curl_multi_getcontent(), etc. can potentially generate an error 28 if a curl transfer (TX) ends or times out during the 50ms TOW time (Note: even doing an curl_multi_add_handle() and curl_multi_exec() is a risk, and to an extreme the few micro seconds spend in the loop after curl_multi_wait() on the example on page https://curl.haxx.se/libcurl/c/curl_multi_wait.html is a risk of error 28 and losing the transferred data.

The final consequence of any time spent outside of curl_multi_wait() is to not get the result of the transfer TX even if it succeeded.

Best,
Pascal

@bagder
Copy link
Member

bagder commented Nov 25, 2019

First, "internal timeouts" refer to a timeout handled internally. They are plenty and the timeout values you can set with the API will be included among them but will not be the only ones possible. They're internal and thus not externally visible. The application will just know when it should call the API again so that libcurl can continue its business as it wants.

Assuming the TOW time is 50ms and a curl transfer finishes or times out during these 50ms then I get the error code 28

Generally speaking, this is not true at all. In most times during a transfer's life time you can wait a long time before you call libcurl again and the only thing you'll do is get a slower transfer. In fact, if you can reproduce your hypothesis that not calling curl_multi_wait() within 50ms causes a problem from a plain libcurl program against a public URL then I'd be most interested in that recipe (and I'm afraid it would need to not use PHP for me to be able to do anything fun with it). I can't even think of a plausible situation how that can happen.

@pascalsiegel
Copy link
Author

pascalsiegel commented Nov 28, 2019

Hello again,

Sorry, this is going a bit long and not absolutely clear. I put my understanding and conclusion at the end, your insight would be appreciated.
I was a bit surprised about your response concerning the 50ms since I had observed it a number of times, but it was random. “Generally speaking, this is not true at all. In most times during a transfer's life time you can wait a long time before you call libcurl again and the only thing you'll do is get a slower transfer.”
So, I have created a test environment in order to be able to replicate what I have observed, but with less randomness.
This is an apache server that responds with a very short string in the body and is responding normally within a few millisecond (2 to 5ms when not sending to many requests). My PHP script and the apache server are on the same host (an Ubuntu server).
The goal was to guaranty a consistent request time and have it fast (<5ms) so that the time spent after a curl_multi_wait() becomes relevant, even if very short. This can feel like creating an edge situation but is not since if you send 20 requests to the same server, there is a chance they all come back within a few milliseconds, or within a 100ms.
At time 0 (T0):
The script is creating, adding and executing 10 requests, 1 every 10 milliseconds, the apache server script sleeps 1 second before replying. The request have a 1 second and 20ms, this is in reality 20ms since the server sleeps 1 second).
So each of should return in 1 second (server sleep duration) at an interval of 10ms. Meaning a request will return about every 10ms plus/minus the time for apache/ network, etc. which is varying a bit but mostly below 10ms.
At T1, a bit more than 100ms later the main loop starts (see code below), and loops until all requests are done. Since the request “are sleeping” on the server, they will come back at the earlies 1 second after the first request was added plus a few milliseconds, so not much happens during the first loops. This allows to get the server in a “quiet mode”.
I use the times from the apache log to get the response time on the server side (since when request times out I don’t really know if it really timed out or not, the “total_time” shown is set at some other time by I suppose curl_multi_perform()).

The script is longer than the below but it the rest is mostly about building the tables you can see in the attached files. In the files I put comments for when a request fails due do the time out, this is my best guess/ conclusion.
Have a look at the attached files, a bit difficult to read due to too much information but I put comments about when and why a request trigger error 28, and should not.
So, my observation/ conclusions.

  1. The wait (curl_multi_wait()) does no really matter, it is mostly a “sleep time” used to not overload the CPU by looping to fast. It can be removed and replaced by a very short sleep, <1ms would likely work most of the time. It safe to use but does not remove the risk of false error 28, the time after wait and before the next perform still needs to be very short.
  2. The time that really matter is the time spent between curl_multi_perform().
  3. Request are marked as timed out (error 28) when they return during the time in between 2 curl_multi_perform().
  4. The shortest false error 28 (in the attache files) is with 10ms between 2 performs, way less than the 50ms you mention.
  5. It is very possible that even just 1ms between 2 curl_multi_perform() triggers a timeout().It would be bad luck, but technically it is.
  6. The busier the server running the libcurl, the more likely false timeouts (error 28) will happen.
  7. I could be a PHP issue, but not likely since PHP does not do mush on concerning curl_multi_exec (wrapping of curl_multi_perform()). The reason can be because PHP is not multi-thread in the OS sense of it, and therefore all processing is sequential.
  8. Sleeping 1ms after a wait is likely to make things worse and generate more “false” timeouts .

Best,
Pascal

failed_wait_1.txt
failed_wait_1.txt
failed_wait_2.txt
failed_wait_1.txt
failed_wait_2.txt

-------------------- Main CODE: -------------------
// request creation

$nbCurls = 10;
$refTime = time();
$addStart = microtime(true);
for ($nbc=1; $nbc <=$nbCurls; $nbc++) {
    $csh = curl_init();
    $chIdx = 'c'.explode('#',(string)$csh)[1];
    $chAddTimes[$chIdx] = microtime(true);
    $expecteResp[$chIdx] = $chAddTimes[$chIdx] + 1.010;
    $timeOutAt[$chIdx] = $chAddTimes[$chIdx] + $curlOpts[CURLOPT_TIMEOUT_MS]/1000;
    
    $curlOpts[CURLOPT_URL] = 'http://192.168.1.212:8022/?'.http_build_query(['chIdx' => $chIdx, 'addTime' => $chAddTimes[$chIdx]]);
    curl_setopt_array($csh, $curlOpts);
    
     // add and execute curl
    curl_multi_add_handle($mch, $csh);
    curl_multi_exec($mch, $active);
    
    // wait before sending the next request
    usleep(10000);
}

// MAIN LOOP
do {
    $l++;
    $start = microtime(true);
    $lpIdx = (string)$start;
    $lp[$lpIdx]['lpNum'] = $l; 
    $lp[$lpIdx]['start_T'] = $start;
    //$lp[$lpIdx]['lpIdx'] = $lpIdx;
    
    //-----
    curl_multi_exec($mch, $active);
    $lp[$lpIdx]['active'] = $active;
    $lp[$lpIdx]['aExec_T'] = microtime(true);
    
    //-----
    curl_multi_select($mch, $waitTimeOut);
    $lp[$lpIdx]['aWait_T'] = microtime(true);
   
    //--------------------------------
    $nbReads = 0;
    if ($active >= 0) {
        do {
            //-----
            $info = curl_multi_info_read($mch);
            if ($info !== false) {
                $nbReads++;
                $chIdx = 'c'.explode('#',(string)$info['handle'])[1];
                $infoRead[$chIdx] = [
                    'lpIdx'     => $lpIdx,
                    'info'      => $info,
                    'readTime'  => microtime(true),
                    'lpNum'     => $l,
                    'nbReads'   => $nbReads,
                ];
            }
        } while ($info != false);
    }
    //--------------------------------
    $lp[$lpIdx]['nbReads'] = $nbReads;
    $lp[$lpIdx]['aRead_T'] = microtime(true);
    
    //... Do some stuff ...
    usleep($microSleep);
    $lp[$lpIdx]['end_T'] = microtime(true);
} while ($active>0);

@bagder
Copy link
Member

bagder commented Nov 28, 2019

You're not adding new information so I can only repeat what I've already said.

What timeout(s) are you telling libcurl to use?

And also, I wrote up a C example that has a 500 millisecond wait in an attempt to reproduce. It works just fine for me:

#include <stdio.h>
#include <string.h>

/* somewhat unix-specific */
#include <sys/time.h>
#include <unistd.h>

/* curl stuff */
#include <curl/curl.h>

/*
 * Simply download two HTTP files!
 */
int main(void)
{
  CURL *http_handle;
  CURL *http_handle2;
  CURLM *multi_handle;

  int still_running = 0; /* keep number of running handles */

  http_handle = curl_easy_init();
  http_handle2 = curl_easy_init();

  /* set options */
  curl_easy_setopt(http_handle, CURLOPT_URL, "http://localhost/512M");

  /* set options */
  curl_easy_setopt(http_handle2, CURLOPT_URL, "http://localhost/512M");

  /* init a multi stack */
  multi_handle = curl_multi_init();

  /* add the individual transfers */
  curl_multi_add_handle(multi_handle, http_handle);
  curl_multi_add_handle(multi_handle, http_handle2);

  /* we start some action by calling perform right away */
  curl_multi_perform(multi_handle, &still_running);

  do {
    CURLMcode mc; /* curl_multi_poll() return code */
    int numfds;

    /* we start some action by calling perform right away */
    mc = curl_multi_perform(multi_handle, &still_running);

    if(still_running)
      /* wait for activity, timeout or "nothing" */
      mc = curl_multi_wait(multi_handle, NULL, 0, 1000, &numfds);

    if(mc != CURLM_OK) {
      fprintf(stderr, "curl_multi_wait() failed, code %d.\n", mc);
      break;
    }

    usleep(500000); /* 500 millisecond useless wait */

  } while(still_running);

  curl_multi_cleanup(multi_handle);

  curl_easy_cleanup(http_handle);
  curl_easy_cleanup(http_handle2);

  return 0;
}

@pascalsiegel
Copy link
Author

Hello and thanks for taking the time,
My C is a few decades outdated but I think I understand your script and finally understand was is happening (the attached file has a new example).
For the false error 28 to happen the request needs to return during a sleep time AND the time of the timeout needs to be during that sleep time.
This is an edge case and setting long timeouts it is likely never going to trigger the error 28, except wih very bad luck.
Her I am guessing. The issue is not coming the curl_multi_wait(). But from the curl_multi_perform.
The perform is testing if the request timeout time has passed, if yes, it marks the request a timeout (makes sense! :) ), and the total_time is also set during the perfom.
There is one exception where the error 28 is wrong, which is when the request returns AND the time out CURLOPT_TIMEOUT are both happening during the same “sleep” time (the edge case and bad luck).
Is this correct?
Assuming the above is correct, this edge case can be easily avoided by setting a long CURLOPT_TIMEOUT, no problem with that. And that the total_time is a bit wrong because of it is set in the perform is fine as well even if it will be wrong even if there is no error 28.
My concern, and question: Are there other timeouts (the internal ones) that would also trigger such error or similar ones if the time spent in between 2 perform is too long.
Concerning my test and your script.
In order to test, the error code 28 needs to be triggered during the 500ms sleep time. I have done that be setting the CURLOPT_TIMEOUT_MS = 1020, which leads to a 20ms timeframe because the server sleep 1000ms.
I use the request CURLOPT_TIMEOUT_MS as a trick to trigger the error 28. I am not sure anymore but. In my first post the usleep(1500000), the timeouts are 5 or 15 seconds, but the response is back in less than the 1.5 second sleep, this was triggering error 28. I had also observer that from time to time I was getting the first 7500 bytes of the response. My conclusion was that any time spent after the wait (TOW) was an issue, and it seems to be if a response is arriving during that TOW time. Now it think it is not the time after the curl_multi_wait, but in between 2 curl_multi_perform.
The difficulty in testing was to get a request to return within the short 4ms between 2 curl_multi_perform and the timout to also be in these 4ms. The point was to show that any time spent “sleeping/ some processing” was a potential issue and could generate the error 28.
So, in your code, to trigger the false error 28 the request needs to timeout during the 500ms sleep. CURLOPT_TIMEOUT_MS= 200 should do it. ( with the assumption that the localhost response is < 500ms).
Both will fail with error 28 because of the their 200ms timeout.
Best,
Pascal

failed_data_2.txt

@bagder
Copy link
Member

bagder commented Nov 29, 2019

to trigger the false error 28 the request needs to timeout during the 500ms sleep. CURLOPT_TIMEOUT_MS= 200 should do it.

That's not a "false error". If you tell curl that it must be done with the request within 200 milliseconds and it detects that it has spent more than so (even if that happened because the application - you - sort of did it on purpose), it should and will return a timeout as soon as it detects that fact. Any other behavior would be a bug.

@pascalsiegel
Copy link
Author

Hello and thanks for taking the time,
My C is a few decades outdated but I think I understand your script and finally understand was is happening (the attached file has a new example).
For the false error 28 to happen the request needs to return during a sleep time AND the time of the timeout needs to be during that sleep time.
This is an edge case and setting long timeouts it is likely never going to trigger the error 28, except wih very bad luck.
Her I am guessing. The issue is not coming the curl_multi_wait(). But from the curl_multi_perform.
The perform is testing if the request timeout time has passed, if yes, it marks the request a timeout (makes sense! :) ), and the total_time is also set during the perfom.
There is one exception where the error 28 is wrong, which is when the request returns AND the time out CURLOPT_TIMEOUT are both happening during the same “sleep” time (the edge case and bad luck).
Is this correct?
Assuming the above is correct, this edge case can be easily avoided by setting a long CURLOPT_TIMEOUT, no problem with that. And that the total_time is a bit wrong because of it is set in the perform is fine as well even if it will be wrong even if there is no error 28.
My concern, and question: Are there other timeouts (the internal ones) that would also trigger such error or similar ones if the time spent in between 2 perform is too long.
Concerning my test and your script.
In order to test, the error code 28 needs to be triggered during the 500ms sleep time. I have done that be setting the CURLOPT_TIMEOUT_MS = 1020, which leads to a 20ms timeframe because the server sleep 1000ms.
I use the request CURLOPT_TIMEOUT_MS as a trick to trigger the error 28. I am not sure anymore but. In my first post the usleep(1500000), the timeouts are 5 or 15 seconds, but the response is back in less than the 1.5 second sleep, this was triggering error 28. I had also observer that from time to time I was getting the first 7500 bytes of the response. My conclusion was that any time spent after the wait (TOW) was an issue, and it seems to be if a response is arriving during that TOW time. Now it think it is not the time after the curl_multi_wait, but in between 2 curl_multi_perform.
The difficulty in testing was to get a request to return within the short 4ms between 2 curl_multi_perform and the timout to also be in these 4ms. The point was to show that any time spent “sleeping/ some processing” was a potential issue and could generate the error 28.
So, in your code, to trigger the false error 28 the request needs to timeout during the 500ms sleep. CURLOPT_TIMEOUT_MS= 200 should do it. ( with the assumption that the localhost response is < 500ms).
Both will fail with error 28 because of the their 200ms timeout.
Best,
Pascal

Uploading failed_data_2.txt…

@pascalsiegel
Copy link
Author

That's not a "false error". If you tell curl that it must be done with the request within 200 milliseconds and it detects that it has spent more than so (even if that happened because the application - you - sort of did it on purpose), it should and will return a timeout as soon as it detects that fact. Any other behavior would be a bug.

I completely agree with it is not a bug, except for the edge case. Creating the error on purpose is only a way to demonstrate it.
The reason is that in reality I know the request has returned before the 500ms, but curl is marking it as timed out.
I have tried to read the libcurl to check on the below, but could not find where the timeout was check/triggered, can you point to the code?
I am left guessing again and I imaging libcurl is checking
A) if the transfer has complete (not sure how this is done but guessing by checking the number of bytes received)
B) if the time out has been reach.
So,
if A) happens before B), there would not be an error 28 (transfer is completed),
if B) happens before A), then the error 28 is triggered.

I don’t have any issue with this A) happening before or after B) since it only concerns the edge case.
I am trying to understand and know if there are other timeouts (internal) that could trigger errors if the time between 2 perform is too long. Something like flushing the content from buffers for example?

Best,
Pascal

@bagder
Copy link
Member

bagder commented Nov 30, 2019

it is not a bug, except for the edge case

I disagree. I think it works as intended even for that case, which I don't think is an edge case but a normal case: the timeout time was reached in between curl-invokes.

The transfer was not complete from curl's point of view, which is why it returns an error. There's more work to do but the timeout limit is reached.

If there was nothing more to do for curl, then it would already had completed that transfer.

The transfer might be complete from the server's point of view, but that's certainly not the same thing.

Now I'm going to stop responding to this issue.

@lock lock bot locked as resolved and limited conversation to collaborators Feb 28, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

2 participants