OBOE (off by one error) when determining if all chunks have been read #55
Comments
* mhoglan-nil-key-request:
  * add nil path test
  * Added guard protections for requests / responses
  * Add guard to handle if nil path is requested
  * Add guard to handle if content-length from response is -1, which represents that the response is chunked transfer encoding or an EOF-close response
  * change naked URL to a link from "IAM role"
  * Correct usage of help command.
  * Correct version number.
  * Handle cases where etag is missing gracefully.
Regarding the `nextChunk` method: if the quit channel is closed due to an error, it will return that error. That is the designed way for this method to exit when a problem occurs.
I was thinking more of a notification that no more chunks are coming, so the `nextChunk` method doesn't select forever. It still needs to process whatever chunks are present, but if no chunks remain and the workers are done, it should quit. With the guards it should be less likely to get into that state, so the benefit is lower now. Maybe it is as simple as adding a channel that the `worker()` function closes when it is done iterating over its chunks. I was also looking at whether I could wrap the reader and time out at the application level (if my read takes too long, or if I notice after some number of checks that the read has not progressed), then call `reader.Close()`, which would trip the quit channel and let everything exit; see the sketch just below.
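Roughly what I have in mind for that wrapper, as an illustrative sketch only — the type, field names, and polling interval are mine, and the only assumption is that closing the download reader trips the quit channel as described above:

```go
package watchdog

import (
	"io"
	"sync/atomic"
	"time"
)

// watchdogReader wraps any io.ReadCloser (e.g. the download reader) and closes
// it if no Read makes progress within the given timeout. Closing the wrapped
// reader is what would trip the library's quit channel and unblock everything.
type watchdogReader struct {
	rc       io.ReadCloser
	lastRead atomic.Int64 // unix nanos of the last Read that made progress
	done     chan struct{}
}

func newWatchdogReader(rc io.ReadCloser, timeout time.Duration) *watchdogReader {
	w := &watchdogReader{rc: rc, done: make(chan struct{})}
	w.lastRead.Store(time.Now().UnixNano())
	go func() {
		ticker := time.NewTicker(timeout / 4)
		defer ticker.Stop()
		for {
			select {
			case <-w.done:
				return
			case <-ticker.C:
				if time.Since(time.Unix(0, w.lastRead.Load())) > timeout {
					w.rc.Close() // give up: unblock any Read stuck in a select
					return
				}
			}
		}
	}()
	return w
}

func (w *watchdogReader) Read(p []byte) (int, error) {
	n, err := w.rc.Read(p)
	if n > 0 {
		w.lastRead.Store(time.Now().UnixNano())
	}
	return n, err
}

func (w *watchdogReader) Close() error {
	close(w.done)
	return w.rc.Close()
}
```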
The `worker()` method can't do that, since more than one of them is running and they only send on the `readCh` when a chunk is complete. Only the `Read` method calls `nextChunk()`, and it is also the only method that increments `chunkID`, so unless the `Read` method is incorrect, `nextChunk()` can only be called when there are chunks remaining to be received from `readCh`. I'm open to a better solution, but since all of this is internal, the scenario cannot occur unless the `Read` implementation is incorrect.
It would have to be a separate goroutine that has a CountDownLatch (not sure if this exists in vanilla Go) on the workers finishing, and when triggered it would signal that all workers are done; a sketch of that idea follows below. Agreed that the chances of it happening now are much lower. The guard check for all chunks being finished while the byte counts do not match will catch the majority of this. I'll think about it some more and put together a more concrete example; I know it would help to see the pseudocode or a more concrete list of actions, and as I do that I will probably come to the same understanding you are at, since you know the code / flow better. I am working on a plan that could utilize the WriterAt type and allow even more concurrency, and that might inherently do this as well.
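Vanilla Go doesn't have a CountDownLatch, but a `sync.WaitGroup` plus one goroutine that closes a channel after `Wait()` returns gives the same effect. A sketch of the idea (the names `readCh` and `allDone` and the worker bodies are made up, not this library's code):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const numWorkers = 3
	readCh := make(chan int)       // stands in for the channel chunks arrive on
	allDone := make(chan struct{}) // closed once every worker has returned

	var wg sync.WaitGroup
	wg.Add(numWorkers)
	for i := 0; i < numWorkers; i++ {
		go func(id int) {
			defer wg.Done()
			readCh <- id // pretend this is a completed chunk
		}(i)
	}

	// The "latch": one goroutine waits for the workers, then signals by closing.
	go func() {
		wg.Wait()
		close(allDone)
	}()

	for {
		select {
		case id := <-readCh:
			fmt.Println("received chunk from worker", id)
		case <-allDone:
			fmt.Println("all workers done; no more chunks will arrive")
			return
		}
	}
}
```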
I'm closing this issue as the original bug is resolved. Please feel free to open another issue for the topic discussed here.
The code for determining if all chunks have been read has an OBOE that causes not all bytes to be read if the number of bytes in the chunk is 1 byte more than the number of bytes read after the `copy`.

This will occur if the chunk size is a multiple of the default byte buffer size (32KiB, 32 * 1024) that `io.Copy` uses, plus 1 byte. When the copy exits, `g.cIdx` is updated with the number of bytes read. This makes the variable a count, which is 1-based (as opposed to 0-based if it were an index into the buffer). The chunk size is a length, which is also 1-based, so there is no need to offset by 1.

When this is encountered, the goroutines end up in an infinite select situation: the next iteration of the loop checks `if g.bytesRead == g.contentLen`, which is false because there is 1 byte remaining in the chunk, so it proceeds to call `g.nextChunk()` since `g.rChunk` was cleared out earlier. Now the goroutine blocks on the select in `nextChunk()` and there are no more chunks.

goroutine trace from pprof:
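To make the arithmetic concrete, here is a standalone sketch of the faulty completion check; the names `cIdx`, `chunkSize`, and `bufSize` are mine, and the offset-by-one comparison is my reading of the behavior described above rather than the library's actual code:

```go
package main

import "fmt"

func main() {
	const bufSize = 32 * 1024        // io.Copy's default buffer size
	const chunkSize = 32*1024*33 + 1 // one byte more than a multiple of 32 KiB

	cIdx := 0 // count of bytes already copied out of the current chunk
	for {
		n := chunkSize - cIdx
		if n > bufSize {
			n = bufSize
		}
		cIdx += n // like the chunk length, cIdx is a 1-based count, not an index

		// Buggy completion check: treating cIdx as a 0-based index and
		// offsetting the length by one declares the chunk done one byte early.
		if cIdx >= chunkSize-1 {
			break
		}
	}
	fmt.Printf("chunk declared done after %d of %d bytes; %d byte lost\n",
		cIdx, chunkSize, chunkSize-cIdx)
}
```

With the corrected check (`cIdx >= chunkSize`), the loop runs one more iteration, the final byte is copied, and `g.bytesRead == g.contentLen` holds on the next pass.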
Can be reproduced with any file that is 1 byte larger than a multiple of 32KiB, such as in my case 1081345 bytes (32 * 1024 * 33 + 1), with the default PartSize of 20MiB.

The last couple of iterations of the loop, with some debug output:

This results in an infinite select loop. The file on disk is missing 1 byte (the last byte of the file).
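For anyone reproducing this, a trivial way to generate a local test object of that exact size (the filename and fill byte are arbitrary); upload it and download it with the default PartSize, and the downloaded copy comes back one byte short:

```go
package main

import (
	"bytes"
	"os"
)

func main() {
	// 1 byte more than a multiple of io.Copy's 32 KiB buffer: 32*1024*33 + 1 = 1081345.
	const size = 32*1024*33 + 1
	if err := os.WriteFile("repro.bin", bytes.Repeat([]byte{0xAB}, size), 0o644); err != nil {
		panic(err)
	}
}
```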
Can also be reproduced by using any PartSize that is a multiple of 32KiB plus 1. Here is an example of a file that succeeded with the default config of 20MiB PartSize but fails with a PartSize of 131073 (32 * 1024 * 4 + 1); it also shows a multipart download, which the example above did not.

This results in an infinite select loop. The file on disk will be missing 1 byte for every chunk.
A pull request is incoming which addresses the issue and adds a couple of guards.
For more robustness, there could be a timeout on the select in `getNextChunk`; if for any reason the routine gets in there and there are no more chunks, it will block forever. We cannot rely on the underlying HTTP connection going away and triggering a close, because the data has already been read into memory. I did consider having the `worker()` function close `g.readCh`, but that would not make it break out of the select (it would if it were a range over the channel). I have not thought this fully out, but I feel something can be done to signal that no more chunks will be arriving on the channel because the workers are gone; a rough sketch of the timeout idea follows below.