Wrong content while downloading files >200Mb #468
AFAIK dropbear does not have SFTP, so some external SFTP server is used: either OpenSSH, or some other implementation. Do you have more information on that server? Since this apparently occurs only with that particular server, my first guess would be that it is a server problem. Though it is not impossible that it is some bug in Apache MINA SSHD that is triggered only in very specific circumstances that just happen to occur only with this server...

Tweaking parameters or the buffer size should not be necessary; it looks to me as if that tweaking just happens to avoid the precise circumstances that trigger this bug, wherever it might be, in your case.

Absent a reproducer that we could live debug, this would need a full trace log, plus information about the file content: at what offsets exactly do the duplications occur? How long are the duplications? What are the expected and actual file sizes? The trace log should have information about all blocks received, and their offsets and lengths. But that log might be huge.

From code inspection I see only one possibility for something like this to occur: the server returning more data than the client requested for a chunk. But that should never happen; such a server would violate the SFTP protocol.
logs-ok-ko.zip

One major fact I forgot to mention: using linux sftp or paramiko, the downloaded file is as expected. I also still think it seems related to server behavior, but given that native sftp works fine, I do not know what to ask of that provider. To help the investigation, as I could not yet find a reproducer using for instance SftpTransferTest, I have attached logs.
I will try to find the precise offset; from what I remember it happens in the middle of the file, where a bunch of lines is duplicated. Thanks again for your support on this issue that has been puzzling me!
Hi Benoit,
I need the log with log level TRACE. Only that will log precise offsets and lengths of data received. I also will need byte offsets; line numbers won't help me.

If you're worried that trace logging might reveal sensitive data (buffer contents), feel free to tell me via private e-mail (on the address I use in the git commits) how and where to fetch the logs. They'll probably be too large to attach here anyway.
offset-ko.zip
Something strange is going on at the end; the last request (id=6346) should be sent for offset 204423955 + 1842, not + 32755. That looks like a bug, but it doesn't explain duplication inside the file. At what byte offsets do these duplications occur?

Given that log I'd expect a file of 204425799 bytes, which is what you wrote it should be. Are you sure that is a log of a case gone wrong? If so, the problem cannot be in receiving the data but somewhere in later processing in the stream.
Hello,
Regarding id=744, offset=20963200, length=32755: I do not know if this is expected, but please notice the "- len=41084" below (it appears 19 times in the file):

2024-02-23 14:13:33,816 TRACE o.a.s.s.c.i.SftpInputStreamAsync [main] pollBuffer(SftpInputStreamAsync[ClientSessionImpl[[email protected]/64.233.167.130:19321]][./market_reference_data_2024-02-21.csv]) response=103 for ack=SftpAckData[id=744, offset=20963200, length=32755] - len=41084

Whereas most of the time we have "- len=32764" (13k times):

2024-02-23 14:13:33,799 TRACE o.a.s.s.c.i.SftpInputStreamAsync [main] pollBuffer(SftpInputStreamAsync[ClientSessionImpl[[email protected]/64.233.167.130:22]][./data.csv]) response=103 for ack=SftpAckData[id=743, offset=20930445, length=32755] - len=32764
Why did I miss that? Anyway: no need for further logs. You found the reason. That's exactly what I wrote above: from code inspection, this can happen if the server returns more data than requested, which it must never do. If it does, the server is broken. It seems to happen exactly every 640th request. But we can do something about this on the client side.
That is great, if this is indeed the reason! Really strange though why the server would do that in such a regular pattern... Something on the client side would be perfect; in that case, would you raise an exception or skip the extra data sent? Two questions: why would other clients (linux sftp, paramiko) not be affected, and why exactly every 640 requests?
I cannot answer your questions. I don't know what buffer sizes these other SFTP clients used, or why it occurs every 640 requests. Perhaps the other clients work because they use different buffer sizes?

As for solving it client-side, I'm thinking of ignoring the extra data and logging a warning. But I'll also have to double-check that this is not some strange bug in the lower layers of Apache MINA SSHD: the server sending data correctly, but the client somehow going wrong when assembling packets into SFTP buffers. I don't know yet what exactly is up. And yes, it is strange that it occurs so regularly.
41084 - 32764 = 8320
204500679 - 204425799 = 74880
74880 / 8320 = 9

The offset_ko.txt log seems to contain the results of three consecutive downloads? The first two failed or were otherwise cut short? The last one (starting at 2024-02-23 14:35:26,153) has 9 times length=41084, so that matches up. And 8320 / 640 = 13. Mighty strange. Especially since 32755 + 13 = 32768 = 32k. Coincidence?

If you can tell or find out, I'd still very much like to know what SFTP server is being used with that dropbear.
Can you please verify the locations (byte offsets) of the duplications? From your log, I would expect the first such duplication to be at offset 20995955: 8320 bytes duplicated, i.e., followed by the same 8320 bytes beginning at offset 21004275 (= 20995955 + 8320).

I don't see anything in our code that would cause the observed behavior. So my current assumption is still that this is some strange server-side bug. What worries me, though: according to your logs, the server is partnerupload.google.com. And they use dropbear and a buggy SFTP server? That sounds somewhat unlikely. With dropbear one normally uses either the OpenSSH SFTP server or the Green End server, and I see nothing in their sources that would explain the observed behavior. Of course it's possible that the server uses something else still.
offset.zip
I still do not have approval to share data or any kind of reproducer to ease the investigation; thanks again for your analysis!
Hello @tomaswolf ,

Using that setting, the file is not corrupted. Do you know what the rationale is behind the "- 13" above? Hope this helps your investigation.
I've been wondering about -13 myself. The -13 aligns the SFTP buffer size such that the whole SSH_FXP_DATA SFTP message fits a channel packet exactly. (There's 13 bytes of overhead in the SFTP protocol.) However, I wonder if that is correct. The SSH connection layer adds another 9 bytes to this. If the intent was to ensure an SFTP data message is not split over two SSH channel packets, it seems to me the value should be -22. |
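Spelled out, the arithmetic behind those numbers looks like this. This is an illustrative sketch, not MINA SSHD source; the byte counts are the per-packet header fields of the SFTP and SSH connection protocols as described above:

```java
public class SftpOverhead {
    public static void main(String[] args) {
        // An SSH_FXP_DATA reply carries, before the actual file data:
        //   4 bytes  SFTP packet length
        //   1 byte   message type (SSH_FXP_DATA)
        //   4 bytes  request id
        //   4 bytes  length of the data that follows
        int sftpOverhead = 4 + 1 + 4 + 4; // = 13, the "- 13" in question

        // The SSH connection layer wraps this in an SSH_MSG_CHANNEL_DATA packet:
        //   1 byte   message type
        //   4 bytes  recipient channel
        //   4 bytes  data length
        int sshChannelOverhead = 1 + 4 + 4; // = 9

        // Keeping the whole SFTP message within one channel packet would need:
        System.out.println(sftpOverhead + sshChannelOverhead); // 22, hence "-22"
    }
}
```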
Hello @tomaswolf ,

Later that day, at 15:28:50.850 UTC, the provider's server log shows the READ_FILE operation transferred 32755 bytes (a difference of 13 bytes).
At least their server logs confirm that the server did send too much data overall. I wonder if the server logs also show the 9 SSH_FXP_DATA replies of 41075 bytes (= 32755 requested + 8320 extra) every 640th request. (41084 bytes is the length including the SSH_FXP_DATA marker, the SFTP request ID, and the real data length.)
Some SFTP servers appear to sometimes return more data than requested. When that happened, a downloaded file could be corrupted (size larger than expected, and some duplicated data inside the file).

By default, throw an exception if an SFTP server returns more data than requested. If property SftpModuleProperties.TOLERATE_EXCESS_DATA is set to true on the session or on the channel, discard such excess data and log a warning.

Avoid code duplication; unify SSH_FXP_READ response handling in AbstractSftpClient.
I've merged a change that tightens SFTP read requests. If a server sends back more data than requested, an exception is thrown. If you set SftpModuleProperties.TOLERATE_EXCESS_DATA to true on the session or on the channel, the excess data is discarded and a warning is logged instead.
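For completeness, here is a minimal sketch of opting in to the lenient behavior. It assumes an already-connected ClientSession; the property name is the one from the change described above, and the rest is standard MINA SSHD client API:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.sshd.client.session.ClientSession;
import org.apache.sshd.sftp.SftpModuleProperties;
import org.apache.sshd.sftp.client.SftpClient;
import org.apache.sshd.sftp.client.SftpClientFactory;

public class TolerantDownload {
    // Downloads a remote file, discarding (and warn-logging) any excess
    // data a misbehaving server might send, instead of failing.
    static void download(ClientSession session, String remote, String local) throws Exception {
        SftpModuleProperties.TOLERATE_EXCESS_DATA.set(session, true);
        try (SftpClient sftp = SftpClientFactory.instance().createSftpClient(session);
             InputStream in = sftp.read(remote)) {
            Files.copy(in, Paths.get(local));
        }
    }
}
```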
Version
2.12.1
Bug description
Hello,
We are using MINA to download files from and upload files to various sftp servers.
We have noticed that when fetching files with a size >200 Mb from one particular sftp server, the content of the downloaded file is bigger than what is on the server.
For a text file, we can see unexpected duplicated data inside.
By narrowing down the problem: when using SftpClient.read with a bigger buffer size than the default value, the downloaded file is the same as the source (see the sketch after the snippet below):
return getClient().read(resolveRemotePathValue(remoteFilename));
->
return getClient().read(resolveRemotePathValue(remoteFilename), SftpClient.IO_BUFFER_SIZE * 4);
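For reference, a self-contained sketch of that workaround (getClient() and resolveRemotePathValue(...) above are our own wrappers, so this uses the SftpClient API directly; SftpClient.IO_BUFFER_SIZE is the client's default read buffer size):

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.sshd.sftp.client.SftpClient;

class Workaround {
    // Read with a buffer four times the default; on this particular server
    // that happens to avoid the duplicated data in the downloaded file.
    static void download(SftpClient sftp, String remotePath, String localPath) throws Exception {
        try (InputStream in = sftp.read(remotePath, SftpClient.IO_BUFFER_SIZE * 4)) {
            Files.copy(in, Paths.get(localPath));
        }
    }
}
```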
We would like to understand how to properly configure the above buffer size, or any other parameter, to get the expected behavior; so far we use mostly default settings.
Note that we could not yet reproduce this issue on other sftp servers. Please let me know if you need more information!
Actual behavior
Content of the file is bigger than what is on the server.
Expected behavior
The content of the file is the same as what is on the server.
Relevant log output
Other information
No response