Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong content while downloading files >200Mb #468

Closed
bferrenq opened this issue Feb 22, 2024 · 18 comments
Closed

Wrong content while downloading files >200Mb #468

bferrenq opened this issue Feb 22, 2024 · 18 comments
Milestone

Comments

@bferrenq
Copy link

bferrenq commented Feb 22, 2024

Version

2.12.1

Bug description

Hello,
We are using Mina to download and upload files to various sftp servers.
We have noticed that while fetching some files with a size >200 Mb to a precise sftp server, content of the file is bigger than what is on the server.
For a text file we can see unexpected duplicated data inside.

By narrowing the problem, and using SftpClient.read with a bigger buffersize than the default value, downloaded file is the same as the source:
return getClient().read(resolveRemotePathValue(remoteFilename));
->
return getClient().read(resolveRemotePathValue(remoteFilename), SftpClient.IO_BUFFER_SIZE * 4);

We would like to understand how to properly configure above buffersize or any parameter to have expected behavior, so far we use most of the default settings.
Note that we could not reproduce yet such issue on other sftp servers, please let me know if you need more information!

Actual behavior

Content of the file is bigger than what is on the server.

Expected behavior

Content of the file is the same than what is on the server.

Relevant log output

Server version string: SSH-2.0-dropbear [SERVER]
SSH_MSG_CHANNEL_OPEN_CONFIRMATION sender=0, window-size=3145728, packet-size=32648

Other information

No response

@bferrenq bferrenq changed the title Wrong content while downloading file of several Mb Wrong content while downloading files >200Mb Feb 22, 2024
@tomaswolf
Copy link
Member

AFAIK dropbear does not have SFTP, so some external SFTP is used. Either OpenSSH, or some other implementation. Do you have more information on that server?

Since it occurs apparently only with that particular server my first guess would be that this is a server problem. Though it is not impossible that it might be some bug in Apache MINA SSHD that is triggered only in very specific circumstances that just happen to occur only with this server...

Tweaking parameters or the buffer size should not be necessary; looks to me that this tweaking just happens to avoid the precise circumstances that trigger this bug, wherever it might be, in your case.

Absent a reproducer that we could live debug, this would need a full trace log, plus information about the file content (at what offsets exactly do duplications occur? How long are the duplications? Expected/actual file size?) The trace log should have information about all blocks received, and their offsets and lengths. But that log might be huge.

From code inspection I see only one possibility for something like this to occur: if the server returns more data than the client requested for a chunk. But that should never happen; such a server would violate the SFTP protocol.

@bferrenq
Copy link
Author

logs-ok-ko.zip
Hello @tomaswolf and thx for you quick reply!

One major fact I forgot to mention, using linux sftp or paramiko , downloaded file is as expected.

Still I also do think it seems related to server behavior, now given native sftp is working fine, I do not know what to ask to that provider.

In order to help investigation, as I could not yet find a reproducer using for instance SftpTransferTest, I have attached logs:

  • when file size is fine, 204,425,799 bytes
  • and when file size is bigger than expected, 204,500,679 bytes

I will try to find the precise offset, I try to remember it happen in the middle of the file, bunch of lines are duplicated.

Thanks again for your support on that issue puzzling me!

@bferrenq
Copy link
Author

Hi,
Attached ko logs are associated to a csv file that contains 5848831 lines, that are <50 chars long.
Every #595317 lines (so #600K lines), the last #210 lines are duplicated. So it seems there is a kind of pattern ie not random behavior.

Benoit

@tomaswolf
Copy link
Member

I need the log with log level TRACE. Only that will log precise offsets and lengths of data received.

I also will need byte offsets. Line numbers won't help me.

If you're worried that trace logging might reveal sensitive data (buffer contents), feel free to tell me via private e-mail (on the address I use in the git commits) how and where to fetch the logs. They'll probably be too large to attach here anyway.

@bferrenq
Copy link
Author

offset-ko.zip
Indeed ko-logs in trace mode are huge, 800Mb compressed...
I will see how to expose them, in the between I did a grep on the log file around SftpInputStreamAsync that contains offset and length, I hope this can already help.

@tomaswolf
Copy link
Member

Something strange is going on at the end; the last request (id=6346) should be sent for offset 204423955 + 1842, not +32755. That looks like a bug, but it doesn't explain duplication inside the file. At what byte offsets do these duplications occur?

Given that log I'd expect a file of 204425799 bytes, which is what you wrote it should be. Are you sure that is a log of a case gone wrong? If so, the problem cannot be in receiving the data but somewhere in later processing in the stream.

@bferrenq
Copy link
Author

bferrenq commented Feb 23, 2024

Hello,

  • You are correct, file size is supposed to be 204425799 whereas it is 204500679, and unless I did a mistake, logs are for the corrupted file.

  • Regarding streaming, we are simply taking Mina's Inpustream then write in a local file. I may be wrong, still the fact that tweaking mentioned parameter makes file correctly fetched, so this should discard problem on that part, no?

  • Then file seems corrupted after 2024-02-23 14:13:33,806, so:

id=744, offset=20963200, length=32755

I do not know if this is expected, please notice below the "- len=41084" (19 times in the file)

2024-02-23 14:13:33,816 TRACE o.a.s.s.c.i.SftpInputStreamAsync [main] pollBuffer(SftpInputStreamAsync[ClientSessionImpl[[email protected]/64.233.167.130:19321]][./market_reference_data_2024-02-21.csv]) response=103 for ack=SftpAckData[id=744, offset=20963200, length=32755] - len=41084

Whereas most of the time we have "- len=32764" (13k times):

2024-02-23 14:13:33,799 TRACE o.a.s.s.c.i.SftpInputStreamAsync [main] pollBuffer(SftpInputStreamAsync[ClientSessionImpl[[email protected]/64.233.167.130:22]][./data.csv]) response=103 for ack=SftpAckData[id=743, offset=20930445, length=32755] - len=32764

  • Before providing full logs, I need provider consent, request ongoing.

@tomaswolf
Copy link
Member

tomaswolf commented Feb 23, 2024

Why did I miss that? Anyway: no need for further logs. You found the reason. That's exactly what I wrote above: from code inspection, this can happen if the server returns more data than requested, which it never must do. If it does, the server is broken.

Seems to happen exactly every 640th request.

But we can do something about this in the client side.

@baiglin
Copy link

baiglin commented Feb 23, 2024

That is great if this is indeed the reason! Really strange though why the server would do that on a regular pattern...

Something in the client would be perfect, in this case you should raise an exception or skip the extra data sent ?

2 questions:

  • Is it why when we increased the buffer size, it worked ? Also could you tell us more exactly what this buffer size is meant for ? I mean in the process of reading, where this buffer is used?
  • How do you think other client managed to work? Because python client and sftp command seems to overcome this bug ...
    IN any case thanks a lot for the quick replies and help.

@tomaswolf
Copy link
Member

I cannot answer your questions. I don't know what other buffer sizes these other SFTP clients used, or why it occurs every 640 requests. Perhaps the other clients might have worked because they used different buffer sizes?

As for solving it client-side, I'm thinking of ignoring the extra data, and logging a warning.

But I'll also have to double-check that this is not some strange bug in the lower layers in Apache MINA SSHD. Like, the server sending data correctly, but in the client somehow assembling packets to SFTP buffers goes wrong. Don't know yet what exactly is up.

And yes, it is strange it occurs so regularly.

@tomaswolf
Copy link
Member

tomaswolf commented Feb 23, 2024

41084 - 32764 = 8320

204500679 - 204425799 = 74880

74880 / 8320 = 9

The offset_ko.txt log seems to contain the result of three consecutive downloads? The first two failed or otherwise cut short? The last one (starting at 2024-02-23 14:35:26,153) has 9 times length=41084, so that matches up.

And 8320 / 640 = 13. Mighty strange. Especially since 32755 + 13 = 32768 = 32k. Coincidence?

If you can tell or find out, I'd still very much like to know what SFTP server is being used with that dropbear.

@tomaswolf
Copy link
Member

Can you please verify the locations (byte offsets) of the duplications. From your log, I would expect the first such duplication to be at offset 20995955: 8320 bytes duplicated, i.e., followed by the same 8320 bytes beginning at offset 21004275.

I don't see anything in our code that would cause the observed behavior. So my current assumption is still that this is some strange server-side bug. What worries me, though: according to your logs, the server is partnerupload.google.com. And they use dropbear and a buggy SFTP server? That sounds somewhat unlikely.

With dropbear one uses normally either the OpenSSH SFTP server or the Green End server. I see nothing in their sources that would explain the observed behavior. Of course it's possible that the server uses something else still.

@bferrenq
Copy link
Author

offset.zip
Hello Tomas,

  • You are right regarding offset_ko logs, I have uploaded correct ones that only contains 9 occurences of the unexpected length.
  • I have requested the server information, and will keep you posted once I have it,
  • I may be wrong, I would say problem is starting around offset 20971518 in our csv file.
    That csv row contains the begining of valid data then a value that does not exist in the original file, then unexpected LF.
    From there start indeed data duplication of several lines
    Then we jump to offset 41943040 where we have another corrupted row, then content duplication

I still not have approval to share data or any kind of reproducer to ease investigation, thanks again for your analysis!

@bferrenq
Copy link
Author

Hello @tomaswolf ,
We have done a test as well by using 32768 buffer size instead of 32768 -13 coming from AbstractSftpClient.

Using that setting file is not corrupted.

Do you know what is the rational behind this above "- 13"?

Hope this helps your investigation.

@tomaswolf
Copy link
Member

I've been wondering about -13 myself. The -13 aligns the SFTP buffer size such that the whole SSH_FXP_DATA SFTP message fits a channel packet exactly. (There's 13 bytes of overhead in the SFTP protocol.) However, I wonder if that is correct. The SSH connection layer adds another 9 bytes to this. If the intent was to ensure an SFTP data message is not split over two SSH channel packets, it seems to me the value should be -22.

@bferrenq
Copy link
Author

Hello @tomaswolf ,
I still do not have the server information, and will let you know.
I have receveid below feedback:
"There was 2 transfers on 2024-02-26:
At 09:29:44.039 UTC, the READ_FILE operation transferred 32768 bytes.
The corresponding READ_DONE operation was logged at 09:31:54.652 UTC, with 204425799 bytes transferred.

Later that day, at 15:28:50.850 UTC, the READ_FILE operation transferred 32755 (difference of 13 bytes).
The corresponding READ_DONE operation was logged at 15:31:06.338 UTC, with 204500679 bytes transferred.
"
I will let you know once I have more information.

@tomaswolf
Copy link
Member

tomaswolf commented Feb 28, 2024

At least their server logs confirm that the server did send too much data overall. I wonder if the server logs also show the 9 SSH_FXP_DATA replies of 41075 bytes every 640th request. (41084 bytes is the length including the SSH_FXP_DATA marker, the SFTP request ID, and the real data length.)

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue May 5, 2024
Some SFTP servers appear to return sometimes more data than requested.
If that happened, it was possible that a downloaded file would be
corrupted (size larger than expected, and some duplicated data inside
the file).

By default, throw an exception if an SFTP server returns more data than
requested. If property SftpModuleProperties.TOLERATE_EXCESS_DATA is set
to true on the session or on the channel, discard such excess data and
log a warning.

Avoid code duplication; unify SSH_FXP_READ response handling in
AbstractSftpClient.
@tomaswolf
Copy link
Member

I've merged a change that tightens SFTP read requests. If a server sends back more data than requested, an exception is thrown. If you set SftpModuleProperties.TOLERATE_EXCESS_DATA to true, we only log a warning and ignore excess data. Whether that helps in your case in unclear, though. When a server returns more data than requested, the client cannot know which bytes are in excess: some at the beginning, or some at the end? Or some in between?

@tomaswolf tomaswolf added this to the 2.13.0 milestone May 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants