Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sftp clients based on the Mina-sshd and JSCH components upload and download files at very different speeds #524

Closed
czldb2 opened this issue Jul 1, 2024 · 21 comments · Fixed by #579
Assignees
Milestone

Comments

@czldb2
Copy link

czldb2 commented Jul 1, 2024

Phenomenon:

Uploading and downloading files with mina-sshd is about twice as slow as JSCH
企业微信截图_17198149038657

Code:

Mina-sshd

Dependency
<dependency>
    <groupId>org.apache.sshd</groupId>
    <artifactId>sshd-core</artifactId>
    <version>2.13.0</version>
</dependency>
<dependency>
    <groupId>org.apache.sshd</groupId>
    <artifactId>sshd-common</artifactId>
    <version>2.13.0</version>
</dependency>
<dependency>
    <groupId>org.apache.sshd</groupId>
    <artifactId>sshd-sftp</artifactId>
    <version>2.13.0</version>
</dependency>
demo
import org.apache.sshd.client.SshClient;
import org.apache.sshd.client.future.ConnectFuture;
import org.apache.sshd.client.session.ClientSession;
import org.apache.sshd.sftp.client.SftpClient;
import org.apache.sshd.sftp.client.SftpClientFactory;
import org.apache.sshd.common.util.io.IoUtils;

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;

public class SftpTestTool {

    public static void main(String[] args) {
        String host = "x.x.x.x";
        int port = 22;
        String user = "root";
        String password = "root123";

        String localFile = "src/main/resources/temp/temp.gz";
        String remoteFile = "/home/user/temp/temp.gz";

        SshClient client = SshClient.setUpDefaultClient();
        client.start();

        long connectStartTime = System.currentTimeMillis();
        try (ClientSession session = createSession(client, host, port, user, password)) {
            SftpClientFactory factory = SftpClientFactory.instance();
            try (SftpClient sftpClient = factory.createSftpClient(session)) {
                long connectEndTime = System.currentTimeMillis();
                System.out.println("Mina SSHD connect Time: " + (connectEndTime - connectStartTime) + " ms");
                long uploadStartTime = System.currentTimeMillis();
                try (InputStream localInputStream = Files.newInputStream(Paths.get(localFile));
                     OutputStream remoteOutputStream = sftpClient.write(remoteFile)) {
                    IoUtils.copy(localInputStream, remoteOutputStream);
                }
                long uploadEndTime = System.currentTimeMillis();
                System.out.println("Mina SSHD Upload Time: " + (uploadEndTime - uploadStartTime) + " ms");

                long downloadStartTime = System.currentTimeMillis();
                try (InputStream remoteInputStream = sftpClient.read(remoteFile);
                     OutputStream localOutputStream = Files.newOutputStream(Paths.get("src/main/resources/temp/temp.gz.bak"))) {
                    IoUtils.copy(remoteInputStream, localOutputStream);
                }
                long downloadEndTime = System.currentTimeMillis();
                System.out.println("Mina SSHD Download Time: " + (downloadEndTime - downloadStartTime) + " ms");
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            client.stop();
        }
    }

    private static ClientSession createSession(SshClient client, String host, int port, String user, String password) throws Exception {
        ConnectFuture connectFuture = client.connect(user, host, port);
        connectFuture.await(5, TimeUnit.SECONDS);

        ClientSession session = connectFuture.getSession();
        session.addPasswordIdentity(password);
        session.auth().verify(5, TimeUnit.SECONDS);
        return session;
    }

}

JSCH

Dependency
<dependency>
    <groupId>com.github.mwiede</groupId>
    <artifactId>jsch</artifactId>
    <version>0.2.3</version>
</dependency>
demo
import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Properties;

public class JSCHTestTool {

public static void main(String[] args) {
String host = "x.x.x.x";
 int port = 22;
 String user = "root";
 String password = "root123";

 String localFile = "src/main/resources/temp/temp.gz";
 String remoteFile = "/home/user/temp/temp.gz";

 JSch jsch = new JSch();
 Session session = null;
 ChannelSftp channelSftp = null;

 try {
long connectStartTime = System.currentTimeMillis();
 session = jsch.getSession(user, host, port);
 session.setPassword(password);
 Properties config = new Properties();
 config.put("StrictHostKeyChecking", "no");
 session.setConfig(config);
 session.connect();

 channelSftp = (ChannelSftp) session.openChannel("sftp");
 channelSftp.connect();
 long connectEndTime = System.currentTimeMillis();
 System.out.println("JSch connect Time: " + (connectEndTime - connectStartTime) + " ms");

 long uploadStartTime = System.currentTimeMillis();
 channelSftp.put(new FileInputStream(localFile), remoteFile);
 long uploadEndTime = System.currentTimeMillis();
 System.out.println("JSch Upload Time: " + (uploadEndTime - uploadStartTime) + " ms");

 long downloadStartTime = System.currentTimeMillis();
 channelSftp.get(remoteFile, new FileOutputStream("src/main/resources/temp/temp.gz.bak"));
 long downloadEndTime = System.currentTimeMillis();
 System.out.println("JSch Download Time: " + (downloadEndTime - downloadStartTime) + " ms");

 } catch (Exception e) {
e.printStackTrace();
 } finally {
if (channelSftp != null) {
channelSftp.disconnect();
 }
if (session != null) {
session.disconnect();
 }
}
}

}

Question

Is there something wrong with my coding? Is there any configuration or correct encoding of mina-sshd so that my sftp function can reach the same speed level as jsch?

@tomaswolf
Copy link
Member

Cannot reproduce on Mac OS X, running an Apache MINA sshd SFTP client against an OpenSSH SFTP server running in an Alpine container.

IoUtils.copy(InputStream, OutputStream) by default uses a smallish buffer of 8kB. With that, I see performance slightly slower than JSCH. Using a 32kB buffer via IoUtils.copy(InputStream, OutputStream, 32 * 1024) I get slightly better performance with Apache MINA sshd than with JSCH.

@czldb2
Copy link
Author

czldb2 commented Jul 2, 2024

Cannot reproduce on Mac OS X, running an Apache MINA sshd SFTP client against an OpenSSH SFTP server running in an Alpine container.

IoUtils.copy(InputStream, OutputStream) by default uses a smallish buffer of 8kB. With that, I see performance slightly slower than JSCH. Using a 32kB buffer via IoUtils.copy(InputStream, OutputStream, 32 * 1024) I get slightly better performance with Apache MINA sshd than with JSCH.

I modified the buffer size and made some changes in thread switching, but the transfer efficiency is still the same as before
image

import org.apache.sshd.client.SshClient;
import org.apache.sshd.client.future.ConnectFuture;
import org.apache.sshd.client.session.ClientSession;
import org.apache.sshd.sftp.client.SftpClient;
import org.apache.sshd.sftp.client.SftpClientFactory;

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.EnumSet;
import java.util.concurrent.TimeUnit;

public class SftpTestTool {

    private static final int BUFFER_SIZE = 65536;

    public static void main(String[] args) {
        String host = "x.x.x.x";
        int port = 22;
        String user = "root";
        String password = "rootr123";

        String localFile = "src/main/resources/temp/temp.gz";
        String remoteFile = "/home/user/temp/temp.gz";

        SshClient client = SshClient.setUpDefaultClient();
        client.start();

        long connectStartTime = System.currentTimeMillis();
        try (ClientSession session = createSession(client, host, port, user, password)) {
            SftpClientFactory factory = SftpClientFactory.instance();
            try (SftpClient sftpClient = factory.createSftpClient(session)) {
                long connectEndTime = System.currentTimeMillis();
                System.out.println("Mina SSHD connect Time: " + (connectEndTime - connectStartTime) + " ms");

                long uploadStartTime = System.currentTimeMillis();
                try (InputStream localInputStream = Files.newInputStream(Paths.get(localFile));
                     OutputStream remoteOutputStream = sftpClient.write(remoteFile, EnumSet.of(SftpClient.OpenMode.Create, SftpClient.OpenMode.Write))) {
                    byte[] buffer = new byte[BUFFER_SIZE];
                    int bytesRead;
                    while ((bytesRead = localInputStream.read(buffer)) != -1) {
                        remoteOutputStream.write(buffer, 0, bytesRead);
                    }
                }
                long uploadEndTime = System.currentTimeMillis();
                System.out.println("Mina SSHD Upload Time: " + (uploadEndTime - uploadStartTime) + " ms");

                long downloadStartTime = System.currentTimeMillis();
                try (InputStream remoteInputStream = sftpClient.read(remoteFile);
                     OutputStream localOutputStream = Files.newOutputStream(Paths.get("src/main/resources/temp/temp.gz.bak"))) {
                    byte[] buffer = new byte[BUFFER_SIZE];
                    int bytesRead;
                    while ((bytesRead = remoteInputStream.read(buffer)) != -1) {
                        localOutputStream.write(buffer, 0, bytesRead);
                    }
                }
                long downloadEndTime = System.currentTimeMillis();
                System.out.println("Mina SSHD Download Time: " + (downloadEndTime - downloadStartTime) + " ms");
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            client.stop();
        }
    }

    private static ClientSession createSession(SshClient client, String host, int port, String user, String password) throws Exception {
        ConnectFuture connectFuture = client.connect(user, host, port);
        connectFuture.await(5, TimeUnit.SECONDS);

        ClientSession session = connectFuture.getSession();
        session.addPasswordIdentity(password);
        session.auth().verify(5, TimeUnit.SECONDS);
        return session;
    }
}

@czldb2
Copy link
Author

czldb2 commented Jul 2, 2024

I found that the write() method of the SftpOutputStreamAsync class seems to check every sftp packet; And then SftpRemotePathChannel has a transferTo() method that looks like it's aggregated internally to check. I am not sure if this is the case, and then I would like to ask the use of transforTo() or transforFrom(). Another scenario is how to handle the sftp ls command in the case of a large number of files or directories like transforTo()?

@tomaswolf
Copy link
Member

On Windows, I see inconsistent timings, but indeed most of the time the file transfer with Apache MINA sshd appears to be about 50% slower than with JSch.

@czldb2
Copy link
Author

czldb2 commented Jul 3, 2024

On Windows, I see inconsistent timings, but indeed most of the time the file transfer with Apache MINA sshd appears to be about 50% slower than with JSch.

Do we have plans to optimize this feature?

image

I found that every time the data is sent, it needs to be confirmed, but jsch seems to confirm the same

@tomaswolf
Copy link
Member

This is something else, and indeed I have an idea about providing asynchronous SFTP client interfaces. But this is unrelated to the question of why file transfer on Windows is slower than expected, but apparently not on OS X. Off-hand I don't see where the bottleneck might be; it should not be the ACK handling in SftpOutputStreamAsync: that one only checks ACKs that have already arrived.

This may need extensive profiling or log analysis.

@tomaswolf
Copy link
Member

This problem definitely is specific to Windows. It does not occur on OS X. (Didn't test Linux.)

The problem occurs with both the NIO2 and the netty I/O back-end. (Didn't try MINA.) Performance with either back-end is the same. So it's not an I/O back-end problem.

The problem also definitely is not in the SFTP layer. I performed and timed a direct file upload via the equivalent of cat local_file | ssh user@somehost 'cat > remote_file' with Apache MINA sshd:

try (ChannelExec exec = session.createExecChannel("cat > upload/testfile.bin")) {
    exec.setStreaming(Streaming.Async);
    exec.open().verify();
    long start = System.currentTimeMillis();
    try (InputStream localInputStream = Files.newInputStream(localFile)) {
        IoOutputStream out = exec.getAsyncIn();
        byte[] buf = new byte[32 * 1024];
        int n = 0;
        while ((n = localInputStream.read(buf)) >= 0) {
            if (n > 0) {
                out.writeBuffer(new ByteArrayBuffer(buf, 0, n)).verify(1000);
            }
        }
        out.close(false);
    }
    long elapsed = System.currentTimeMillis() - start;
    System.err.println("Apache cat stream upload took " + (elapsed / 1000.0) + "sec");
}

This uploads the file without SFTP, so there is zero overhead for the SFTP protocol and its ACKs. Timing this (repeatedly) showed no significant differences from normal uploads via SFTP. Which means that the bottleneck is not in the SFTP code. (Assuming that on the server side using "cat" is not slower than "internal-sftp". The server again is OpenSSH running in an alpine:latest container.)

As mentioned before, the timings are fairly inconsistent from run to run, too. (For both JSch and Apache MINA sshd.) In some runs JSch is slow, too, and Apache MINA sshd is actually faster. In some runs, both are fast (and about equal). In most runs, though, Apache MINA sshd is noticeably slower.

Might be a threading issue, or a memory issue. Or a channel window issue (though then I'd expect to see the problem also on OS X). But whatever it is, it isn't in the SFTP code.

@czldb2
Copy link
Author

czldb2 commented Jul 5, 2024

This problem definitely is specific to Windows. It does not occur on OS X. (Didn't test Linux.)

The problem occurs with both the NIO2 and the netty I/O back-end. (Didn't try MINA.) Performance with either back-end is the same. So it's not an I/O back-end problem.

The problem also definitely is not in the SFTP layer. I performed and timed a direct file upload via the equivalent of cat local_file | ssh user@somehost 'cat > remote_file' with Apache MINA sshd:

try (ChannelExec exec = session.createExecChannel("cat > upload/testfile.bin")) {
    exec.setStreaming(Streaming.Async);
    exec.open().verify();
    long start = System.currentTimeMillis();
    try (InputStream localInputStream = Files.newInputStream(localFile)) {
        IoOutputStream out = exec.getAsyncIn();
        byte[] buf = new byte[32 * 1024];
        int n = 0;
        while ((n = localInputStream.read(buf)) >= 0) {
            if (n > 0) {
                out.writeBuffer(new ByteArrayBuffer(buf, 0, n)).verify(1000);
            }
        }
        out.close(false);
    }
    long elapsed = System.currentTimeMillis() - start;
    System.err.println("Apache cat stream upload took " + (elapsed / 1000.0) + "sec");
}

This uploads the file without SFTP, so there is zero overhead for the SFTP protocol and its ACKs. Timing this (repeatedly) showed no significant differences from normal uploads via SFTP. Which means that the bottleneck is not in the SFTP code. (Assuming that on the server side using "cat" is not slower than "internal-sftp". The server again is OpenSSH running in an alpine:latest container.)

As mentioned before, the timings are fairly inconsistent from run to run, too. (For both JSch and Apache MINA sshd.) In some runs JSch is slow, too, and Apache MINA sshd is actually faster. In some runs, both are fast (and about equal). In most runs, though, Apache MINA sshd is noticeably slower.

Might be a threading issue, or a memory issue. Or a channel window issue (though then I'd expect to see the problem also on OS X). But whatever it is, it isn't in the SFTP code.

Thank you for your answer, I still need to learn more knowledge and then analyze this problem

@czldb2
Copy link
Author

czldb2 commented Jul 5, 2024

image
I used the transferFrom() method of SftpRemotePathChannel for file operations in my windows environment twice as fast as the default write() method of the SftpOutputStreamAsync class

@tomaswolf
Copy link
Member

I run tests using a wide variety of methods, including transferFrom. I do not get consistently good timings with that either on Windows, but it is generally one of the faster methods. Which is actually a bit surprising, since as you may notice it uses SftpOutputStreamAsync .write(), too.

@ryabovgs
Copy link

ryabovgs commented Jul 6, 2024

Hi @tomaswolf. Could you please advise how I can optimise Mina setup (I use default one) to reach same performance as JSCH?
Problem: I got 20% degradation in speed for uploading of 50 files of 40mb
Main concern from profiling point of view(VisualVM) that OutputStream::write looks as 27s total vs 10s of JSCH.
Approach: I use Streams approach as you mentioned as fastest.
Default setup and streams approach details:

  • SshClient.setUpDefaultClient()
  • DefaultSftpClientFactory.INSTANCE.createSftpClient(clientSession);
  • outputStream = sftpClient.write(file) // here during debug session I see that bufferSize 32755 is used
  • tried also MinaServiceFactoryFactory/Nio2ServiceFactoryFactory without good speed too via setIoServiceFactoryFactory(sshClient, factory);
    May be the reason could be async nature of Mina and I missed the guide how to use it optimistically - could you plz point out if it exists also.

@tomaswolf
Copy link
Member

Could you please advise how I can optimise Mina setup (I use default one) to reach same performance as JSCH?

@ryabovgs : no, I cannot. If you read the comments in this issue, you see that currently I only know that there is a performance problem on Windows (and apparently Windows only), but I don't know yet why. I normally develop on OS X, and I have only temporary and sporadic access to a Windows machine. So it may be a while until I figure this out.

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 16, 2024
Unroll the permutation loop of ChaCha20 and simplify the crypt() method
to work on bytes instead of ints. Unroll and optimize the pack/unpack
methods in ChaCha20 and in Poly1305. Don't use Integer.toUnsignedLong():
avoid the extra method call by doing (int & 0xFFFF_FFFFL) inline. In
Poly1305.processBlock(), use ints instead of longs where possible
(variables t0-t3).

All this brings a speed-up of about 40% for the encryption/decryption.
Of the time spent in ChaCha20-Poly1305 about one third is spent in the
ChaChaEngine and two thirds in Poly1305.processBlock().

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 16, 2024
For ByteArrayBuffer, optimize the getInt()/putInt() operations to work
directly in the buffer without copying into/from a local work area.
Simplify some of the operations; in particular, avoid extending bytes
to long and then shifting.

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 16, 2024
In ChannelAsyncOutputStream, do not create a new SSH packet if the
caller already had passed in an SSH packet buffer, and we're writing
it fully. This avoids allocating a new buffer, which gives GC less
to do, and it also avoid needlessly copying the data around. Instead
the prepared SSH packet buffer is encoded and written directly.

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 16, 2024
Give SftpOutPutStreamAsync a transferFrom operation to be able to read
directly into a prepared SSH packet uffer. Previously, we always had
to read into a general byte buffer, and then copy into the SSH packet.
This extra data copy can be avoided. If the copy buffer size is such
that the data fits into a single SSH packet, then this packet buffer
can be encoded and sent directly without any further data copying.

Simplify the implementation: since the stream always uses a PacketBuffer
the code paths for other general byte buffers can simply be removed.

Also enable reading the next buffer while the previous one is being
sent. Use two buffers alternatingly; one being sent, the other being
filled.

Use this implementation also in SftpRemotePathChannel in its
transferFrom() implementation. _Do_ close the stream there, otherwise
the final ACKs may not be checked.

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 16, 2024
Bouncy Castle apparently only has native implementations in its "LTS"
releases. BC 1.78.1 has none. SunJCE uses native code.

The "security registrar" architecture in Apache MINA sshd prefers
registered providers over the platform providers, and it automatically
registers Bouncy Castle if it's present. So with Bouncy Castle on the
classpath and an SSH connection for which an AES cipher was specified,
Apache MINA sshd would use the much slower Bouncy Castle AES.

Add a new "SunJCEWrapper" registrar that brings the SunJCE AES and
HmacSHA* to the front of the list, so that they are used even if Bouncy
Castle is also registered.

The new registrar can be disabled via a system property as usual, and
it is only enabled if the JVM indeed has a "SunJCE" security provider
registered.

See also https://issues.apache.org/jira/browse/SSHD-719 (issue closed
as "not needed").

Bug: apache#524
@tomaswolf
Copy link
Member

tomaswolf commented Jul 16, 2024

I found a couple of things that could be improved; for details see PR #530.

While comparing out-of-the-box performance of JSch and sshd is legitimate, one needs to be aware that they are using different encryption algorithms by default. JSch uses the AES implementation from the built-in SunJCE provider (that's native code), sshd uses a Java implementation of the ChaCha20-Poly1305 cipher (if the server supports it, as does OpenSSH).

So, three findings:

  1. It turned out that our ChaCha20-Poly1305 implementation was very clean, but not speed-optimized. PR Performance optimizations #530 optimizes this, and it is now faster than the BC implementation, and nearly as fast as the native code AES.
  2. A second area improved in PR Performance optimizations #530 is general buffer handling in sshd. Sometimes data was copied needlessly from one buffer into another. However, in the grand scheme of things these buffer optimizations are only a (vanishingly) small factor.
  3. Thirdly, I discovered that if Bouncy Castle was available, and an AES encryption was used for the connection, sshd would use the Bouncy Castle AES, not the one from SunJCE. The AES in BouncyCastle 1.78.1 is a Java implementation and thus much slower. That is also fixed in PR Performance optimizations #530; sshd now prefers the SunJCE AES over the one from Bouncy Castle.

I think we basically have two kinds of reports:

  • Some that see sshd as about 10-50% slower than JSch. Most likely that's points (1) and (2), and comparing AES vs. ChaCha20-Poly1305.
  • Some that see sshd as 200-300% (or even more) slower than JSch. Most likely that's point (3): sshd using the Bouncy Castle AES.

Performance testing SFTP uploads is difficult. I used JMH benchmarks measuring the performance of only the upload itself (excluding SSH session setup, authentication, and setting up the SFTP channel), with a local JSch or sshd client uploading a 20MB random file to an OpenSSH server. To really benchmark this, one would need a lab setup with a controlled environment: a second machine running the SFTP server, and a local network connection with stable and known throughput and latency.

I do not have such a setup. I used mostly a local container running the SFTP server, which of course introduced uncontrollable variation in timings. After all, the same machine then also runs the server, in a container, so there's unpredictable virtualization overhead. The timings I got from this were still surprisingly consistent and enabled me to find the above three problems. With PR #530 applied, JSch and sshd perform equally well in these tests for me. (Uploading 20MBs takes generally about 270-290ms on the machine I tested.)

Just to double-check I also ran the same benchmarks with an AWS EC2 t4g.small instance for the OpenSSH server. Again, JSch and sshd with PR #530 applied performed equal -- uploading 20MB took 4-5 seconds, but there were some outliers with both JSch or sshd when it took 6-8, or 12, or once even 20 seconds. Of course, these uploads went over the general Internet (and had cross at least two firewalls, both beyond my control), so I cannot possibly say what happened in these outlier cases. There were about 3 such outlier per 15 upload attempts.

In both setups I still have the feeling that JSch is a teeny-weeny bit faster (if both are using AES), maybe 2-3%, but that's a feeling only; the statistics don't really support that. Variability and standard deviation in the timings is too high. If the feeling is true, though, it might be due to the multi-threaded nature of sshd. Multi-threading makes a lot of sense for an SSH server, but for a client, the single-threaded approach of JSch has perhaps slightly less overhead. In any case I do not see any significant differences anymore.

Feel free to run your own benchmarks on the code from PR #530 and to report the setups and results.

@tomaswolf tomaswolf self-assigned this Jul 16, 2024
@tomaswolf tomaswolf added this to the 2.13.2 milestone Jul 16, 2024
@czldb2
Copy link
Author

czldb2 commented Jul 18, 2024

I found a couple of things that could be improved; for details see PR #530.

While comparing out-of-the-box performance of JSch and sshd is legitimate, one needs to be aware that they are using different encryption algorithms by default. JSch uses the AES implementation from the built-in SunJCE provider (that's native code), sshd uses a Java implementation of the ChaCha20-Poly1305 cipher (if the server supports it, as does OpenSSH).

So, three findings:

  1. It turned out that our ChaCha20-Poly1305 implementation was very clean, but not speed-optimized. PR Performance optimizations #530 optimizes this, and it is now faster than the BC implementation, and nearly as fast as the native code AES.
  2. A second area improved in PR Performance optimizations #530 is general buffer handling in sshd. Sometimes data was copied needlessly from one buffer into another. However, in the grand scheme of things these buffer optimizations are only a (vanishingly) small factor.
  3. Thirdly, I discovered that if Bouncy Castle was available, and an AES encryption was used for the connection, sshd would use the Bouncy Castle AES, not the one from SunJCE. The AES in BouncyCastle 1.78.1 is a Java implementation and thus much slower. That is also fixed in PR Performance optimizations #530; sshd now prefers the SunJCE AES over the one from Bouncy Castle.

I think we basically have two kinds of reports:

  • Some that see sshd as about 10-50% slower than JSch. Most likely that's points (1) and (2), and comparing AES vs. ChaCha20-Poly1305.
  • Some that see sshd as 200-300% (or even more) slower than JSch. Most likely that's point (3): sshd using the Bouncy Castle AES.

Performance testing SFTP uploads is difficult. I used JMH benchmarks measuring the performance of only the upload itself (excluding SSH session setup, authentication, and setting up the SFTP channel), with a local JSch or sshd client uploading a 20MB random file to an OpenSSH server. To really benchmark this, one would need a lab setup with a controlled environment: a second machine running the SFTP server, and a local network connection with stable and known throughput and latency.

I do not have such a setup. I used mostly a local container running the SFTP server, which of course introduced uncontrollable variation in timings. After all, the same machine then also runs the server, in a container, so there's unpredictable virtualization overhead. The timings I got from this were still surprisingly consistent and enabled me to find the above three problems. With PR #530 applied, JSch and sshd perform equally well in these tests for me. (Uploading 20MBs takes generally about 270-290ms on the machine I tested.)

Just to double-check I also ran the same benchmarks with an AWS EC2 t4g.small instance for the OpenSSH server. Again, JSch and sshd with PR #530 applied performed equal -- uploading 20MB took 4-5 seconds, but there were some outliers with both JSch or sshd when it took 6-8, or 12, or once even 20 seconds. Of course, these uploads went over the general Internet (and had cross at least two firewalls, both beyond my control), so I cannot possibly say what happened in these outlier cases. There were about 3 such outlier per 15 upload attempts.

In both setups I still have the feeling that JSch is a teeny-weeny bit faster (if both are using AES), maybe 2-3%, but that's a feeling only; the statistics don't really support that. Variability and standard deviation in the timings is too high. If the feeling is true, though, it might be due to the multi-threaded nature of sshd. Multi-threading makes a lot of sense for an SSH server, but for a client, the single-threaded approach of JSch has perhaps slightly less overhead. In any case I do not see any significant differences anymore.

Feel free to run your own benchmarks on the code from PR #530 and to report the setups and results.

Thank you very much for your help. I will try my best to use the code you provide to conduct some experiments and tests in the future, and then give you feedback

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 18, 2024
The update() implementation copied _all_ bytes (successively) first into
an internal 16-byte buffer and then processed that buffer. This is no
needed if the input is long.

Use the internal 16-byte buffer only for inputs shorter than 16 bytes,
or if there is a leftover of less than 16 bytes at the end of a long
input. In between process 16-byte chunks directly from the input byte
array.

For 32kB inputs this saves us some 2048 calls to System.arraycopy()
copying all those 32kB. The speedup is minimal but noticeable in
benchmarking.

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 19, 2024
Unroll the permutation loop of ChaCha20 and simplify the crypt() method
to work on bytes instead of ints. Unroll and optimize the pack/unpack
methods in ChaCha20 and in Poly1305. Don't use Integer.toUnsignedLong():
avoid the extra method call by doing (int & 0xFFFF_FFFFL) inline. In
Poly1305.processBlock(), use ints instead of longs where possible
(variables t0-t3).

All this brings a speed-up of about 40% for the encryption/decryption.
Of the time spent in ChaCha20-Poly1305 about one third is spent in the
ChaChaEngine and two thirds in Poly1305.processBlock().

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 19, 2024
For ByteArrayBuffer, optimize the getInt()/putInt() operations to work
directly in the buffer without copying into/from a local work area.
Simplify some of the operations; in particular, avoid extending bytes
to long and then shifting.

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 19, 2024
In ChannelAsyncOutputStream, do not create a new SSH packet if the
caller already had passed in an SSH packet buffer, and we're writing
it fully. This avoids allocating a new buffer, which gives GC less
to do, and it also avoid needlessly copying the data around. Instead
the prepared SSH packet buffer is encoded and written directly.

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 19, 2024
Give SftpOutPutStreamAsync a transferFrom operation to be able to read
directly into a prepared SSH packet uffer. Previously, we always had
to read into a general byte buffer, and then copy into the SSH packet.
This extra data copy can be avoided. If the copy buffer size is such
that the data fits into a single SSH packet, then this packet buffer
can be encoded and sent directly without any further data copying.

Simplify the implementation: since the stream always uses a PacketBuffer
the code paths for other general byte buffers can simply be removed.

Also enable reading the next buffer while the previous one is being
sent. Use two buffers alternatingly; one being sent, the other being
filled.

Use this implementation also in SftpRemotePathChannel in its
transferFrom() implementation. _Do_ close the stream there, otherwise
the final ACKs may not be checked.

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 19, 2024
Bouncy Castle apparently only has native implementations in its "LTS"
releases. BC 1.78.1 has none. SunJCE uses native code.

The "security registrar" architecture in Apache MINA sshd prefers
registered providers over the platform providers, and it automatically
registers Bouncy Castle if it's present. So with Bouncy Castle on the
classpath and an SSH connection for which an AES cipher was specified,
Apache MINA sshd would use the much slower Bouncy Castle AES.

Add a new "SunJCEWrapper" registrar that brings the SunJCE AES and
HmacSHA* to the front of the list, so that they are used even if Bouncy
Castle is also registered.

The new registrar can be disabled via a system property as usual, and
it is only enabled if the JVM indeed has a "SunJCE" security provider
registered.

See also https://issues.apache.org/jira/browse/SSHD-719 (issue closed
as "not needed").

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 19, 2024
The update() implementation copied _all_ bytes (successively) first into
an internal 16-byte buffer and then processed that buffer. This is no
needed if the input is long.

Use the internal 16-byte buffer only for inputs shorter than 16 bytes,
or if there is a leftover of less than 16 bytes at the end of a long
input. In between process 16-byte chunks directly from the input byte
array.

For 32kB inputs this saves us some 2048 calls to System.arraycopy()
copying all those 32kB. The speedup is minimal but noticeable in
benchmarking.

Bug: apache#524
@tomaswolf tomaswolf modified the milestones: 2.13.2, 2.14.0 Jul 20, 2024
@czldb2
Copy link
Author

czldb2 commented Jul 21, 2024

image
I did 20 initial attempts locally, comparing the optimized code with the pre-optimized code and the code using jsch, and found that the optimized code was indeed 30%-40% faster than before, and slightly about 10% slower than jsch. My test code was all run in the idea compiler on the local windows machine, with no agent between the local and the uploaded target machine.

@tomaswolf
Copy link
Member

What ciphers were used? If you compare JSch with native AES against MINA sshd with ChaCha20-Poly1305, 10% is about to be expected.

@czldb2
Copy link
Author

czldb2 commented Jul 21, 2024

What ciphers were used? If you compare JSch with native AES against MINA sshd with ChaCha20-Poly1305, 10% is about to be expected.
17215643784088

Yes, mina used ChaCha20-Poly1305 before, then I removed the ChaCha20-Poly1305 algorithm supported by the ssh server and used aes128-ctr to improve the speed, but it was slightly slower than jsch.

image

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 25, 2024
Unroll the permutation loop of ChaCha20 and simplify the crypt() method
to work on bytes instead of ints. Unroll and optimize the pack/unpack
methods in ChaCha20 and in Poly1305. Don't use Integer.toUnsignedLong():
avoid the extra method call by doing (int & 0xFFFF_FFFFL) inline. In
Poly1305.processBlock(), use ints instead of longs where possible
(variables t0-t3).

All this brings a speed-up of about 40% for the encryption/decryption.
Of the time spent in ChaCha20-Poly1305 about one third is spent in the
ChaChaEngine and two thirds in Poly1305.processBlock().

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 25, 2024
For ByteArrayBuffer, optimize the getInt()/putInt() operations to work
directly in the buffer without copying into/from a local work area.
Simplify some of the operations; in particular, avoid extending bytes
to long and then shifting.

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 25, 2024
In ChannelAsyncOutputStream, do not create a new SSH packet if the
caller already had passed in an SSH packet buffer, and we're writing
it fully. This avoids allocating a new buffer, which gives GC less
to do, and it also avoid needlessly copying the data around. Instead
the prepared SSH packet buffer is encoded and written directly.

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 25, 2024
Give SftpOutPutStreamAsync a transferFrom operation to be able to read
directly into a prepared SSH packet uffer. Previously, we always had
to read into a general byte buffer, and then copy into the SSH packet.
This extra data copy can be avoided. If the copy buffer size is such
that the data fits into a single SSH packet, then this packet buffer
can be encoded and sent directly without any further data copying.

Simplify the implementation: since the stream always uses a PacketBuffer
the code paths for other general byte buffers can simply be removed.

Also enable reading the next buffer while the previous one is being
sent. Use two buffers alternatingly; one being sent, the other being
filled.

Use this implementation also in SftpRemotePathChannel in its
transferFrom() implementation. _Do_ close the stream there, otherwise
the final ACKs may not be checked.

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 25, 2024
Bouncy Castle apparently only has native implementations in its "LTS"
releases. BC 1.78.1 has none. SunJCE uses native code.

The "security registrar" architecture in Apache MINA sshd prefers
registered providers over the platform providers, and it automatically
registers Bouncy Castle if it's present. So with Bouncy Castle on the
classpath and an SSH connection for which an AES cipher was specified,
Apache MINA sshd would use the much slower Bouncy Castle AES.

Add a new "SunJCEWrapper" registrar that brings the SunJCE AES and
HmacSHA* to the front of the list, so that they are used even if Bouncy
Castle is also registered.

The new registrar can be disabled via a system property as usual, and
it is only enabled if the JVM indeed has a "SunJCE" security provider
registered.

See also https://issues.apache.org/jira/browse/SSHD-719 (issue closed
as "not needed").

Bug: apache#524
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Jul 25, 2024
The update() implementation copied _all_ bytes (successively) first into
an internal 16-byte buffer and then processed that buffer. This is no
needed if the input is long.

Use the internal 16-byte buffer only for inputs shorter than 16 bytes,
or if there is a leftover of less than 16 bytes at the end of a long
input. In between process 16-byte chunks directly from the input byte
array.

For 32kB inputs this saves us some 2048 calls to System.arraycopy()
copying all those 32kB. The speedup is minimal but noticeable in
benchmarking.

Bug: apache#524
tomaswolf added a commit that referenced this issue Jul 29, 2024
In the ChaChaEngine, exploit peculiarities of its use in SSH: the
counter is actually 32bit and never overflows, and the nonce is the
SSH packet sequence number, also 32 bits, and wraps on overflow. As
a result two of the ints of the engine state are always zero, and
the handling of nonce and counter can be slightly simplified.

In Poly1305Mac, inline the long multiplications. Avoid extensions
from int to long for the precomputed values (r, s, k); store them as
longs up front. For h, reduce the number of extensions from 25 to 5
by doing it once before the multiplications.

As a side effect this part of the code also is nicer to read.
tomaswolf added a commit that referenced this issue Aug 6, 2024
Streamline ACK checking a bit. Do a quick check for the OK case first
and do the full and proper parsing only if the status is not OK. Also
avoid getting the IDLE_TIMEOUT repeatedly when draining remaining ACKs
when SftpOutputStreamAsync is closed.
tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Aug 7, 2024
The new maven project sshd-benchmarks contains JMH benchmarks. The
project is not part of the binary distribution, and the install:install
and deploy:deploy targets are skipped. The project is included in the
source distribution.

The benchmarks are intended to be run locally.

Currently the project contains benchmarks for SFTP file uploads, which
can be run either against a local container (if the docker engine is
running), or against an external SFTP server.

The test dependencies to JUnit and Mockito had to moved from the
top-level POM to the individual project POMs. Having the JUnit
dependency in the JMH sshd-benchmarks project causes compilation errors
in Eclipse.
@tomaswolf
Copy link
Member

Single data points from single manually timed runs don't help anyone. In #579 I've published the JMH benchmarks I've been using. I do not see any significant speed differences in SFTP file uploads between JSch 0.2.18 and Apache MINA 2.14.0-SNAPSHOT (at current master commit 16b3078).

If you run these benchmarks and you get significant performance differences, you will have to profile and analyze the problem yourself. It may be specific to your particular environment. If you find something and can improve it: we always welcome PRs.

If you report any findings include in addition to the benchmark summary also the JVM version the benchmark was run on.

Since I do not see any meaningful speed differences anymore I will only be able to help with further performance problems in simple SFTP uploads if you can provide a setup or test or benchmark that reproduces the problem in my environment.

This issue will be closed once #579 is merged; for future problems feel free to open new issues.

@czldb2
Copy link
Author

czldb2 commented Aug 8, 2024

Single data points from single manually timed runs don't help anyone. In #579 I've published the JMH benchmarks I've been using. I do not see any significant speed differences in SFTP file uploads between JSch 0.2.18 and Apache MINA 2.14.0-SNAPSHOT (at current master commit 16b3078).

If you run these benchmarks and you get significant performance differences, you will have to profile and analyze the problem yourself. It may be specific to your particular environment. If you find something and can improve it: we always welcome PRs.

If you report any findings include in addition to the benchmark summary also the JVM version the benchmark was run on.

Since I do not see any meaningful speed differences anymore I will only be able to help with further performance problems in simple SFTP uploads if you can provide a setup or test or benchmark that reproduces the problem in my environment.

This issue will be closed once #579 is merged; for future problems feel free to open new issues.

Thank you very much for your help. At present, the performance has indeed improved. I will contact you if there are other findings in the future

tomaswolf added a commit to tomaswolf/mina-sshd that referenced this issue Aug 9, 2024
The new maven project sshd-benchmarks contains JMH benchmarks. The
project is not part of the binary distribution, and the install:install
and deploy:deploy targets are skipped. The project is included in the
source distribution.

The benchmarks are intended to be run locally.

Currently the project contains benchmarks for SFTP file uploads, which
can be run either against a local container (if the docker engine is
running), or against an external SFTP server.
@talkraghu
Copy link

Do we have any timeline by which the changes would delivered?

@talkraghu
Copy link

@tomaswolf , is there a tentative timeline to deliver this fix at newer library version ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants