This project provides integration modules for the invesdwin-context module system.
Releases and snapshots are deployed to this maven repository:
https://invesdwin.de/repo/invesdwin-oss-remote/
Dependency declaration:
```xml
<dependency>
    <groupId>de.invesdwin</groupId>
    <artifactId>invesdwin-context-integration-ws</artifactId>
    <version>1.0.2</version><!---project.version.invesdwin-context-integration-parent-->
</dependency>
```
The core integration module resides in the invesdwin-context repository. You can read more about it there. The modules here build on top of it to provide more specific integration functionality:
- invesdwin-context-integration-amqp: this module makes it easy to connect to a RabbitMQ (AMQP) server via spring-amqp. The channels are defined in your own spring-integration xml files, as can be seen in the respective test cases. There are system properties available to define the connection parameters of the default server (create your own `RabbitTemplate` instances for additional connections, as sketched below):
de.invesdwin.context.integration.amqp.AmqpClientProperties.HOST=localhost
# -1 means to use default port
de.invesdwin.context.integration.amqp.AmqpClientProperties.PORT=-1
de.invesdwin.context.integration.amqp.AmqpClientProperties.USER=guest
de.invesdwin.context.integration.amqp.AmqpClientProperties.PASSWORD=guest
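For an additional connection beyond the default server, a `RabbitTemplate` can be created manually. A minimal sketch using the spring-amqp API (host, port, credentials and queue name are placeholders):

```java
// Minimal sketch of an additional RabbitMQ connection via spring-amqp.
// Host, port, credentials and queue name are placeholders for illustration.
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

public final class AdditionalAmqpConnectionExample {
    public static void main(final String[] args) {
        final CachingConnectionFactory connectionFactory = new CachingConnectionFactory("some.other.host", 5672);
        connectionFactory.setUsername("guest");
        connectionFactory.setPassword("guest");
        final RabbitTemplate template = new RabbitTemplate(connectionFactory);
        // send a simple message; spring-integration channels can reuse the same template
        template.convertAndSend("someQueue", "hello");
        connectionFactory.destroy();
    }
}
```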
- invesdwin-context-integration-jms: this module makes it easy to connect to an ActiveMQ (JMS) server via spring-integration-jms. Currently this module is set up to connect processes in a local network of brokers, so right now there are no connection system properties available (but you can still create your own `JmsTemplate` instances for additional connections, as sketched below). Or just request an extension of this module to support different operation modes and extended configurability via system properties.
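A minimal sketch of such an additional `JmsTemplate` against an ActiveMQ broker (broker URL and queue name are placeholders):

```java
// Minimal sketch of an additional ActiveMQ connection via spring-jms.
// The broker URL and queue name are placeholders for illustration.
import org.apache.activemq.ActiveMQConnectionFactory;
import org.springframework.jms.core.JmsTemplate;

public final class AdditionalJmsConnectionExample {
    public static void main(final String[] args) {
        final ActiveMQConnectionFactory connectionFactory =
                new ActiveMQConnectionFactory("tcp://some.other.host:61616");
        final JmsTemplate template = new JmsTemplate(connectionFactory);
        // send a simple text message; spring-integration-jms channels can reuse the same template
        template.convertAndSend("someQueue", "hello");
    }
}
```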
- invesdwin-context-integration-batch: this module supports writing jobs for spring-batch. It eases integration by automatically populating the batch database tables using invesdwin-context-persistence-jpa, which even works in test cases with an in-memory database. See the included test cases for examples. Place job xml files into the classpath at
/META-INF/batch/ctx.batch.*.xml
so they get discovered automatically during application bootstrap. The `IJobService` provides convenience methods for running and managing these jobs. The `invesdwin-context-integration-batch-admin` module embeds the spring-batch-admin web frontend into your application. It provides a convenient UI to manually fiddle around with your jobs on an ad-hoc basis (it uses the context path root at `<WEBSERVER_BIND_URI>/` of your embedded web server).
- invesdwin-context-integration-ws: this module adds support for RESTful web services via spring-web. Just define a bean and annotate it with `@Controller` for it to be automatically available under the context path `<WEBSERVER_BIND_URI>/spring-web/...` of your embedded web server. This module also adds support for a web service registry via the `IRegistryService` class, which connects as a client to your own registry server (read more on that in the WS-Registry topic later). Lastly, this module also adds support for SOAP web services via spring-ws, which allows enterprise web services to be used easily as spring integration channels (the context path is `<WEBSERVER_BIND_URI>/spring-web/...` of your embedded web server). You can register your own web services in the registry server via the classes `RestWebServicePublication` for REST and `XsdWebServicePublication` for SOAP respectively. The class `RegistryDestinationProvider` can be used to bind to an external web service that is looked up via the registry server. The following system properties are available to change your registry server URL and the credentials for your web service security (message encryption is disabled per default, since you should rather use an SSL secured web server (maybe even a proxy) for this, as most other methods are slower and more complicated in comparison):
# it makes sense to put these settings into the $HOME/.invesdwin/system.properties file so they apply to all processes (alternatively override these in your distribution modules)
de.invesdwin.context.integration.ws.IntegrationWsProperties.REGISTRY_SERVER_URI=http://please.override.this/spring-web
de.invesdwin.context.integration.ws.IntegrationWsProperties.WSS_USERNAMETOKEN_USER=invesdwin
de.invesdwin.context.integration.ws.IntegrationWsProperties.WSS_USERNAMETOKEN_PASSWORD=invesdwin
de.invesdwin.context.integration.ws.IntegrationWsProperties.SPRING_WEB_USER=invesdwin
de.invesdwin.context.integration.ws.IntegrationWsProperties.SPRING_WEB_PASSWORD=invesdwin
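As a rough illustration of the spring-web part, a minimal controller might look like this (the request path and payload are made up for illustration):

```java
// Minimal sketch of a RESTful endpoint via spring-web; it becomes available
// under <WEBSERVER_BIND_URI>/spring-web/... of the embedded web server.
// The path and payload are placeholders for illustration.
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class StatusController {

    @RequestMapping("/status")
    @ResponseBody
    public String status() {
        // returns a plain text body instead of resolving a view
        return "OK";
    }
}
```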
- invesdwin-context-integration-ws-jaxrs: this module adds support for JAX-RS via the Jersey reference implementation. Just annotate your `@Named` spring beans additionally with `@Path` to expose them via JAX-RS as RESTful web services (the context path is `<WEBSERVER_BIND_URI>/jersey/...` of your embedded web server).
- invesdwin-context-integration-ws-jaxws: this module adds support for JAX-WS via the Apache CXF implementation. Just define your web services as you normally would with CXF and spring inside a context xml file that gets registered in an `IContextLocation` for the invesdwin-context application bootstrap (the context path is `<WEBSERVER_BIND_URI>/cxf/...` of your embedded web server).
- invesdwin-context-integration-ws-registry: this module provides an embedded registry server. Just create and deploy your own distribution (for an example see the `invesdwin-context-integration-ws-registry-dist` distribution module) and make sure all clients are configured to the appropriate server via the `de.invesdwin.context.integration.ws.IntegrationWsProperties.REGISTRY_SERVER_URI` system property. The registry server is available at the context path `<WEBSERVER_BIND_URI>/spring-web/registry/` of your embedded web server.
- invesdwin-context-integration-maven: this module adds support for downloading artifacts from Maven repositories via Shrinkwrap Resolver. With this, one can implement dynamic loading of plugins via the web. The following system properties are supported:
de.invesdwin.context.integration.maven.MavenProperties.LOCAL_REPOSITORY_DIRECTORY=${user.home}/.m2/repository
de.invesdwin.context.integration.maven.MavenProperties.REMOTE_REPOSITORY_1_URL=http://invesdwin.de/artifactory/invesdwin-oss-remote
# add more remote repositories by incrementing the id (though keep them without gaps)
#de.invesdwin.context.integration.maven.MavenProperties.REMOTE_REPOSITORY_2_URL=
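Under the hood, Shrinkwrap Resolver can resolve artifacts from the configured repositories roughly like this (a direct Shrinkwrap usage sketch with the dependency coordinate from above as a placeholder; the module itself may wrap this differently):

```java
// Hedged sketch of resolving a Maven artifact with Shrinkwrap Resolver.
// The coordinate is a placeholder; the module may expose its own wrapper around this.
import java.io.File;

import org.jboss.shrinkwrap.resolver.api.maven.Maven;

public final class ResolveArtifactExample {
    public static void main(final String[] args) {
        final File[] files = Maven.resolver()
                .resolve("de.invesdwin:invesdwin-context-integration-ws:1.0.2")
                .withTransitivity()
                .asFile();
        for (final File file : files) {
            System.out.println("resolved: " + file.getAbsolutePath());
        }
    }
}
```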
- invesdwin-context-integration-affinity: this module integrates a library to configure Java thread affinity. It helps to decrease latency for communication threads when a cpu core is reserved for a given task, because wakeups on events become faster. Though in our experience this decrease in latency comes at a cost in throughput depending on the cpu and operating system, because cpu intensive workloads on fixed cores disable the heat and power consumption optimizations of the operating system's scheduler. So always test your performance properly before applying a thread affinity. For this purpose you can use `AffinityProperties.setEnabled(boolean)` to enable/disable the affinity during runtime without having to remove the affinity setup code during testing. Please read the original documentation of the library to learn how to adjust your threads to actually apply the affinity.
- invesdwin-context-integration-jppf: this is the client module for JPPF, which is a library for grid computing in Java. It allows flexible topologies consisting of nodes (worker processes) and drivers (servers that coordinate jobs to run on nodes) and employs a remote classloader connection to comfortably transmit jobs for execution on nodes without having to deploy code manually. This module configures JPPF for speed and reliability out of the box. It also adds a basic authentication mechanism to prevent your grid from accepting connections from unauthorized sources. Multicast discovery is disabled for security reasons; instead, JPPF server discovery is handled via ws-registry lookups. `ConfiguredJPPFClient` provides access to submitting jobs and, among other things, a `JPPFProcessingThreadsCounter` which gives reliable information about how many drivers, nodes and processing threads are available in your grid. You can also override any JPPF internal properties via the system properties override mechanism provided by invesdwin-context. Though the only properties you should worry about changing would be the following ones:
# replace this with a secret token for preventing public access
de.invesdwin.context.integration.jppf.JPPFClientProperties.USERNAME_TOKEN=invesdwin
# enable local execution, disable it if you have a node running on this computer
jppf.local.execution.enabled=true
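To give an idea of what gets submitted, here is a hedged sketch of a trivial JPPF task; it assumes JPPF's `org.jppf.node.protocol.AbstractTask` base class (verify against your JPPF version), and submission would then go through `ConfiguredJPPFClient` as described above:

```java
// Hedged sketch of a trivial JPPF task; assumes the org.jppf.node.protocol.AbstractTask
// base class from the JPPF client library (verify the exact base class for your JPPF version).
import org.jppf.node.protocol.AbstractTask;

public class SquareTask extends AbstractTask<Long> {
    private final long input;

    public SquareTask(final long input) {
        this.input = input;
    }

    @Override
    public void run() {
        // the result is serialized and transported back to the client when the job completes
        setResult(input * input);
    }
}
```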
- invesdwin-context-integration-jppf-server: this module handles the setup of a JPPF driver instance. You can enable it in your unit tests via the `@JPPFServerTest` annotation. You need at least one server instance deployed in your grid. Though if multiple ones are deployed, they add the benefit of improved reliability and more dynamic load balancing, while also allowing more flexible control over the topology, since not all clients or nodes might be able to reach each other when multiple networks and firewalls are involved. In that case server instances can operate as bridges. The interesting properties here are:
# a local node reduces the overhead significantly, though you should disable it if a node is deployed on the same host or when the server should not execute jobs
jppf.local.node.enabled=true
# per default a random port will be picked
jppf.server.port=0
jppf.management.port=0
- invesdwin-context-integration-jppf-node: the node is a worker process that executes tasks which are bundled as jobs in JPPF. The configured load balancing mechanism will assign as many tasks per node as there are processing threads available for each. You can use the `@JPPFNodeTest` annotation to enable a node in your unit tests. The node will cycle through the servers that were returned by a ws-registry lookup and connect to the first one that can be reached properly. If you want to reduce the overhead of the remote classloader a bit, you can either deploy whole jars with your jobs (via the JPPF ClassPath mechanism) or deploy the most common classes directly embedded in your node's JVM classpath. If you require a spring context in your tasks, you have to initialize and tear it down inside the task properly. If you require access to invesdwin modules that are accessible via the MergedContext, then you have to deploy them alongside your node module or use an isolated classloader in your task which initializes its own context. Nodes will publish their metadata to an FTP server (those modules will be explained later) so that counting nodes and processing threads is a bit more reliable and works even with offline nodes (nodes that are behind firewalls and thus invisible through JMX, or which are explicitly in offline mode). These counts can then be used for manually splitting workloads over multiple jobs. For example, when you want to reduce the JPPF overhead (of remote classloading and context setup) or even the context switches of threads inside the nodes, you can submit jobs that have only one task that contains a chunked bit of work (a sketch follows after the JPPF module descriptions below). This chunked bit of work is split among the available threads in the node inside the task (manually by your code) and then processed chunk-wise without a thread needing to pick its next work item, since all of them were given to it in the beginning. The task waits for all the work to be finished in order to compress and upload the results as a whole (which also saves time when uploading results). Since JPPF nodes can only process one job at a time (though multiple tasks in a job are handled more efficiently than multiple jobs; this limitation exists due to potential issues with static fields and native libraries you might encounter when mixing various different tasks without knowing what they might do), this gives you more fine grained control to extract the most performance from your nodes depending on your workload. When you submit lots of jobs from your client, it will make sure that there are enough connections available for each job by scaling connections up dynamically within limits (because each job requires its separate connection in JPPF). This solves some technical limitations of JPPF for the benefit of staying flexible.
- invesdwin-context-integration-jppf-admin(-web): JPPF also provides an Admin Console UI which is integrated via this module. It shows health statistics of the servers and nodes and allows managing their settings. Use the appropriate module to launch a desktop or web frontend.
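The chunked-workload pattern described for the node module above can be sketched with plain JDK concurrency inside a single task (the work items, the squaring computation and the chunking are made up for illustration; this is not a module API):

```java
// Hedged sketch of splitting one chunked unit of work among the node's available
// threads inside a single task, as described above. Work items and the computation
// are placeholders; only JDK classes are used here.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class ChunkedWorkExample {

    public static List<Long> processChunked(final List<Long> workItems) throws Exception {
        final int threads = Runtime.getRuntime().availableProcessors();
        final ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            final int chunkSize = (workItems.size() + threads - 1) / threads;
            final List<Future<List<Long>>> futures = new ArrayList<>();
            // hand each thread its whole chunk upfront so it never has to pick the next work item
            for (int from = 0; from < workItems.size(); from += chunkSize) {
                final List<Long> chunk = workItems.subList(from, Math.min(from + chunkSize, workItems.size()));
                futures.add(executor.submit(new Callable<List<Long>>() {
                    @Override
                    public List<Long> call() {
                        final List<Long> chunkResults = new ArrayList<>();
                        for (final Long item : chunk) {
                            chunkResults.add(item * item); // placeholder computation
                        }
                        return chunkResults;
                    }
                }));
            }
            final List<Long> results = new ArrayList<>();
            for (final Future<List<Long>> future : futures) {
                results.addAll(future.get()); // wait for all chunks before uploading the results as a whole
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```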
- invesdwin-context-integration-ftp: here you find ftp4j as a client library for FTP access. Usage is simplified by `FtpFileChannel`, which provides a way to transmit the information for an FTP transfer channel by serializing the object. This is useful for JPPF because you can get faster result upload speeds by uploading to FTP and sending the `FtpFileChannel` as the result of the computation to the client, which then downloads the results from the FTP server. By utilizing `AsyncFileChannelUpload` you can even make this process asynchronous, which enables the JPPF computation node to continue with the next task while uploading the results in parallel. The client, which utilizes `AsyncFileChannelDownload`, will wait until the upload is finished by the node before starting the download. Timeouts and retries are applied to handle any failures that might happen. Though if this fails completely, the client should resubmit the job. The following system properties are available to configure the FTP credentials (you can override the `FtpFileChannel.login()` method to use different credentials; FTP server discovery is supposed to happen via `FtpServerDestinationProvider` as a ws-registry lookup):
de.invesdwin.context.integration.ftp.FtpClientProperties.USERNAME=invesdwin
de.invesdwin.context.integration.ftp.FtpClientProperties.PASSWORD=invesdwin
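For reference, the underlying ftp4j client can be used directly roughly like this (host, port, credentials and file name are placeholders; `FtpFileChannel` wraps this kind of plumbing for you):

```java
// Hedged sketch of direct ftp4j usage; FtpFileChannel wraps this kind of plumbing.
// Host, port, credentials and file name are placeholders for illustration.
import java.io.File;

import it.sauronsoftware.ftp4j.FTPClient;

public final class FtpUploadExample {
    public static void main(final String[] args) throws Exception {
        final FTPClient client = new FTPClient();
        client.connect("ftp.example.org", 2221);
        client.login("invesdwin", "invesdwin");
        try {
            // upload a result file so the other side can download it via its own channel
            client.upload(new File("results.bin"));
        } finally {
            client.disconnect(true);
        }
    }
}
```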
- invesdwin-context-integration-ftp-server: this is an embedded FTP server which is provided by Apache MINA FtpServer. As usual you can annotate your tests with `@FtpServerTest` to enable the server in your unit tests. The following system properties are available:
de.invesdwin.context.integration.ftp.server.FtpServerProperties.PORT=2221
de.invesdwin.context.integration.ftp.server.FtpServerProperties.MAX_THREADS=200
# set to clean the server directory regularly of old files, keep empty or unset to disable this feature
de.invesdwin.context.integration.ftp.server.FtpServerProperties.PURGE_FILES_OLDER_THAN_DURATION=1 DAYS
- invesdwin-context-integration-webdav: since FTP has a high protocol overhead when being used for lots of short-lived connections, we also provide a more lightweight alternative for file transmissions via WebDAV. It works over HTTP, so it is designed for low protocol overhead since it reduces the amount of round trips needed and does not require a socket connection to be kept alive between requests. The client library in use is sardine, which is available as a `WebdavFileChannel` for use in `AsyncFileChannelUpload` and `AsyncFileChannelDownload`. The following system properties are available to configure the WebDAV credentials (you can override the `WebdavFileChannel.login()` method to use different credentials; WebDAV server discovery is supposed to happen via `WebdavServerDestinationProvider` as a ws-registry lookup):
de.invesdwin.context.integration.webdav.WebdavClientProperties.USERNAME=invesdwin
de.invesdwin.context.integration.webdav.WebdavClientProperties.PASSWORD=invesdwin
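For reference, the sardine client can be used directly roughly like this (URL and credentials are placeholders; `WebdavFileChannel` wraps this for the channel abstraction):

```java
// Hedged sketch of direct sardine usage; WebdavFileChannel wraps this for the
// channel abstraction. URL and credentials are placeholders for illustration.
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import com.github.sardine.Sardine;
import com.github.sardine.SardineFactory;

public final class WebdavUploadExample {
    public static void main(final String[] args) throws Exception {
        final Sardine sardine = SardineFactory.begin("invesdwin", "invesdwin");
        final String url = "http://localhost:8080/webdav/results.bin";
        // upload and download a small payload over plain HTTP
        sardine.put(url, "hello".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = sardine.get(url)) {
            System.out.println("downloaded " + in.available() + " bytes");
        }
        sardine.shutdown();
    }
}
```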
- invesdwin-context-integration-webdav-server: this is an embedded WebDAV server which is provided by WebDAV-Handler. It is a simple implementation that also provides support for advanced WebDAV features like file locking. As usual you can annotate your tests with `@WebserverTest` when using `invesdwin-context-webserver` to enable the server in your unit tests (the context path is `<WEBSERVER_BIND_URI>/webdav/` of your embedded web server). The following system properties are available:
# set to clean the server directory regularly of old files, keep empty or unset to disable this feature
de.invesdwin.context.integration.webdav.server.WebdavServerProperties.PURGE_FILES_OLDER_THAN_DURATION=1 DAYS
- invesdwin-context-integration-mpi(-fastmpj|-mpjexpress|-openmpi|-mvapich2): These modules provide support for running jobs on MPI (Message Passing Interface) clusters/grids. `ProvidedMpiAdapter.getProvidedInstance()` automatically detects the environment (FastMPJ, MPJExpress, OpenMPI, MVAPICH2) depending on which adapter modules are on the classpath. The actual MPI classes are provided by the runtime into which the jobs have been launched and the provider will determine the correct adapter to be used. Thus it is fine to have all submodules in the classpath to make the binding decision at runtime automagically. The `IMpiAdapter` is an abstraction to stay flexible about which implementation to bind to.
  - FastMPJ requires a special license and might not be appropriate for most cases (we only tested NIO based communication). MPJExpress is an open source Java based alternative that supports multithreaded jobs on target machines. It supports both shared memory and NIO based communication and selects the best alternative. There is also support for Myrinet and native MPI integration, though we did not test those. A native implementation like OpenMPI or MVAPICH2 is the preferred and fastest solution for native access to specialized hardware. Though MPJExpress also shines with its support for running jobs on Hadoop/YARN clusters (for cloud based distributed jobs).
  - Examples and testcases are available in `invesdwin-context-integration-mpi-test`. There we create a Fat-Jar from the running process, invoke the FastMPJ/MPJExpress/OpenMPI/MVAPICH2 launcher to execute the packaged job on a local cluster and wait for the job to finish. For Hadoop/YARN we test with a local Docker instance and provide an example for a Jailbreak to upgrade job executions from Java 8 or 11 (supported by Hadoop as of Q1 2023) to Java 17 (required by Spring 6 and Jakarta).
  - MPI allows synchronizing processes using barriers and communicating via non-blocking send/recv as well as broadcast messages. Message transfer is implemented as Synchronous Channel implementations, which are described in more detail below. Advanced message patterns need to be implemented on top of Synchronous Channels. Though it is also possible to directly access a given MPI implementation if a job binds directly against that specific API (sadly all alternatives use the same package and class names but are not compatible with each other) in order to use the MPI integrated advanced message patterns, with their drawback of requiring equal length messages (which Synchronous Channels do not require).
  - If packaging a jar and uploading it for each job is too much overhead and introduces delays in processing, consider using JPPF instead with its remote classloader. It might even be possible to deploy JPPF nodes on the MPI cluster for a while and use those to execute smaller jobs at a higher frequency.
Quote from golang: Do not communicate by sharing memory; instead, share memory by communicating.
The abstraction in invesdwin-context-integration-channel has its origin in the invesdwin-context-persistence-timeseries module. Though it can be used for inter-network, inter-process and inter-thread communication. The idea is to use the channel implementation that best fits the task at hand (heap/off-heap, by-reference/by-value, blocking/non-blocking, bounded/unbounded, queued/direct, durable/transient, publish-subscribe/peer-to-peer, unicast/multicast, client/server, master/slave) with the transport that is most efficient or useful at that spot. To switch to a different transport, simply replace the channel implementation without having to modify the code that uses the channel. Best effort was made to keep the implementations zero-copy and zero-allocation (where possible). If you find ways to further improve the speed, let us know!
- ISynchronousChannel: since LevelDB by design supports only one process that accesses the database (even though it does multithreading properly), the developers say one should build their own server frontend around the database to access it from other processes. This also makes sense in order to have only one process that keeps the database updated with expiration checks and a single instance of `ATimeSeriesUpdater`. Though this creates the problem of how to build an IPC (Inter-Process-Communication) mechanism that does not slow down the data access too much. As a solution, this module provides a unidirectional channel (you should use one channel for requests and a separate one for responses on a peer-to-peer basis) for blazing fast IPC on a single computer with the following tools:
  - Named Pipes: the classes `PipeSynchronousReader` and `PipeSynchronousWriter` provide implementations for classic FIFO/Named Pipes. `SynchronousChannels.createNamedPipe(...)` creates pipe files with the command line tool mkfifo/mknod on Linux/MacOSX and returns false when it failed to create one. On Linux you have to make sure to open each pipe with a reader/writer in the correct order on both ends or else you will get a deadlock because it blocks. The correct way to do this is to first initialize the request channel on both ends, then the response channel (or vice versa). `SynchronousChannels.isNamedPipeSupported(...)` tells you if those are supported so you can fall back to a different implementation, e.g. on Windows (though you could write your own mkfifo.exe/mknod.exe implementation and put it into PATH to make this work with Windows named pipes. Currently there is no command line tool directly available for this on Windows, apart from makepipe which might be bundled with some versions of SQL Server, but we did not try to implement this because there is a better alternative to named pipes below that works on any operating system).
    - This implementation only supports one reader and one writer per channel, since all data gets consumed by the reader.
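Roughly, creating such a pipe file on Linux amounts to calling mkfifo, which can be sketched with plain JDK code (illustration only; `SynchronousChannels.createNamedPipe(...)` handles this, including the mknod fallback, for you):

```java
// Hedged sketch of what creating a named pipe on Linux/MacOSX amounts to;
// SynchronousChannels.createNamedPipe(...) handles this (and mknod as a fallback) for you.
import java.io.File;

public final class NamedPipeExample {

    public static boolean createNamedPipe(final File file) {
        try {
            final Process process = new ProcessBuilder("mkfifo", file.getAbsolutePath())
                    .redirectErrorStream(true)
                    .start();
            return process.waitFor() == 0 && file.exists();
        } catch (final Exception e) {
            // e.g. on Windows where mkfifo is not available: fall back to another channel implementation
            return false;
        }
    }
}
```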
  - Memory Mapping: the classes `MappedSynchronousReader` and `MappedSynchronousWriter` provide channel implementations that are based on memory mapped files (inspired by Chronicle-Queue and MappedBus while making it even simpler for maximum speed). This implementation has the benefit of being supported on all operating systems. It has less risk of accidentally blocking when things go really wrong because it is non-blocking by nature. And this implementation is slightly faster than Named Pipes for the workloads we have tested, but your mileage might vary and you should test it yourself. Anyway you should still keep communication to a minimum, thus making chunks large enough during iteration to reduce synchronization overhead between the processes; for this a bit of fine tuning on your communication layer might be required.
    - For determining where to place the memory mapped files: `SynchronousChannels.getTmpfsFolderOrFallback()` gives the `/dev/shm` folder when available or the temp directory as a fallback, e.g. on Windows. Using tmpfs could improve the throughput a bit, since the OS will then not even try to flush the memory mapped file to disk.
    - This implementation supports only one writer and theoretically supports multiple readers for a publish/subscribe or multicast scenario of pushing data to clients (e.g. a price feed). But it does not synchronize the reads and the writer might overwrite data while one of the readers still reads it. You would have to implement some sort of synchronization between the processes to prevent this. Another problem might be the synchronous nature of there always being only one message in the channel; clients might miss a few messages if they are not fast enough. A better solution for use-cases like this would be to use MappedBus or Chronicle-Queue directly. Those solutions provide more than unidirectional peer-to-peer channels and can synchronize multiple readers and writers. For comparison, this implementation does not want to pay the overhead price of synchronization and aims to provide the fastest unidirectional peer-to-peer channel possible.
  - Native Socket: the classes `NativeSocketSynchronousReader` and `NativeSocketSynchronousWriter` provide implementations for TCP/IP socket communication for when you actually have to go across computers on a network. If you want to use UDP/IP you can use the classes `NativeDatagramSynchronousReader` and `NativeDatagramSynchronousWriter`, but beware that packets can get lost when using UDP, so you need to implement retry behaviours on your client/server. It is not recommended to use these implementations for IPC on localhost, since they are quite slow in comparison to other implementations. There are also Netty variants of these channels so you can use Epoll, KQueue and IOUring if you want. The goal of these implementations is to use as little Java code as possible for lowering latency with zero garbage and zero copy. They are suitable to be accelerated using special network cards that support kernel bypass with OpenOnload or its potentially faster alternatives. This might lead to an additional 60%-80% latency reduction. Aeron could also be used with the native MediaDriver together with kernel bypass for reliable low latency UDP transport. The Java process talks to the MediaDriver via shared memory, which removes JNI overhead (at the cost of thread context switching). Specialized high performance computing (HPC) hardware (buzzwords like Infiniband, RMA, RDMA, PGAS, Gemini/Aries, Myrinet, NVLink, iWarp, RoCE, OpenFabrics, ...) can be integrated with channels for OpenMPI, MPJExpress, JUCX and DISNI. IPoIB provides a kernel driver for sockets over Infiniband, but this is slower due to requiring syscalls like normal sockets. Kernel bypass latency reductions for Infiniband can only be achieved by our specialized transport integrations. For server loads it might be better to use the standard asynchronous handler threading model of Netty via `NettySocketAsynchronousChannel` or `NettyDatagramAsynchronousChannel` using an `IAsynchronousHandler`. This should scale to thousands of connections with high throughput. The SynchronousChannels are designed for high performance peer-to-peer connections, but they are also non-blocking and capable of multiplexing. This should provide similar scalability to Netty without restrictions on the underlying transport implementation. Our unified channel abstraction makes higher level code (like protocol and topology) fully reusable.
  - Queue: the classes `QueueSynchronousReader` and `QueueSynchronousWriter` provide implementations for non-blocking `java.util.Queue` usage for inter-thread communication inside the JVM. The classes `BlockingQueueSynchronousReader` and `BlockingQueueSynchronousWriter` provide an alternative for `java.util.concurrent.BlockingQueue` implementations like `SynchronousQueue` which do not support non-blocking usage. Be aware that not all queue implementations are thread safe, so you might want to synchronize the channels via `SynchronousChannels.synchronize(...)`. The various Disruptor implementations (LMAX, Conversant) might work better for high concurrency loads, though in our tests `AgronaOneToOneQueue`/`AgronaManyToOneQueue`/`AgronaManyToManyQueue` seem to perform best for inter-thread communication.
- ASpinWait: the channel implementations are non-blocking by themselves (Named Pipes are normally blocking, but the implementation uses them in a non-blocking way, while Memory Mapping is non-blocking per default). This causes the problem of how one should wait on a reader without causing delays from sleeps or causing CPU load from spinning. This problem is solved by `ASpinWait`. It first spins when things are rolling and falls back to a fast sleep interval when communication has cooled down (similar to what `java.util.concurrent.SynchronousQueue` does between threads). The actual timings can be fully customized. To keep CPU usage to a minimum, spins can be disabled entirely in favour of waits/sleeps. To use this class, just override the `isConditionFulfilled()` method by calling `reader.hasNext()`.
- Performance: here are some performance measurements against in-process queue implementations, using one channel for requests and another separate one for responses. Thus each of the 10,000,000 records involves two messages (20,000,000 in total) and is passed between two threads (one simulating a server and the other one a client). This shows that memory mapping might even be useful as a faster alternative to queues for inter-thread-communication besides it being designed for inter-process-communication (as long as the serialization is cheap):
Old Benchmarks (2016, Core i7-4790K with SSD, Java 8):
DatagramSocket (loopback) Records: 111.01/ms in 90.078s => ~60% slower than Named Pipes
ArrayDeque (synced) Records: 127.26/ms in 78.579s => ~50% slower than Named Pipes
Named Pipes Records: 281.15/ms in 35.568s => using this as baseline
SynchronousQueue Records: 924.90/ms in 10.812s => ~3 times as fast as Named Pipes
LinkedBlockingQueue Records: 1,988.47/ms in 5.029s => ~7 times as fast as Named Pipes
Mapped Memory Records: 3,214.40/ms in 3.111s => ~11 times as fast as Named Pipes
Mapped Memory (tmpfs) Records: 4,237.29/ms in 2.360s => ~15 times as fast as Named Pipes
New Benchmarks (2021, Core i9-9900k with SSD, Java 17, mitigations=off; best in class marked by `*`):
Network NettyDatagramOio (loopback) Records: 1.00/s => ~99.99% slower (threading model unsuitable)
Network NettySocketOio (loopback) Records: 1.01/s => ~99.99% slower (threading model unsuitable)
Network NettyUdt (loopback) Records: 51.83/s => ~99.95% slower
Network JNetRobust (loopback) Records: 480.42/s => ~99.54% slower
Network BarchartUdt (loopback) Records: 808.25/s => ~99.24% slower
Network AsyncNettyUdt (loopback) Records: 859.93/s => ~99.19% slower
Network BidiNettyUdt (loopback) Records: 5.27/ms => ~95% slower
Network NngTcp (loopback) Records: 13.33/ms => ~87% slower
Network BidiMinaSctpApr (loopback) Records: 27.43/ms => ~74% slower
Thread NngInproc Records: 28.93/ms => ~73% slower
Network NettySocketIOUring (loopback) Records: 30.49/ms => ~71% slower
Network MinaSocketNio (loopback) Records: 32.41/ms => ~70% slower
Network BidiMinaSocketApr (loopback) Records: 32.96/ms => ~69% slower
Network NettySocketNio (loopback) Records: 34.74/ms => ~67% slower
Network CzmqTcp (loopback) Records: 35.24/ms => ~67% slower
Network JnanomsgTcp (loopback) Records: 35.90/ms => ~66% slower
Network AsyncMinaSctpApr (loopback) Records: 37.99/ms => ~64% slower (using async handlers for servers)
Network BidiMinaSocketNio (loopback) Records: 38.21/ms => ~64% slower
Network BidiBarchartUdt (loopback) Records: 38.40/ms => ~64% slower
Network JzmqTcp (loopback) Records: 39.37/ms => ~63% slower
Network NettyDatagramIOUring (loopback) Records: 40.24/ms => ~62% slower
Process JeromqIpc Records: 41.13/ms => ~61% slower
Network BidiNettyDatagramIOUring (loopback) Records: 41.96/ms => ~61% slower
Network BidiNettySocketNio (loopback) Records: 43.18/ms => ~59% slower
Network JeromqTcp (loopback) Records: 43.88/ms => ~59% slower
Process CzmqIpc Records: 45.13/ms => ~58% slower
Network FastMPJ (loopback) Records: 45.61/ms => ~57% slower
Network BidiNettyDatagramNio (loopback) Records: 47.84/ms => ~55% slower
Network NettyDatagramNio (loopback) Records: 49.04/ms => ~54% slower
Network MPJExpress (loopback) Records: 49.45/ms => ~54% slower
Network AsyncMinaSocketNio (loopback) Records: 53.45/ms => ~50% slower (using async handlers for servers)
Network AsyncNettySocketIOUring (loopback) Records: 54.06/ms => ~49% slower (using async handlers for servers)
Network BidiNettySocketHadroNio (loopback) Records: 55.29/ms => ~48% slower
Process JzmqIpc Records: 55.35/ms => ~48% slower
Network BidiMinaDatagramNio (loopback) Records: 55.51/ms => ~48% slower (unreliable)
Network AsyncNettySocketOio (loopback) Records: 58.76/ms => ~45% slower (using async handlers for servers)
Network AsyncNettyDatagramIOUring (loopback) Records: 59.29/ms => ~44% slower (unreliable; using async handlers for servers)
Network NettySocketHadroNio (loopback) Records: 59.44/ms => ~44% slower
Process JnanomsgIpc Records: 61.36/ms => ~42% slower
Network KryonetTcp (loopback) Records: 62.71/ms => ~41% slower
Network HadroNioSocketChannel (loopback) Records: 65.63/ms => ~38% slower
Network AsyncNettySocketNio (loopback) Records: 65.86/ms => ~38% slower (using async handlers for servers)
Network BidiSctpChannel (loopback) Records: 66.23/ms => ~38% slower
Network BidiNativeSctpChannel (loopback) Records: 70.97/ms => ~33% slower
Network BidiHadroNioSocketChannel (loopback) Records: 71.77/ms => ~33% slower
Network AsyncMinaDatagramNio (loopback) Records: 73.88/ms => ~31% slower (unreliable; using async handlers for servers)
Network AsyncNettyDatagramOio (loopback) Records: 74.31/ms => ~30% slower (unreliable; using async handlers for servers)
Network AsyncNettySocketHadroNio (loopback) Records: 74.81/ms => ~30% slower (using async handlers for servers)
Network AsyncNettyDatagramNio (loopback) Records: 75.58/ms => ~29% slower (unreliable; using async handlers for servers)
Network BlockingBidiDisniActive (SoftiWarp) Records: 78.47/ms => ~26% slower
Network SctpChannel (loopback) Records: 81.78/ms => ~23% slower
Network BlockingDisniActive (SoftiWarp) Records: 84.22/ms => ~21% slower
Network BlockingDisniActive (SoftRoCE) Records: 84.76/ms => ~20% slower (unreliable, crashes after ~1 million records)
Network BidiNettySocketHadroNio (SoftRoCE) Records: 84.88/ms => ~20% slower (unreliable)
Network NativeSctpChannel (loopback) Records: 86.52/ms => ~19% slower
Network BidiNativeMinaSctpApr (loopback) Records: 87.81/ms => ~18% slower
Network KryonetUdp (loopback) Records: 87.85/ms => ~18% slower
Network BlockingBidiDisniActive (SoftRoCE) Records: 88.53/ms => ~17% slower (unreliable, crashes after ~1 million records)
Network NettySocketEpoll (loopback) Records: 91.64/ms => ~14% slower
Network NettySocketHadroNio (SoftRoCE) Records: 95.12/ms => ~11% slower (unreliable, crashes after ~500k records)
Network JucxTag (loopback, PEH) Records: 97.50/ms => ~8% slower (PeerErrorHandling, supports Infiniband and other specialized hardware)
Network BidiJucxTag (loopback, PEH) Records: 98.78/ms => ~7% slower (PeerErrorHandling)
Network JucxStream (loopback, PEH) Records: 99.88/ms => ~6% slower (PeerErrorHandling)
Network BidiJucxStream (loopback, PEH) Records: 101.25/ms => ~5% slower (PeerErrorHandling)
Network BidiHadroNioSocketChannel (SoftRoCE) Records: 104.23/ms => ~2% slower (unreliable)
Network HadroNioSocketChannel (SoftRoCE) Records: 105.47/ms => ~1% slower (unreliable)
Network BlockingNioSocket (loopback) Records: 106.51/ms => using this as baseline
Network AsyncNettySocketHadroNio (SoftRoCE) Records: 109.32/ms => ~3% faster (unreliable)
Network BlockingHadroNioSocket (loopback) Records: 112.12/ms => ~5% faster
Network BidiNettySocketEpoll (loopback) Records: 114.18/ms => ~7% faster
Network BidiNettyDatagramEpoll (loopback) Records: 114.47/ms => ~7% faster (unreliable)
Network EnxioSocketChannel (loopback) Records: 115.72/ms => ~9% faster
Thread JnanomsgInproc Records: 117.81/ms => ~11% faster
Network NettyDatagramEpoll (loopback) Records: 119.89/ms => ~13% faster (unreliable)
Network BlockingNioDatagramSocket (loopback) Records: 124.49/ms => ~17% faster (unreliable)
Network BidiBlockingNioDatagramSocket (loopback) Records: 127.86/ms => ~20% faster (unreliable)
Thread CzmqInproc Records: 132.52/ms => ~24% faster
Network NioSocketChannel (loopback) Records: 136.40/ms => ~28% faster
Network ChronicleNetwork (loopback) Records: 136.54/ms => ~28% faster (using UnsafeFastJ8SocketChannel)
Network NativeNettySocketEpoll (loopback) Records: 137.21/ms => ~29% faster
Network JucxStream (SoftRoCE, PEH) Records: 137.22/ms => ~29% faster (PeerErrorHandling, unreliable, crashes after ~5 million records)
Network NativeSocketChannel (loopback) Records: 138.75/ms => ~30% faster (faster than NativeNetty because no JNI?)
Network AsyncNettyDatagramEpoll (loopback) Records: 139.22/ms => ~31% faster (unreliable; using async handlers for servers)
Network BidiDisniActive (SoftiWarp) Records: 146.68/ms => ~38% faster
Network BidiDisniPassive (SoftiWarp) Records: 150.68/ms => ~41% faster
Network AsyncNettySocketEpoll (loopback) Records: 153.22/ms => ~44% faster (using async handlers for servers)
Network BidiBlockingHadroNioSocket (loopback) Records: 155.16/ms => ~46% faster
Network BidiBlockingNioSocket (loopback) Records: 158.88/ms => ~49% faster
Process ChronicleQueue Records: 164.71/ms => ~55% faster
Process ChronicleQueue (tmpfs) Records: 164.79/ms => ~55% faster
Network DisniActive (SoftiWarp) Records: 182.08/ms => ~71% faster
Network DisniPassive (SoftiWarp) Records: 186.40/ms => ~75% faster
Network BidiJucxStream (SoftRoCE, PEH) Records: 189.48/ms => ~78% faster (PeerErrorHandling, unreliable, crashes after ~5 million records)
Network BidiNativeMinaSocketApr (loopback) Records: 200.83/ms => ~88% faster
Network BidiJucxTag (SoftRoCE, PEH) Records: 203.85/ms => ~91% faster (PeerErrorHandling, unreliable, crashes after ~5 million records)
Network BidiEnxioSocketChannel (loopback) Records: 205.65/ms => ~93% faster
Network BidiNioSocketChannel (loopback) Records: 208.31/ms => ~95% faster
Network BidiChronicleNetwork (loopback) Records: 212.36/ms => ~99% faster (using UnsafeFastJ8SocketChannel)
Network* BidiNativeSocketChannel (loopback) Records: 212.72/ms => ~99% faster
Network JucxTag (SoftRoCE, PEH) Records: 214.42/ms => ~2 times as fast (PeerErrorHandling, unreliable, crashes after ~5 million records)
Network* AeronUDP (loopback) Records: 216.57/ms => ~2 times as fast (reliable and supports multicast)
Network BidiDisniActive (SoftRoCE) Records: 232.49/ms => ~2.2 times as fast (unreliable, crashes after ~1 million records)
Network DisniActive (SoftRoCE) Records: 243.85/ms => ~2.3 times as fast (unreliable, crashes after ~1 million records)
Network BidiDisniPassive (SoftRoCE) Records: 245.15/ms => ~2.3 times as fast (unreliable, crashes after ~1 million records)
Network BidiNativeNettyDatagramEpoll (loopback) Records: 259.74/ms => ~2.4 times as fast (unreliable)
Network NativeNettyDatagramEpoll (loopback) Records: 263.63/ms => ~2.5 times as fast (unreliable)
Network BidiNioDatagramChannel (loopback) Records: 277.80/ms => ~2.6 times as fast (unreliable)
Network NioDatagramChannel (loopback) Records: 286.81/ms => ~2.7 times as fast (unreliable)
Network BidiNativeDatagramChannel (loopback) Records: 294.35/ms => ~2.8 times as fast (unreliable; faster than NativeNetty because no JNI?)
Thread BidiMinaVmPipe Records: 298.17/ms => ~2.8 times as fast
Network MinaNativeDatagramChannel (loopback) Records: 306.79/ms => ~2.9 times as fast (unreliable)
Network NativeDatagramChannel (loopback) Records: 308.10/ms => ~2.9 times as fast (unreliable; faster than NativeNetty because no JNI?)
Network DisniPassive (SoftRoCE) Records: 320.18/ms => ~3 times as fast (unreliable, crashes after ~1 million records)
Thread JeromqInproc Records: 323.88/ms => ~3 times as fast
Process NamedPipe (Streaming) Records: 351.87/ms => ~3.3 times as fast
Thread MinaVmPipe Records: 372.10/ms => ~3.5 times as fast
Process UnixDomainSocket Records: 429.44/ms => ~4 times as fast (requires Java 16 for NIO support)
Process NativeUnixDomainSocket Records: 433.28/ms => ~4.1 times as fast (requires Java 16 for NIO support)
Process BidiUnixDomainSocket Records: 451.21/ms => ~4.2 times as fast (requires Java 16 for NIO support)
Process BidiNativeUnixDomainSocket Records: 460.59/ms => ~4.3 times as fast (requires Java 16 for NIO support)
Process NamedPipe (FileChannel) Records: 528.44/ms => ~5 times as fast
Network BidiJucxTag (noPEH) Records: 536.55/ms => ~5 times as fast (NoPeerErrorHandling, most likely using shared memory)
Network JucxStream (noPEH) Records: 538.19/ms => ~5 times as fast (NoPeerErrorHandling, most likely using shared memory)
Network JucxTag (noPEH) Records: 539.70/ms => ~5.1 times as fast (NoPeerErrorHandling, most likely using shared memory)
Process MPJExpress (shared memory) Records: 550.26/ms => ~5.2 times as fast
Network BidiJucxStream (noPEH) Records: 553.75/ms => ~5.2 times as fast (NoPeerErrorHandling, most likely using shared memory)
Process NamedPipe (Native) Records: 603.46/ms => ~5.7 times as fast
Thread LockedReference Records: 931.74/ms => ~8.7 times as fast
Process OpenMPI (shared memory) Records: 1,024.63/ms => ~9.6 times as fast (supports Infiniband and other specialized hardware)
Process Jocket Records: 1,204.82/ms => ~11.3 times as fast (unstable; deadlocks after 2-3 million messages; their tests show ~1792.11/ms which would be ~16.8 times as fast; had to test on Java 8)
Thread LinkedBlockingDeque Records: 1,520.45/ms => ~14.3 times as fast
Thread ArrayBlockingQueue Records: 1,535.72/ms => ~14.4 times as fast
Thread LinkedBlockingQueue Records: 1,716.12/ms => ~16.1 times as fast
Thread ArrayDeque (synced) Records: 1,754.14/ms => ~16.4 times as fast
Thread SynchronousQueue Records: 2,207.46/ms => ~20.7 times as fast
Thread SynchronizedReference Records: 2,240.90/ms => ~21 times as fast
Thread LmaxDisruptorQueue Records: 2,710.39/ms => ~25.4 times as fast
Thread LinkedTransferQueue Records: 2,812.62/ms => ~26.4 times as fast
Thread ConcurrentLinkedDeque Records: 2,844.95/ms => ~26.7 times as fast
Process Blocking Mapped Memory Records: 3,140.80/ms => ~29.5 times as fast
Process Blocking Mapped Memory (tmpfs) Records: 3,229.97/ms => ~30.3 times as fast
Thread LmaxDisruptor Records: 3,250.13/ms => ~30.5 times as fast
Thread ConversantDisruptorConcurrent Records: 3,389.03/ms => ~31.8 times as fast
Thread ConversantDisruptorBlocking Records: 3,498.95/ms => ~32.85 times as fast
Thread AgronaManyToOneRingBuffer Records: 3,510.74/ms => ~32.96 times as fast
Thread ConversantPushPullBlocking Records: 3,628.18/ms => ~34 times as fast
Thread JctoolsSpscLinkedAtomic Records: 3,762.94/ms => ~35.3 times as fast
Thread ConversantPushPullConcurrent Records: 3,777.29/ms => ~35.5 times as fast
Thread JctoolsSpscLinked Records: 3,885.46/ms => ~36.5 times as fast
Thread AtomicReference Records: 3,964.79/ms => ~37.2 times as fast
Thread NettySpscQueue Records: 3,995.21/ms => ~37.5 times as fast
Thread NettyMpscQueue Records: 4,086.30/ms => ~38.4 times as fast
Thread NettyFixedMpscQueue Records: 4,141.13/ms => ~38.88 times as fast
Thread AgronaManyToOneQueue Records: 4,178.33/ms => ~39.2 times as fast
Process AeronIPC Records: 4,209.11/ms => ~39.5 times as fast
Thread AgronaBroadcast Records: 4,220.66/ms => ~39.6 times as fast
Thread AgronaManyToManyQueue Records: 4,235.49/ms => ~39.8 times as fast
Thread JctoolsSpscArray Records: 4,337.64/ms => ~40.7 times as fast
Thread JctoolsSpscAtomicArray Records: 4,368.72/ms => ~41 times as fast
Thread* AgronaOneToOneQueue Records: 4,466.68/ms => ~42 times as fast (works with plain objects)
Thread VolatileReference Records: 4,733.50/ms => ~44.4 times as fast
Thread AgronaOneToOneRingBuffer (SafeCopy) Records: 4,831.85/ms => ~45.3 times as fast
Thread AgronaOneToOneRingBuffer (ZeroCopy) Records: 4,893.33/ms => ~45.9 times as fast
Process Mapped Memory Records: 6,521.46/ms => ~61.2 times as fast
Process* Mapped Memory (tmpfs) Records: 6,711.41/ms => ~63 times as fast
- Dynamic Client/Server: you could utilize (e.g.) RMI with its service registry on localhost (or something similar) to make processes become master/slave dynamically with failover when the master process exits. Just let each process race to become the master (first one wins) and let all other processes fall back to being slaves and connecting to the master. The RMI service provides mechanisms to set up the synchronous channels (by handing out pipe files) and the communication will then continue faster via your chosen channel implementation (RMI is slower because it uses the default java serialization and the TCP/IP communication causes undesired overhead). When the master process exits, the clients should just race again to get a new master nominated. To also handle clients disappearing, one should implement timeouts via a heartbeat that clients regularly send to the server to detect missing clients and a response timeout on the client so it detects a missing server. This is just for being bullet-proof: the endpoints should normally notify the other end when they close a channel, but this might fail when a process exits abnormally (see SIGKILL).
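A hedged sketch of the "race to become master" part using plain JDK RMI (port and roles are placeholders; the actual channel setup, heartbeats and timeouts are omitted):

```java
// Hedged sketch of racing for the master role via a JDK RMI registry on localhost,
// as described above. The port is a placeholder; channel setup, heartbeats and
// timeouts are left out.
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

public final class MasterSlaveRaceExample {
    private static final int REGISTRY_PORT = 51099;

    public static boolean tryBecomeMaster() {
        try {
            // only one process on this host can create the registry on the port: first one wins
            final Registry registry = LocateRegistry.createRegistry(REGISTRY_PORT);
            // the master would now bind its service that hands out pipe files to slaves
            return registry != null;
        } catch (final Exception e) {
            // the port is already taken, so another process is the master: become a slave
            return false;
        }
    }

    public static Registry connectAsSlave() throws Exception {
        // slaves look up the master's registry and request the channel setup from it
        return LocateRegistry.getRegistry("localhost", REGISTRY_PORT);
    }
}
```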
- Cryptography: there are some channel implementations with which invesdwin-context-security-crypto encryption (`StreamEncryptionChannelFactory` for e.g. AES) and verification (`StreamVerifiedEncryptionChannelFactory` for e.g. AES+HMAC; `StreamVerifiedChannelFactory` for checksums, digests, macs, or signatures without encryption) can be added to the communication. There is also a `HandshakeChannelFactory` with providers for secure key exchanges using DH, ECDH, JPake, or SRP6 to negotiate the encryption and verification channels automatically. There is also a TLS and DTLS provider using SSLEngine (JDK and Netty(-TcNative)) that can be used with any underlying transport (not only TCP or UDP). Here are some benchmarks with various security providers (2022, Core i9-12900HX with SSD, Java 17, handshake not included in the measured time, measuring records or round trips per millisecond, so multiply by 2 to get messages per millisecond):
| | JDK17 | Conscrypt | AmazonCorretto | BouncyCastle | WildflyOpenSSL | Netty-TcNative | Commons-Crypto |
|---|---|---|---|---|---|---|---|
| BidiNativeSocketChannel (loopback, baseline) | 366.90/ms | - | - | - | - | - | - |
| AES128CTR | 294.94/ms | 262.69/ms | 299.63/ms | 185.91/ms | - | - | - |
| AES256CTR | 288.61/ms | 260.01/ms | 298.26/ms | 172.17/ms | - | - | - |
| AES128CTR+HMAC256 | 223.90/ms | 87.72/ms | 208.22/ms | 109.66/ms | - | - | - |
| AES256CTR+HMAC256 | 219.93/ms | 88.50/ms | 211.86/ms | 106.32/ms | - | - | - |
| AES128CTR+HMAC512 | 146.79/ms | 61.19/ms | 139.74/ms | 94.58/ms | - | - | - |
| AES256CTR+HMAC512 | 147.28/ms | 60.66/ms | 140.93/ms | 90.18/ms | - | - | - |
| AES128+GCM | 215.47/ms | 203.45/ms | Input too short - need tag | 149.60/ms | - | - | - |
| AES256+GCM | 210.28/ms | 196.47/ms | Input too short - need tag | 135.45/ms | - | - | - |
| TLS3.0 | 181.08/ms | Garbled Bytes | PKIX Validation Failed | 108.52/ms | 101.24/ms | Conscrypt Garbled Bytes; OpenSSL Binding Outdated | OpenSSL Binding Outdated |
SECURITY: You can apply extended security features by utilizing the invesdwin-context-security-web module which allows you to define your own rules as described in the spring-security namespace configuration documentation. Just define your own rules in your own `<http>` tag for a specific context path in your own custom spring context xml and register it for the invesdwin application bootstrap to be loaded as normally done via an `IContextLocation` bean.
DEPLOYMENT: The above context paths are the defaults for the embedded web server as provided by the `invesdwin-context-webserver` module. If you do not want to deploy your web services, servlets or web applications in an embedded web server, you can create distributions that repackage them as WAR files (modifications via maven-overlays or just adding your own web.xml) to deploy in a different server under a specific context path. Or you could even repackage them as an EAR file and deploy them in your application server of choice. Or if you like you could roll your own alternative embedded web server that meets your requirements. Just ask if you need help with any of this.
If you need further assistance or have some ideas for improvements and don't want to create an issue here on github, feel free to start a discussion in our invesdwin-platform mailing list.