Kernel-User Space Transport #77

krizhanovsky · 2015-03-12T12:18:19Z

Motivation and architecture

We need to export some logic to user space and/or third-party servers. User-space tasks must run asynchronously with respect to softirq processing, just like netfilter's NF_QUEUE. Examples are:

FastCGI and uwsgi dynamic content (including our own RESTful interface for reconfiguration);
ICAP;
Heavy machine learning logic for WAF and DDoS mitigation. Many WAF PoCs are developed using OpenResty and soon face performance issues. This can partially be solved with WAF acceleration and HTTPtables, but the ability to run e.g. PyPy code calling fast HTTP message processing logic written in C would be a good alternative to LuaJIT;
Data compression and decompression;
Serverless computations like Fastly Compute@Edge, Cloudflare Workers, and Akamai EdgeWorkers.

FastCGI, uwsgi and ICAP implement their own protocols, different from HTTP. None of the logic above should be considered core or mission-critical.

Thus, we should be able to pass some HTTP requests to user space for complex processing and get appropriate responses back from user space. A configuration option (similar to the HTTP scheduler rules using http_match) such as:

    match user_space_offload uri prefix "/rest/";

should be used to pass a client request to a user-space processing daemon.
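To illustrate what such a rule evaluates, the sketch below checks a request URI against a list of URI prefixes. The names `offload_rule` and `req_should_offload` are hypothetical and are not Tempesta FW's http_match implementation; only the `uri prefix` condition is modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of "match user_space_offload uri prefix ...". */
struct offload_rule {
	const char *uri_prefix;
};

/* Return true if the request URI starts with the prefix of any rule. */
static bool
req_should_offload(const struct offload_rule *rules, size_t n,
		   const char *uri, size_t uri_len)
{
	for (size_t i = 0; i < n; i++) {
		size_t plen = strlen(rules[i].uri_prefix);

		/* The URI is not NUL-terminated in packet data: use its length. */
		if (uri_len >= plen && !memcmp(uri, rules[i].uri_prefix, plen))
			return true;
	}
	return false;
}
```

With the rule above, `/rest/config` is offloaded, while `/index.html` and `/rest` (no trailing slash) stay in the kernel.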

Zero-copy transport of HTTP messages between kernel and user space must be used. It should be based on an mmap() interface for parsed HTTP messages. The proposed scenario for processing an ingress HTTP request and sending a generated HTTP response is illustrated by the figure below.

1. The softirq handler receives packets that hold an HTTP request. The Linux TCP/IP stack is patched so that the packet’s payload is always placed in memory pages which can be mmap()’ed.
2. The request is parsed, and all required data, including the parsing meta-information and the packet’s data, are placed in several memory pages. HTTP messages are processed in a zero-copy fashion, i.e. HTTP fields are not copied. Instead, the parsing meta-information stores pointers into the received packet data, e.g. to the start of an HTTP header field name and value.
3. When the memory pages of the HTTP request are mapped into the user-space process’s address space, the softirq handler wakes up the process.
4. Now the process can run heavy logic on the mmap()’ed request. An example of such heavy logic is data compression.
5. The advanced classification process can generate a response to the request (e.g. with an HTTP error code). The same memory-mapped region is used to pass the HTTP response to the kernel.
6. Finally, the softirq handler sends the response to the client.
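A minimal sketch of the zero-copy lookup described above, assuming the meta-information in the shared area stores offsets rather than raw pointers (the same pages have different virtual addresses in the kernel and in the user process, so kernel pointers would need translation). The layout `struct hdr_desc`/`struct msg_meta` and the function `msg_hdr_value` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor of one header field inside the mapped data pages. */
struct hdr_desc {
	uint32_t name_off, name_len;
	uint32_t val_off, val_len;
};

/* Hypothetical parsing meta-information placed next to the packet data. */
struct msg_meta {
	uint32_t hdr_cnt;
	struct hdr_desc hdrs[16];
};

/* HTTP header field names are case-insensitive. */
static int
ci_equal(const char *a, const char *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 0;
	return 1;
}

/*
 * Find a header value by name. Nothing is copied: the result points
 * straight into the mapped packet data and @len receives its length.
 */
static const char *
msg_hdr_value(const struct msg_meta *m, const char *data,
	      const char *name, size_t *len)
{
	size_t nlen = strlen(name);

	for (uint32_t i = 0; i < m->hdr_cnt; i++) {
		const struct hdr_desc *h = &m->hdrs[i];

		if (h->name_len == nlen && ci_equal(data + h->name_off, name, nlen)) {
			*len = h->val_len;
			return data + h->val_off;
		}
	}
	return NULL;
}
```

Because values are returned as (pointer, length) pairs into the original pages, user-space logic can inspect the request without allocating or copying header strings.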

GFSM should be used to redirect HTTP messages matching the user_space_offload rule to user space and to wait for responses (e.g. modified HTTP requests for further forwarding in the ICAP case, or just a response in the RESTful API case).
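The per-message context that GFSM would keep while user space works on a message can be pictured as a small state machine. This is not GFSM's actual interface: the states, events and `uso_transition` below are hypothetical and only model the two outcomes named above (a response for the RESTful API case, a modified request in the ICAP case).

```c
#include <stdbool.h>

/* Hypothetical states of an offloaded message context. */
enum uso_state {
	USO_RECEIVED,	/* parsed in softirq, matched user_space_offload */
	USO_OFFLOADED,	/* handed to user space, softirq moved on */
	USO_RESPONDED,	/* user space produced a response for the client */
	USO_FORWARD,	/* user space returned a modified request (ICAP-like) */
};

enum uso_event { EV_OFFLOAD, EV_USER_RESPONSE, EV_USER_REQUEST };

/* Apply @ev; on an invalid transition keep the state and return false. */
static bool
uso_transition(enum uso_state *st, enum uso_event ev)
{
	switch (*st) {
	case USO_RECEIVED:
		if (ev == EV_OFFLOAD) {
			*st = USO_OFFLOADED;
			return true;
		}
		break;
	case USO_OFFLOADED:
		if (ev == EV_USER_RESPONSE) {
			*st = USO_RESPONDED;
			return true;
		}
		if (ev == EV_USER_REQUEST) {
			*st = USO_FORWARD;
			return true;
		}
		break;
	default:
		/* Terminal states accept no further events. */
		break;
	}
	return false;
}
```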

User-space logic may produce an HTTP message larger than the original one, e.g. by adding an HTTP header. This can be done by allocating a page fragment (also in user space) and passing it to the kernel together with the fragment offset, so that the kernel can properly arrange the skb fragments.
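A sketch of how a new fragment could be spliced into a message without copying the original data, assuming a simple array of (page, offset, length) descriptors. `struct frag`, `frag_insert` and `MAX_FRAGS` are hypothetical; the kernel's real limit on skb fragments is `MAX_SKB_FRAGS`. If the insertion point falls inside a fragment, that fragment is split into two descriptors of the same page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAGS 8	/* hypothetical limit */

/* Hypothetical descriptor of one message fragment: a region of a page. */
struct frag {
	const char *page;
	size_t off, len;
};

struct frag_list {
	size_t n;
	struct frag f[MAX_FRAGS];
};

/* Split fragment @i at byte @at (0 < at < len) into two descriptors. */
static bool
frag_split(struct frag_list *fl, size_t i, size_t at)
{
	if (fl->n == MAX_FRAGS)
		return false;
	memmove(&fl->f[i + 1], &fl->f[i], (fl->n - i) * sizeof(fl->f[0]));
	fl->f[i].len = at;
	fl->f[i + 1].off += at;
	fl->f[i + 1].len -= at;
	fl->n++;
	return true;
}

/* Insert descriptor @nf before index @i. */
static bool
frag_insert_at(struct frag_list *fl, size_t i, struct frag nf)
{
	if (fl->n == MAX_FRAGS)
		return false;
	memmove(&fl->f[i + 1], &fl->f[i], (fl->n - i) * sizeof(fl->f[0]));
	fl->f[i] = nf;
	fl->n++;
	return true;
}

/* Insert @nf at message byte position @pos; only descriptors are moved. */
static bool
frag_insert(struct frag_list *fl, size_t pos, struct frag nf)
{
	size_t i = 0;

	while (i < fl->n && pos >= fl->f[i].len) {
		pos -= fl->f[i].len;
		i++;
	}
	if (i == fl->n && pos)
		return false;			/* past the end of the message */
	if (pos) {
		/* pos is inside fragment i: check room for split + insert first. */
		if (fl->n + 2 > MAX_FRAGS || !frag_split(fl, i, pos))
			return false;
		i++;
	}
	return frag_insert_at(fl, i, nf);
}

/* Copy the message out, for checking only (the kernel would not do this). */
static size_t
frag_flatten(const struct frag_list *fl, char *buf)
{
	size_t total = 0;

	for (size_t i = 0; i < fl->n; i++) {
		memcpy(buf + total, fl->f[i].page + fl->f[i].off, fl->f[i].len);
		total += fl->f[i].len;
	}
	return total;
}
```

For example, inserting `X-Extra: 1\r\n` right after a 16-byte request line turns one fragment into three: the request line, the new header page fragment, and the rest of the original page.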

Since a user-space application may run in a virtual container, the mapping transport must be container-aware and provide configuration specifying which HTTP messages are mapped to which containers, probably on a per-vhost and per-location basis.
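One possible shape for such a per-vhost, per-location mapping is a table searched by longest location prefix within the request's vhost. The table format, `struct ctr_map` and `ctr_lookup` are hypothetical; a real configuration could equally reuse Tempesta FW's existing vhost/location matching.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: which container handles a vhost/location pair. */
struct ctr_map {
	const char *vhost;
	const char *loc_prefix;
	int ctr_id;
};

/* Longest location-prefix match within @vhost; -1 if nothing matches. */
static int
ctr_lookup(const struct ctr_map *map, size_t n, const char *vhost,
	   const char *uri)
{
	const struct ctr_map *best = NULL;
	size_t best_len = 0;

	for (size_t i = 0; i < n; i++) {
		size_t plen = strlen(map[i].loc_prefix);

		if (strcmp(map[i].vhost, vhost)
		    || strncmp(uri, map[i].loc_prefix, plen))
			continue;
		if (!best || plen > best_len) {
			best = &map[i];
			best_len = plen;
		}
	}
	return best ? best->ctr_id : -1;
}
```

Longest-prefix matching lets a specific location such as `/api/` go to its own container while the rest of the vhost falls back to a default one.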

API

A C API must be provided so that it can be used from various programming languages like C, C++, Rust or Python.

io_uring should probably be used for the API; see also the generic ring buffer API proposal for the Linux kernel.
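Whatever API is chosen, the core is a ring of descriptors shared between a producer and a consumer. The sketch below is a generic single-producer/single-consumer ring with free-running head/tail counters, the same general idea that io_uring's submission and completion queues use; it is not io_uring's API, and `struct ring`, `struct msg_desc`, `ring_push` and `ring_pop` are hypothetical. It is shown within one process using C11 atomics.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8	/* must be a power of two */

/* Hypothetical descriptor: where a parsed message lives in the mapped area. */
struct msg_desc {
	uint32_t msg_off;
	uint32_t msg_len;
};

/*
 * Only the producer (e.g. the kernel side) writes head, and only the
 * consumer (the user side) writes tail. Counters wrap and are masked.
 */
struct ring {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
	struct msg_desc slots[RING_SIZE];
};

static bool
ring_push(struct ring *r, struct msg_desc d)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;		/* full */
	r->slots[head & (RING_SIZE - 1)] = d;
	/* Make the slot contents visible before the new head. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

static bool
ring_pop(struct ring *r, struct msg_desc *d)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;		/* empty */
	*d = r->slots[tail & (RING_SIZE - 1)];
	/* Hand the slot back to the producer. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

Since each counter has a single writer, no locks are needed: the release store of one side pairs with the acquire load of the other.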

Asynchronous processing

Event-driven HTTP servers such as Nginx can process thousands of requests concurrently on modern multi-core machines. However, there are still heavy computational tasks which lead to high response times at high percentiles, e.g. data compression or security checks such as parsing and analyzing the DOM tree of a large HTTP response. These tasks run on the CPU and cannot be offloaded to a co-processor that would leave the CPU free to process other HTTP messages. While some tasks can be offloaded to a GPU, e.g. TLS handshakes, others work with large memory volumes in streaming mode, e.g. HTTP POST processing, so it doesn't make sense to offload them to a GPU. Thus, if a server has N CPU cores and receives N HTTP requests with expensive CPU computations, it cannot process other lightweight requests.

This task, offloading some HTTP processing to user space, solves the problem of synchronous processing: expensive CPU computations can be offloaded to user space, where they are processed with a lower scheduler priority, while softirq continues to work with other HTTP requests. GFSM is useful here to store the HTTP message processing context while the message is processed in user space.
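The text only requires a lower scheduler priority for user-space work. One way a user-space worker could arrange that on Linux is to raise its own nice value with the POSIX `getpriority()`/`setpriority()` calls, which needs no privileges when increasing niceness. The helper name `worker_lower_priority` is hypothetical.

```c
#define _XOPEN_SOURCE 700	/* getpriority()/setpriority() are XSI */
#include <errno.h>
#include <sys/resource.h>

/*
 * Raise the calling worker's nice value by @delta (capped at 19, the
 * lowest priority on Linux). Returns 0 on success, -1 with errno set.
 */
static int
worker_lower_priority(int delta)
{
	int cur;

	/* getpriority() may legitimately return -1, so check errno instead. */
	errno = 0;
	cur = getpriority(PRIO_PROCESS, 0);
	if (cur == -1 && errno)
		return -1;
	cur += delta;
	if (cur > 19)
		cur = 19;
	return setpriority(PRIO_PROCESS, 0, cur);
}
```

A worker would call this once at startup, before it begins draining offloaded messages, so that softirq processing and other latency-sensitive tasks win the CPU under contention.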

Synchronous processing

Some logic (e.g. security applications) must make a decision (pass/block) or mangle traffic synchronously, so that malicious traffic is not passed to a protected backend server. This type of processing can be done in the same user-space process as the asynchronous one, probably using GFSM or some synchronization mechanism in shared memory.
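A minimal sketch of a synchronization mechanism in shared memory for pass/block decisions, assuming one verdict slot per pending request. `struct verdict_slot` and the `slot_*` functions are hypothetical; a real implementation would also need a wake-up mechanism instead of polling.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical verdicts for synchronous processing. */
enum verdict { V_PENDING = 0, V_PASS = 1, V_BLOCK = 2 };

/* Hypothetical per-request slot in memory shared by kernel and user space. */
struct verdict_slot {
	uint64_t req_id;
	_Atomic int verdict;
};

/* Kernel side: publish a request id and mark the slot as undecided. */
static void
slot_submit(struct verdict_slot *s, uint64_t req_id)
{
	s->req_id = req_id;
	atomic_store_explicit(&s->verdict, V_PENDING, memory_order_release);
}

/* User side: record the decision for the request in the slot. */
static void
slot_decide(struct verdict_slot *s, enum verdict v)
{
	atomic_store_explicit(&s->verdict, v, memory_order_release);
}

/* Kernel side: non-blocking check; V_PENDING means "hold the request". */
static enum verdict
slot_poll(struct verdict_slot *s)
{
	return (enum verdict)atomic_load_explicit(&s->verdict,
						  memory_order_acquire);
}
```

While the slot reads `V_PENDING`, the request is held and not forwarded to the backend; `V_BLOCK` lets the kernel drop it or answer with an error.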

Dynamic programs

The API must allow registering (attaching) new synchronous and asynchronous user-space programs at run time, without restarting Tempesta FW (just like BPF programs).
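The attach/detach semantics can be illustrated with a small in-process registry; in the real transport, attaching would be a control call to the kernel. The handler signature `uso_prog_fn`, the limit `MAX_PROGS` and the `uso_*` functions are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROGS 4	/* hypothetical limit */

/* Hypothetical handler: returns nonzero if it consumed the message. */
typedef int (*uso_prog_fn)(const char *msg, size_t len);

struct uso_prog {
	const char *name;
	uso_prog_fn fn;
	bool sync;	/* synchronous (verdict) or asynchronous program */
};

static struct uso_prog progs[MAX_PROGS];

/* Attach at run time; fails if the name is taken or no slot is free. */
static int
uso_attach(const char *name, uso_prog_fn fn, bool sync)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_PROGS; i++) {
		if (progs[i].name && !strcmp(progs[i].name, name))
			return -1;
		if (!progs[i].name && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;
	progs[free_slot] = (struct uso_prog){ name, fn, sync };
	return 0;
}

/* Detach by name; other attached programs keep running. */
static int
uso_detach(const char *name)
{
	for (int i = 0; i < MAX_PROGS; i++)
		if (progs[i].name && !strcmp(progs[i].name, name)) {
			progs[i] = (struct uso_prog){ NULL, NULL, false };
			return 0;
		}
	return -1;
}

/* Dispatch a message to every attached program; count the consumers. */
static int
uso_run(const char *msg, size_t len)
{
	int n = 0;

	for (int i = 0; i < MAX_PROGS; i++)
		if (progs[i].fn && progs[i].fn(msg, len))
			n++;
	return n;
}
```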

Serverless

If we map all the pages with HTTP messages as read-only for user space and use a separate memory area for writing, this can become an alternative to modern serverless architectures: an unprivileged user may read their traffic and run some logic in a separate address space.
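The read-only/writable split can be demonstrated in a single process with `mmap()` and `mprotect()`. In the real design the kernel would map the message pages read-only into the tenant's address space; here anonymous memory stands in for them, and the `memcpy()` only simulates the kernel filling the pages. `struct tenant_view` and its functions are hypothetical.

```c
#define _DEFAULT_SOURCE		/* MAP_ANONYMOUS */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical tenant view: read-only message pages plus a scratch area. */
struct tenant_view {
	const char *msg;	/* read-only after setup */
	char *out;		/* writable response area */
	size_t len;
};

static int
tenant_view_init(struct tenant_view *v, const char *msg, size_t msg_len)
{
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = (msg_len + pg - 1) / pg * pg;	/* round up to pages */
	char *in, *out;

	in = mmap(NULL, len, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (in == MAP_FAILED)
		return -1;
	memcpy(in, msg, msg_len);	/* stands in for the kernel's data */
	/* From now on any write to the message pages faults. */
	if (mprotect(in, len, PROT_READ)) {
		munmap(in, len);
		return -1;
	}
	out = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (out == MAP_FAILED) {
		munmap(in, len);
		return -1;
	}
	v->msg = in;
	v->out = out;
	v->len = len;
	return 0;
}

static void
tenant_view_fini(struct tenant_view *v)
{
	munmap((void *)v->msg, v->len);
	munmap(v->out, v->len);
}
```

Tenant code can read the request from `v.msg` and build its response in `v.out`; any attempt to mangle the original traffic in place is stopped by the MMU rather than by a software check.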

Failover

A user-space HTTP message handling program can run as a Linux process or in a Docker or LXC container. If the program crashes in a container, the container infrastructure is responsible for restarting it. However, for a plain Linux process, Tempesta FW must take care of restarting the process.

This behaviour is inspired by Erlang OTP and will make C/C++ web applications more reliable: in the worst case a user gets a CGI-like application which spawns a new process for each request, but in the normal case we get a true application server with neither the risk of a whole-server crash nor the extra cost of FastCGI.
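A minimal supervisor sketch in the spirit of Erlang OTP, assuming the worker is a plain Linux process started with `fork()`: it restarts the worker whenever it exits with a non-zero status or is killed by a signal, up to a restart limit. `supervise` and `worker_fn` are hypothetical names, and a production version would also handle `EINTR` from `waitpid()` and add restart back-off.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker entry point; @attempt counts restarts from 0. */
typedef int (*worker_fn)(int attempt);

/*
 * Run @worker in a child process and restart it after a crash or a
 * failed exit. Returns the number of restarts that were needed, or -1
 * if @max_restarts is exceeded or fork()/waitpid() fails.
 */
static int
supervise(worker_fn worker, int max_restarts)
{
	for (int attempt = 0; attempt <= max_restarts; attempt++) {
		int status;
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0)
			_exit(worker(attempt));	/* child: run and report */
		if (waitpid(pid, &status, 0) < 0)
			return -1;
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			return attempt;		/* clean exit */
		/* Killed by a signal or failed: spawn a fresh process. */
	}
	return -1;
}
```

Because each attempt runs in a fresh address space, a crash in application code loses only that process's in-flight work, never the supervisor or the kernel side.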

References

An example of a similar solution, zero-copy receive in Linux via io_uring, is presented in the "Fast ZC Rx Data Plane using io uring" talk.



Almost literally follows ak's patch from 2eae1da.

Replace GFSM calls with direct calls to TLS and HTTP handlers on low level networking layers.

GFSM was designed to build graphs of network protocol FSMs (this design was inspired by FreeBSD netgraph). However, over the years neither we nor external users have had any requirement to introduce modules which use GFSM to hook TLS or HTTP entry code. There are only 2 users of the mechanism for TLS and HTTP for now:
1. TLS -> HTTP protocols handling
2. HTTP limits (the frang module)

This patch replaces GFSM calls with direct calls to tfw_http_req_process(), tfw_tls_msg_process() and frang_tls_handler() in the following paths:
1. sync sockets -> TLS
2. sync sockets -> HTTP
3. TLS -> HTTP
4. TLS -> Frang

As a result, the function tfw_connection_recv() was eliminated. Now the code is simpler and has lower overhead.

We still might need GFSM for the user-space requests handling (tempesta-tech#77) and Tempesta Language (tempesta-tech#102).

Contributes to tempesta-tech#755
Based-on-patch-by: Alexander K <[email protected]>
Signed-off-by: Aleksey Mikhaylov <[email protected]>


ai-tmpst · 2024-10-04T13:49:07Z

A ring buffer mapped to user space is being developed in the scope of #537.
It could be useful for this task; see fw/ringbuffer.*.

krizhanovsky added the enhancement label Mar 12, 2015

krizhanovsky assigned vdmit11 Mar 12, 2015

krizhanovsky added this to the 0.5.0 SSL, Stable milestone Mar 12, 2015

krizhanovsky mentioned this issue Mar 12, 2015

cfg: hot configuration reloading #51

Closed

krizhanovsky mentioned this issue Mar 24, 2015

Synchronous Socket: TCP connect #83

Closed

krizhanovsky assigned krizhanovsky and unassigned vdmit11 May 3, 2015

krizhanovsky modified the milestones: TBD, 0.5.0 SSL & TDB Jun 19, 2015

krizhanovsky changed the title ~~User-space/third-party TCP communication interface~~ Kernel-User Space Transport Feb 11, 2017

This was referenced Feb 11, 2017

Full/per-vhost dynamic (re)configuration: gRPC API #67

Open

Content compression & decompression #636

Open

This was referenced Feb 26, 2017

HTTP QoS for asymmetric DDoS mitigation #488

Open

Tempesta automatic reboot and crash dump #251

Closed

krizhanovsky added the crucial label Mar 17, 2017

krizhanovsky mentioned this issue Jan 14, 2018

HTTPtables migration to eBPF #102

Open

krizhanovsky modified the milestones: backlog, 0.10 Kernel-User Space Transport Jan 14, 2018

krizhanovsky mentioned this issue Nov 27, 2018

Protection against malicious file uploads #1119

Open

krizhanovsky mentioned this issue Aug 22, 2021

Web-content management tool #528

Open

krizhanovsky modified the milestones: 1.4 TBD (Kernel-User Space Transport), 1.2 TBD Jan 3, 2022

krizhanovsky removed this from the 1.xx TBD milestone Mar 27, 2023

krizhanovsky added this to the 1.0 - GA milestone Mar 27, 2023

krizhanovsky removed their assignment Mar 27, 2023

krizhanovsky modified the milestones: 1.0 - GA, 1.2 - TBD Nov 12, 2023

krizhanovsky self-assigned this Nov 13, 2023

krizhanovsky mentioned this issue Jun 23, 2024

Edge Side Application Callbacks #2148

Open

krizhanovsky modified the milestones: 1.2 - TBD, 1.0 - GA Sep 27, 2024

krizhanovsky removed their assignment Sep 27, 2024

krizhanovsky mentioned this issue Oct 14, 2024

Add ring buffers mapped to userspace #2259

Closed
