Add %byte{value} logformat code for logging or sending arbitrary bytes #236

eduard-bagdasaryan · 2023-11-09T14:57:22Z

No support for zero byte values yet because existing Format::assemble()
code does not support that out of the box, and there is no known need
for such support. It can be added later (without backward compatibility
problems) if needed.


An "internal implementation detail" namespace is often called one of those three words[^1], abbreviated or not, with an underscore suffix or not. Let's standardize on "Detail_" because it avoids abbreviations, does not risk conflicting with "impl" commonly used in Pimpl idiom, and cannot conflict with any regular Squid namespace (because of the trailing underscore which, to many, also implies some kind of "private use").

[^1]: https://stackoverflow.com/a/26546780

... because the vast majority of other values lack such descriptions, and there is a significant chance of getting these descriptions wrong unless we generate this code. FWIW, the mapping is already fairly easy to find by searching for the ByteCode name.

... because those changes

* mistreated zero port (it should be rejected)
* lost a valuable static_assert checking whether Port type can represent the maximum valid port number
* added tok.reset() call that goes against Tokenizer parsing flow

This code should be refactored to use Parser::UnsignedDecimalInteger(), but that refactoring should wait for the non-CONNECT port parsing to be upgraded to Tokenizer (an existing TODO) because that upgrade is likely to influence this refactoring.

This commit effectively reverses branch commit 2f72f5f that attempted to reduce code duplication in parsing integers. Additional subsequent attempts were reverted previously. All these attempts were mostly correct. However, small-but-critical differences in calling code needs make reuse of a simple integer parser function virtually impossible. We will reduce this code duplication using a configurable parser. I even have a sketch we can start from, but that (and all the caller changes) deserve a dedicated project.

eduard-bagdasaryan · 2023-11-16T12:24:28Z

src/cf.data.pre

@@ -4685,6 +4685,13 @@ DOC_START
 	Format codes:

 		%	a literal % character
+
+		byte{value}	Adds a single byte with the given value (e.g., %byte{10}
+			adds an ASCII LF character a.k.a. "new line" or "\n"). The value


Just to clarify: adding %byte{10} results in a genuine new line in access.log (i.e., not a couple of characters "\n"). Is this an intended/expected behavior?

> Is this an intended/expected behavior?

Yes, it is. In fact, IIRC, adding new lines is the use case that prompted the addition of this feature. Nearly all other ASCII characters can probably be added (as those characters) without a new logformat %code.

Adding new lines is useful when, for example, access records go to a logging daemon that expects HTTP header-like record syntax.
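As a rough illustration of that use case, a configuration along the following lines could produce header-like records. This is only a sketch: the `headerlike` format name, the field labels, and the log path are made up for the example, and it assumes a Squid build that includes the `%byte{value}` code from this PR. `logformat`, `access_log`, and the `%>a`, `%rm`, `%ru`, and `%>Hs` codes are existing Squid features.

```
# Hypothetical format name and field labels, for illustration only.
# Each %byte{10} emits one real LF byte (not the two characters "\n").
logformat headerlike Client: %>a%byte{10}Method: %rm%byte{10}URL: %ru%byte{10}Status: %>Hs%byte{10}

# Hypothetical log destination; adjust the path for the local installation.
access_log daemon:/var/log/squid/access.log logformat=headerlike
```

With such a format, each access record would span several lines, so tools that assume one record per line would need adjusting.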

eduard-bagdasaryan added 2 commits November 9, 2023 17:34

Eliminated code duplication in ProxyProtocol::IntegerToFieldType()

2f72f5f

by reusing a new ParseInteger() template.

rousskov requested changes Nov 9, 2023

src/parser/Tokenizer.h (outdated, resolved)

Moved ParseInteger() helper into a dedicated header

44ed8a0

Also separated this function into Signed and Unsigned ones, to allow the caller to be specific about the input it expects.

eduard-bagdasaryan commented Nov 13, 2023

src/parser/ToInteger.h (outdated, resolved)

eduard-bagdasaryan added 4 commits November 13, 2023 17:35

Added an std header

44dd130

Moved implementation details into a Parser::Impl namespace

edf6af0

Eliminated code duplication in AnyP::Uri::parsePort()

675b391

Eliminated code duplication in Ip::NfMarkConfig::Parse()

1607848

eduard-bagdasaryan commented Nov 14, 2023

src/parser/ToInteger.h (outdated, resolved)

Removed trailing whitespaces

ed4fde4

eduard-bagdasaryan commented Nov 14, 2023

src/parser/ToInteger.h (outdated, resolved)

eduard-bagdasaryan and others added 18 commits November 14, 2023 22:02

Autoformatted

2b55e7d

Polished

2c97d6b

fixup: Use a more common namespace closing comment format

f2481ae

fixup: Polished new names

dbcafa5

fixup: Detail_::DecimalInteger() should not assume 0 is in range

29d34c9

... even though the existing callers always supply Integer types that include zero.

fixup: Use a slightly more logical place for declaring LFT_BYTE

c10e66b

Unlike LFT_STRING, LFT_BYTE corresponds to a logformat code, so it should be together with other %codes.

fixup: Added missing description to the new function

9f218e9

Revert branch Ip::NfMarkConfig changes

7b3b554

... because that code needs to support parsing of octal and hex integers (in addition to decimal integers). I did not check for other problems. To convert this code, we need a more powerful/flexible parser.

Fixed branch code doing PROXY protocol TLV parsing

9dea5e9

Branch code used the wrong value type, allowing much larger values than PROXY protocol specs and, arguably, Squid code allow. Also documented a UnsignedDecimalInteger() bug that may affect how the design of branch code moves forward.

fixup: Tightened parsing to avoid overpromising and underdelivering

e890a64

fixup: We _need_ these limits to be constexpr

2f4002d

... because we use them in a static_assert(). Let's be explicit about our genuine needs rather than relying on optional[^1] compiler abilities to accommodate them.

[^1]: https://stackoverflow.com/a/64501230
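The points made in these fixups (compile-time limits checked with static_assert(), and no assumption that zero is within the allowed range) can be illustrated with a small standalone sketch. This is not the branch's Parser::UnsignedDecimalInteger() or Detail_::DecimalInteger() code: the `ParseUnsignedDecimal` name, its template parameters, and its `std::optional` return type are assumptions made only for this example, which needs C++17.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical helper (not Squid's API): parses an unsigned decimal integer
// that occupies the whole input and lies within [Min, Max].
// The limits are template parameters, so they are compile-time constants
// and can be validated with static_assert().
template <typename Integer, Integer Min, Integer Max>
std::optional<Integer> ParseUnsignedDecimal(const std::string_view input)
{
    static_assert(std::numeric_limits<Integer>::is_integer, "Integer must be an integer type");
    static_assert(Min >= 0 && Min <= Max, "the range must be non-negative and non-empty");

    constexpr auto lo = static_cast<std::uint64_t>(Min);
    constexpr auto hi = static_cast<std::uint64_t>(Max);
    // Guarantees that "value * 10 + digit" below cannot wrap around.
    static_assert(hi <= (std::numeric_limits<std::uint64_t>::max() - 9) / 10, "Max is too large for this sketch");

    if (input.empty())
        return std::nullopt;

    std::uint64_t value = 0;
    for (const char c : input) {
        if (c < '0' || c > '9')
            return std::nullopt; // rejects signs, spaces, and hex digits
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
        if (value > hi)
            return std::nullopt; // too large for the allowed range
    }

    // Zero gets no special treatment: it is accepted only if Min is zero.
    if (value < lo)
        return std::nullopt;

    return static_cast<Integer>(value);
}
```

For instance, with a port-like range such as `ParseUnsignedDecimal<std::uint16_t, 1, 65535>`, the input "0" yields no value. This sketch accepts leading zeros (it parses "010" as 10); a stricter parser may prefer to reject them.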

fixup: Slightly better names

799f484

fixup: Added a tricky/important example of an invalid input

af1e7c7

fixup: Formatted modified sources

9ccf1b7

rousskov added 2 commits November 15, 2023 10:24

fixup: Make our simplifying assumptions explicit

5cc2517

... to avoid missing them again. See also: Branch commit 29d34c9.

fixup: Better diagnostic for negative decimals

a985e8d

eduard-bagdasaryan commented Nov 16, 2023
