Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add %byte{value} logformat code for logging or sending arbitrary bytes #236

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

eduard-bagdasaryan
Copy link

@eduard-bagdasaryan eduard-bagdasaryan commented Nov 9, 2023

No support for zero byte values yet because existing Format::assemble()
code does not support that out of the box, and there is no known need
for such support. It can be added later (without backward compatibility
problems) if needed.

No support for zero byte values yet because existing Format::assemble()
code does not support that out of the box, and there is no known _need_
for such support. It can be added later (without backward compatibility
problems) if needed.
src/parser/Tokenizer.h Outdated Show resolved Hide resolved
Also separated this function into Signed and Unsigned ones,
to allow the caller be specific about the input it expects.
eduard-bagdasaryan and others added 18 commits November 14, 2023 22:02
An "internal implementation detail" namespace is often called one of
those three words[^1], abbreviated or not, with an underscore suffix or
not. Let's standardize on "Detail_" because it avoids abbreviations,
does not risk conflicting with "impl" commonly used in Pimpl idiom, and
cannot conflict with any regular Squid namespace (because of the
trailing underscore which, to many, also implies some kind of "private
use").

[^1]: https://stackoverflow.com/a/26546780
... even though the existing callers always supply Integer types that
include zero.
... because the vast majority of other values lack such descriptions,
and there is a significant chance of getting these descriptions wrong
unless we generate this code. FWIW, the mapping is already fairly easy
to find by searching for the ByteCode name.
Unlike LFT_STRING, LFT_BYTE corresponds to a logformat code, so it
should be together with other %codes.
.. because those changes

* mistreated zero port (it should be rejected)
* lost a valuable static_assert checking whether Port type can represent
  the maximum valid port number
* added tok.reset() call that goes against Tokenizer parsing flow

This code should be refactored to use Parser::UnsignedDecimalInteger(),
but that refactoring should wait for the non-CONNECT port parsing to be
upgraded to Tokenizer (and existing TODO) because that upgrade is likely
to influence this refactoring.
... because that code needs to support parsing of octal and hex integers
(in addition to decimal integers). I did not check for other problems.

To convert this code, we need a more powerful/flexible parser.
Branch code used the wrong value type, allowing much larger values
than PROXY protocol specs and, arguably, Squid code allow.

Also documented a UnsignedDecimalInteger() bug that may affect how the
design of branch code moves forward.
This commit effectively reverses branch commit 2f72f5f that attempted to
reduce code duplication in parsing integers. Additional subsequent
attempts were reverted previously. All these attempts were mostly
correct. However, small-but-critical differences in calling code needs
make reuse of a simple integer parser function virtually impossible.

We will reduce this code duplication using a configurable parser. I even
have a sketch we can start from, but that (and all the caller changes)
deserve a dedicated project.
... because we use them in a static_assert(). Let's be explicit about
our genuine needs rather than relying on optional[^1] compiler abilities
to accommodate them.

[^1]: https://stackoverflow.com/a/64501230
@@ -4685,6 +4685,13 @@ DOC_START
Format codes:

% a literal % character

byte{value} Adds a single byte with the given value (e.g., %byte{10}
adds an ASCII LF character a.k.a. "new line" or "\n"). The value
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify: adding %byte{10} results in a genuine new line in access.log (i.e., not a couple of characters "\n"). Is this an intended/expected behavior?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an intended/expected behavior?

Yes, it is. In fact, IIRC, adding new lines is the use case that prompted the addition of this feature. Nearly all other ASCII characters can probably be added (as those characters) without a new logformat %code.

Adding new lines is useful when, for example, access records go to a logging daemon that expects HTTP header-like record syntax.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants