Improve parser error for Byte order mark (BOM) #57

Closed
bwbroersma opened this issue Jul 17, 2023 · 4 comments
Comments

@bwbroersma
Contributor

A security.txt file with a byte order mark (BOM) is currently handled correctly, but the resulting error is confusing for the user.
E.g. an unsigned security.txt with a BOM will result in:

[
  {
    'code': 'invalid_line',
    'message': 'Line must contain a field name and value, unless the line is blank or contains a comment.',
    'line': 1
  }
]

and a signed message with a BOM will give the same result as #41, since the signed header is checked with:

sectxt/sectxt/__init__.py

Lines 139 to 146 in 79bb386

if line == "-----BEGIN PGP SIGNED MESSAGE-----":
    if self._line_no != 1:
        self._add_error(
            "signed_format_issue",
            "Signed security.txt must start with the header "
            "'-----BEGIN PGP SIGNED MESSAGE-----'.",
        )
    self._signed = True

so _signed won't be set to True because of the BOM prefix on the first line. This results in the #41 behavior, where an error is reported for every armored line.
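To illustrate why the comparison fails, here is a minimal sketch. It is not the sectxt code itself: `is_pgp_header` is a hypothetical helper that only mirrors the equality check quoted above. It shows that a first line decoded as plain UTF-8 keeps the BOM as the invisible character U+FEFF, so the check never matches.

```python
import codecs

PGP_HEADER = "-----BEGIN PGP SIGNED MESSAGE-----"


def is_pgp_header(line: str) -> bool:
    """Hypothetical helper mirroring the equality check quoted above."""
    return line == PGP_HEADER


# A signed file saved with a UTF-8 BOM starts with the bytes EF BB BF.
raw_first_line = codecs.BOM_UTF8 + PGP_HEADER.encode("utf-8")

# Decoding as plain UTF-8 keeps the BOM as the character U+FEFF.
decoded = raw_first_line.decode("utf-8")

print(is_pgp_header(decoded))                   # False
print(is_pgp_header(decoded.lstrip("\ufeff")))  # True
```

The two lines look identical when printed, which is why the failure is hard to spot without an explicit hint.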

Because a BOM is not visible in a text editor, and is usually added without the user explicitly requesting it, an improved error message hinting at the BOM would help.

@DigitalTrustCenter
Owner

This has been resolved in the new release. A BOM can occur, but it has no meaning in UTF-8. If it is present, it is removed and the text is parsed without it.
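One way to drop a leading BOM in Python is the standard `utf-8-sig` codec. This is only a sketch of the general technique; whether the release does it this way is an assumption, and the `Contact` value is a made-up example.

```python
import codecs

# Example content with a UTF-8 BOM in front (the address is illustrative only).
raw = codecs.BOM_UTF8 + b"Contact: mailto:security@example.com\n"

plain = raw.decode("utf-8")         # BOM survives as U+FEFF
stripped = raw.decode("utf-8-sig")  # BOM is dropped if present

print(repr(plain[0]))                   # '\ufeff'
print(stripped.startswith("Contact:"))  # True
```

`utf-8-sig` also decodes input without a BOM unchanged, so it can be applied unconditionally. It removes the BOM silently, though, so the caller cannot tell whether one was there.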

@bwbroersma
Contributor Author

bwbroersma commented Aug 25, 2023

I'm not sure whether stripping the BOM is in line with RFC 9116 - File Format Description and ABNF Grammar:

The file format of the "security.txt" file MUST be plain text (MIME type "text/plain") as defined in Section 4.1.3 of [RFC2046] and MUST be encoded using UTF-8 [RFC3629] in Net-Unicode form [RFC5198].

RFC 5198 states:

  2. Net-Unicode Definition
    The Network Unicode format (Net-Unicode) is defined as follows. Parts of this definition are deliberately informal, providing guidance for specific profiles or rules in the protocols that reference this one rather than firm rules that apply globally.

    5. As suggested in Section 6 of RFC 3629, the Byte Order Mark ("BOM") signature MUST NOT appear at the beginning of these text strings.

So I think the BOM should be stripped and parsing should continue, but it should also trigger an error, or at least a warning. A BOM is especially undesirable in combination with signing: although it sits outside the PGP block, a file with a BOM is no longer recognized as a PGP signed file by the file command on Linux.

@DigitalTrustCenter: therefore my request to reopen this issue.

@DigitalTrustCenter
Owner

Agreed that the underlying issue should be highlighted when the file has the BOM present. Will reopen the issue.

@DigitalTrustCenter
Owner

With the new release an error message has been added. If a byte order mark is present in the file, the parser continues to process the file without the BOM, but adds an error to highlight that the BOM is present. This means that a security.txt is not valid if it has a BOM in the file.
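The described behavior could be sketched roughly as follows. This is not the actual sectxt implementation: the function name `strip_bom`, the error code `bom_in_file`, and the message text are hypothetical, and only the shape of the error entry follows the output shown earlier in this issue.

```python
from typing import Dict, List, Tuple

BOM = "\ufeff"  # how a UTF-8 BOM appears after decoding as plain UTF-8


def strip_bom(text: str) -> Tuple[str, List[Dict[str, object]]]:
    """Remove a leading BOM and report it as an error (hypothetical helper)."""
    errors: List[Dict[str, object]] = []
    if text.startswith(BOM):
        errors.append(
            {
                "code": "bom_in_file",  # hypothetical error code
                "message": "A byte order mark (BOM) was found at the start "
                "of the file; it must not be present.",
                "line": 1,
            }
        )
        text = text[len(BOM):]
    return text, errors


content, errors = strip_bom("\ufeffContact: mailto:security@example.com\n")
print(content.startswith("Contact:"))  # True
print(len(errors))                     # 1
```

Returning the errors alongside the cleaned text lets parsing continue normally while the file is still reported as invalid.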
