Improve parser error for Byte order mark (BOM) #57

bwbroersma · 2023-07-17T16:54:05Z

A security.txt file with a Byte order mark (BOM) is currently (correctly) reported as invalid, but the resulting error is confusing for the user.
For example, an unsigned security.txt file with a BOM results in:

[
  {
    'code': 'invalid_line',
    'message': 'Line must contain a field name and value, unless the line is blank or contains a comment.',
    'line': 1
  }
]

A signed file with a BOM produces the same errors as #41, because the signed header is detected with:

sectxt/sectxt/__init__.py, lines 139 to 146 in 79bb386:

    if line == "-----BEGIN PGP SIGNED MESSAGE-----":
        if self._line_no != 1:
            self._add_error(
                "signed_format_issue",
                "Signed security.txt must start with the header "
                "'-----BEGIN PGP SIGNED MESSAGE-----'.",
            )
        self._signed = True

Because of the BOM prefix, the first line does not match this header, so _signed is never set to True. This results in the #41 behavior, where an error is reported for every armored line.
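To illustrate the failure mode, here is a small standalone Python sketch (not sectxt code, and assuming the file is decoded with the plain "utf-8" codec): that codec keeps the BOM as the character U+FEFF, so the first line no longer equals the header string.

```python
HEADER = "-----BEGIN PGP SIGNED MESSAGE-----"

# A signed file saved "with BOM": in UTF-8 the BOM is the byte sequence EF BB BF.
raw = b"\xef\xbb\xbf" + HEADER.encode("utf-8") + b"\nHash: SHA256\n"

# The plain "utf-8" codec does not strip the BOM; it decodes to U+FEFF.
text = raw.decode("utf-8")
first_line = text.splitlines()[0]

print(first_line == HEADER)  # False: there is an extra, invisible leading character
print(repr(first_line[0]))   # '\ufeff'
```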

Because a BOM is not visible in most text editors, and it is usually added without the user explicitly asking for it, an improved error message that hints at the BOM would help.


DigitalTrustCenter · 2023-08-03T09:11:38Z

This has been resolved in the new release. A BOM can occur, but it has no meaning in UTF-8. If it is present, it is removed and the text is parsed without it.
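A standalone Python sketch of two ways to drop a leading BOM before parsing. The helper strip_bom is hypothetical, and this is not necessarily how sectxt implements it. The "utf-8-sig" codec only applies when decoding bytes, whereas strip_bom works on text that has already been decoded.

```python
import codecs


def strip_bom(text: str) -> str:
    """Remove a single leading U+FEFF (a UTF-8 BOM after decoding), if present."""
    # U+FEFF only acts as a BOM at the very start; elsewhere it is a
    # zero-width no-break space, so only the first character is checked.
    return text[1:] if text.startswith("\ufeff") else text


raw = codecs.BOM_UTF8 + b"Contact: mailto:security@example.com\n"

# Option 1: decode as plain UTF-8, then strip the BOM character.
print(repr(strip_bom(raw.decode("utf-8"))))

# Option 2: let the "utf-8-sig" codec skip the BOM while decoding the bytes.
print(repr(raw.decode("utf-8-sig")))
```

Both options yield the same text; which one fits depends on whether the parser receives bytes or an already decoded string.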

bwbroersma · 2023-08-25T19:09:44Z

I'm not sure whether stripping the BOM is in line with RFC 9116, File Format Description and ABNF Grammar:

The file format of the "security.txt" file MUST be plain text (MIME type "text/plain") as defined in Section 4.1.3 of [RFC2046] and MUST be encoded using UTF-8 [RFC3629] in Net-Unicode form [RFC5198].

RFC 5198 states:

Net-Unicode Definition
The Network Unicode format (Net-Unicode) is defined as follows. Parts of this definition are deliberately informal, providing guidance for specific profiles or rules in the protocols that reference this one rather than firm rules that apply globally.
…
5. As suggested in Section 6 of RFC 3629, the Byte Order Mark ("BOM") signature MUST NOT appear at the beginning of these text strings.

So I think the BOM should be stripped and parsing should continue, but it should also trigger an error, or at least a warning. A BOM is especially undesirable in a signed file: although it sits outside the PGP block, a file with a BOM is no longer recognized as a PGP signed file by the Linux file utility.

@DigitalTrustCenter: I therefore request that this issue be reopened.

DigitalTrustCenter · 2023-09-04T11:57:05Z

Agreed: the underlying issue should be flagged when the file contains a BOM. We will reopen the issue.

DigitalTrustCenter · 2023-11-07T13:08:58Z

The new release adds an error message. If the file contains a byte order mark, the parser strips the BOM and continues processing, but it adds an error to flag the BOM. As a result, a security.txt file with a BOM is reported as invalid.
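A minimal sketch of the behavior described above (strip the BOM, keep parsing, record an error), in plain Python. The helper preprocess, the error code bom_in_file, and the message text are hypothetical illustrations, not sectxt's actual names; the error dict only mirrors the code/message/line shape of the parser output shown earlier in this issue.

```python
from typing import Dict, List, Tuple, Union

HEADER = "-----BEGIN PGP SIGNED MESSAGE-----"
BOM = "\ufeff"

Error = Dict[str, Union[str, int]]


def preprocess(text: str) -> Tuple[str, List[Error]]:
    """Strip a leading BOM and record an error for it (hypothetical helper)."""
    errors: List[Error] = []
    if text.startswith(BOM):
        text = text[len(BOM):]
        errors.append({
            "code": "bom_in_file",  # hypothetical code name, not taken from sectxt
            "message": "The Byte Order Mark (BOM) must not appear "
                       "at the beginning of the file.",
            "line": 1,
        })
    return text, errors


text, errors = preprocess(BOM + HEADER + "\nHash: SHA256\n")
print(text.splitlines()[0] == HEADER)  # True: signed-header detection now matches
print(errors)
```

Because the BOM is removed before line-by-line parsing, the signed header is recognized again, while the recorded error still marks the file as invalid.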

SanderKools-Ordina mentioned this issue Aug 2, 2023: PGP format validation #59 (Merged)

DigitalTrustCenter closed this as completed Aug 3, 2023

bwbroersma mentioned this issue Aug 16, 2023: Update sectxt to 0.9.0 internetstandards/Internet.nl#1046 (Open)

DigitalTrustCenter reopened this Sep 4, 2023

This was referenced Nov 7, 2023:

Add error message byte order mark #62 (Closed)
Add error message byte order mark #63 (Merged)

DigitalTrustCenter closed this as completed Nov 7, 2023

bwbroersma mentioned this issue Apr 5, 2024: Request to change Parser from utf-8 to bytes #69 (Closed)
