BOM error message always present #65

affimnem · 2024-02-14T12:00:51Z

The BOM check has an error message, but it is outside the "then" part of the code, making the error message always appear. https://github.com/DigitalTrustCenter/sectxt/blob/main/sectxt/__init__.py#L419C1-L432C57 has:

   def _get_str(self, content: bytes) -> str:
        try:
            if content.startswith(codecs.BOM_UTF8):
                content = content.replace(codecs.BOM_UTF8, b'')
            self._add_error(
                "bom_in_file",
                "The Byte-Order Mark was found in the UTF-8 File. "
                "Security.txt must be encoded using UTF-8 in Net-Unicode form, "
                "the BOM signature must not appear at the beginning."
            )
            return content.decode('utf-8')
        except UnicodeError:
            self._add_error("utf8", "Content must be utf-8 encoded.")
        return content.decode('utf-8', errors="replace")

I did a simple test and by indenting self._add_error(...) to be on the same level as the line before it only gave error when there is a BOM at the beginning of the file.

In addition, the error message doesn't state that the BOM is at the start of the file, which is the problem. I'd suggest to change the first sentence to: "The Byte-Order Mark (BOM) was found at the start of the UTF-8 file." (Not really sure if "UTF-8" is necessary.)

Finally, content.replace(codecs.BOM_UTF8, b'') will replace all BOM occurences, not only the starting BOM. I don't know if this is an oversight, or if it is intended to make the following parsing easier, but conceptually it feels wrong, and the following would fix the issue, and only that, as bytes.replace() has a third count argument:

                content = content.replace(codecs.BOM_UTF8, b'', 1)

The text was updated successfully, but these errors were encountered:

DigitalTrustCenter · 2024-03-07T12:21:07Z

Thank you for reporting this issue. This was a serious oversight and your analysis was correct. The error should only be reported if the BOM was found in the file. We also agree with your proposal to improve the error message to include that the bom was at the start of the file and that the replace only does the first occurrence, even though the BOM should only be able to occur at the start it does make it more clear what it is doing.

This issue is resolved with the update to version 0.9.2

mxsasha mentioned this issue Feb 15, 2024

Update sectxt to 0.9.0 internetstandards/Internet.nl#1046

Open

DigitalTrustCenter mentioned this issue Mar 7, 2024

sectxt update 0.9.2 #68

Merged

DigitalTrustCenter closed this as completed Mar 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BOM error message always present #65

BOM error message always present #65

affimnem commented Feb 14, 2024 •

edited

Loading

DigitalTrustCenter commented Mar 7, 2024

BOM error message always present #65

BOM error message always present #65

Comments

affimnem commented Feb 14, 2024 • edited Loading

DigitalTrustCenter commented Mar 7, 2024

affimnem commented Feb 14, 2024 •

edited

Loading