You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
def_get_str(self, content: bytes) ->str:
try:
ifcontent.startswith(codecs.BOM_UTF8):
content=content.replace(codecs.BOM_UTF8, b'')
self._add_error(
"bom_in_file",
"The Byte-Order Mark was found in the UTF-8 File. ""Security.txt must be encoded using UTF-8 in Net-Unicode form, ""the BOM signature must not appear at the beginning."
)
returncontent.decode('utf-8')
exceptUnicodeError:
self._add_error("utf8", "Content must be utf-8 encoded.")
returncontent.decode('utf-8', errors="replace")
I did a simple test and by indenting self._add_error(...) to be on the same level as the line before it only gave error when there is a BOM at the beginning of the file.
In addition, the error message doesn't state that the BOM is at the start of the file, which is the problem. I'd suggest to change the first sentence to: "The Byte-Order Mark (BOM) was found at the start of the UTF-8 file." (Not really sure if "UTF-8" is necessary.)
Finally, content.replace(codecs.BOM_UTF8, b'') will replace all BOM occurences, not only the starting BOM. I don't know if this is an oversight, or if it is intended to make the following parsing easier, but conceptually it feels wrong, and the following would fix the issue, and only that, as bytes.replace() has a third count argument:
content=content.replace(codecs.BOM_UTF8, b'', 1)
The text was updated successfully, but these errors were encountered:
Thank you for reporting this issue. This was a serious oversight and your analysis was correct. The error should only be reported if the BOM was found in the file. We also agree with your proposal to improve the error message to include that the bom was at the start of the file and that the replace only does the first occurrence, even though the BOM should only be able to occur at the start it does make it more clear what it is doing.
This issue is resolved with the update to version 0.9.2
The BOM check has an error message, but it is outside the "then" part of the code, making the error message always appear. https://github.com/DigitalTrustCenter/sectxt/blob/main/sectxt/__init__.py#L419C1-L432C57 has:
I did a simple test and by indenting
self._add_error(...)
to be on the same level as the line before it only gave error when there is a BOM at the beginning of the file.In addition, the error message doesn't state that the BOM is at the start of the file, which is the problem. I'd suggest to change the first sentence to: "The Byte-Order Mark (BOM) was found at the start of the UTF-8 file." (Not really sure if "UTF-8" is necessary.)
Finally,
content.replace(codecs.BOM_UTF8, b'')
will replace all BOM occurences, not only the starting BOM. I don't know if this is an oversight, or if it is intended to make the following parsing easier, but conceptually it feels wrong, and the following would fix the issue, and only that, asbytes.replace()
has a thirdcount
argument:The text was updated successfully, but these errors were encountered: