janezd added the labels bug (A bug confirmed by the core team) and snack (This will take an hour or two) and removed the label bug report (Bug is reported by user, not yet confirmed by the core team) on Jan 24, 2025.
What's wrong?
It turns out that Excel insists on putting a byte order mark (BOM) at the start of the CSV files it generates. The BOM is generally invisible, so one has to use a tool like od to check whether a file has one. If the BOM is not removed, the File widget treats it as part of the first string in the file, typically a column header. This is mostly not a problem until you try to write Python scripts. You will then find that a column named \ufeffcolumnname cannot be accessed as an attribute; see the last entry in this list:
    dir(in_data[:, -1].columns)
    ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '\ufeffcountry']
This results in:
    in_data[:, -1].columns.country
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
    AttributeError: 'Columns' object has no attribute 'country'
And:
    in_data[:, -1].columns.\ufeffcountry
      File "<console>", line 1
        in_data[:, -1].columns.\ufeffcountry
                                ^
    SyntaxError: unexpected character after line continuation character
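For anyone stuck with such a file, the attribute is still reachable through getattr, because the name is stored as an ordinary string even though it is not a valid Python identifier. The sketch below uses a SimpleNamespace as a hypothetical stand-in for in_data[:, -1].columns so that it runs without Orange; get_ignoring_bom is a helper name made up for this example, not part of any library.

```python
from types import SimpleNamespace

BOM = "\ufeff"  # U+FEFF, the byte order mark as a single Python character

# Hypothetical stand-in for in_data[:, -1].columns: an object whose only
# data attribute carries the BOM-prefixed header name.
columns = SimpleNamespace()
setattr(columns, BOM + "country", "country column")


def get_ignoring_bom(obj, name):
    """Return obj.<name>, falling back to the BOM-prefixed attribute name."""
    try:
        return getattr(obj, name)
    except AttributeError:
        return getattr(obj, BOM + name)


# The dotted form cannot work because the name is not a valid identifier.
print((BOM + "country").isidentifier())
print(get_ignoring_bom(columns, "country"))
```

In a Python Script widget the same call would presumably be get_ignoring_bom(in_data[:, -1].columns, "country"), but I have only checked it against the stand-in object above.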
I admit this is probably not a huge problem for most and is most easily solved by just nuking the BOM in the file before reading it, but in case anyone else runs into this, it’s perhaps good to have it documented.
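In case it helps, here is a minimal sketch of checking for the BOM and removing it before the file is loaded. It assumes the file is UTF-8 encoded, so the BOM is the three bytes EF BB BF; the function names and the file names in the usage comment are made up for this example.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM (bytes EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def strip_utf8_bom(src, dst):
    """Copy src to dst, dropping a leading UTF-8 BOM if one is present."""
    data = Path(src).read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    Path(dst).write_bytes(data)


# Usage (hypothetical file names):
#     if has_utf8_bom("from_excel.csv"):
#         strip_utf8_bom("from_excel.csv", "from_excel_nobom.csv")
```

Working on raw bytes leaves the rest of the file untouched. If you read the text in Python yourself, opening the file with encoding="utf-8-sig" skips the BOM in the same way.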
How can we reproduce the problem?
I attach a sample file that you can play with.
What's your environment?
mincpcap_cppp.csv