CSV files containing BOM get that invisibly tacked onto first column name #7000

Open
kaimikael opened this issue Jan 21, 2025 · 0 comments
Labels
bug A bug confirmed by the core team snack This will take an hour or two

Comments

@kaimikael

What's wrong?

It turns out that Excel insists on putting a byte order mark (BOM) at the start of the CSV files it generates. The BOM is generally invisible, so you need a tool like od to check whether a file has one. If the BOM is not removed, the File widget treats it as part of the first string in the file, typically the first column header. This is mostly not a problem until you try to write Python scripts. You will then find that you cannot access a column named \ufeffcolumnname as an attribute; see the last entry in this list:

dir(in_data[:, -1].columns)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '\ufeffcountry']

This results in:
in_data[:, -1].columns.country
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'Columns' object has no attribute 'country'

And:
in_data[:, -1].columns.\ufeffcountry
  File "<console>", line 1
    in_data[:, -1].columns.\ufeffcountry
    ^
SyntaxError: unexpected character after line continuation character
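
The SyntaxError comes from the backslash: typed literally into the console, \ufeffcountry is not an escape sequence but a line-continuation character followed by text. A possible way around it, sketched here rather than tested in Orange, is to look the attribute up by its string name with getattr. The sketch uses types.SimpleNamespace as a stand-in for the Columns object, so it is an assumption that Columns behaves the same way; the stored value is made up for illustration.

```python
from types import SimpleNamespace

BOM = "\ufeff"  # U+FEFF, the character that ended up in front of the column name

# Stand-in for the Columns object from the issue (hypothetical; not Orange's class).
columns = SimpleNamespace()
setattr(columns, BOM + "country", "country-variable")

# Attribute lookup by string works even though dotted access cannot be typed.
value = getattr(columns, BOM + "country")

# The plain name is still missing, which matches the AttributeError above.
has_plain = hasattr(columns, "country")

print(repr(value), has_plain)
```

This only reaches the misnamed column; it does not fix the name itself.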

I admit this is probably not a huge problem for most and is most easily solved by just nuking the BOM in the file before reading it, but in case anyone else runs into this, it’s perhaps good to have it documented.
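
For anyone who wants to do the nuking in Python rather than by hand, here is a minimal sketch. It assumes the file is UTF-8 with a UTF-8 BOM (the issue does not say which encoding Excel used), and the function names and file names are made up for this example. The utf-8-sig codec is part of the standard library and skips a leading BOM when reading.

```python
import codecs
import csv
import tempfile
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def strip_utf8_bom(src, dst):
    """Copy src to dst, dropping a leading UTF-8 BOM if one is present."""
    data = Path(src).read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    Path(dst).write_bytes(data)


def read_header(path):
    """Read the first CSV row; utf-8-sig skips the BOM if there is one."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))


# Small demonstration on a throwaway file with invented contents.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "with_bom.csv"
    dst = Path(tmp) / "no_bom.csv"
    src.write_bytes(codecs.BOM_UTF8 + b"country,value\nFinland,1\n")

    found = has_utf8_bom(src)        # BOM present in the original
    strip_utf8_bom(src, dst)
    still_there = has_utf8_bom(dst)  # BOM gone in the cleaned copy
    header = read_header(src)        # header read without the BOM
```

The cleaned copy can then be loaded in the File widget as usual. Files saved in other encodings, such as UTF-16, would need different handling.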

How can we reproduce the problem?

I attach a sample file that you can play with.

What's your environment?

mincpcap_cppp.csv

@kaimikael kaimikael added the bug report Bug is reported by user, not yet confirmed by the core team label Jan 21, 2025
@janezd janezd added bug A bug confirmed by the core team snack This will take an hour or two and removed bug report Bug is reported by user, not yet confirmed by the core team labels Jan 24, 2025