CSV files containing BOM get that invisibly tacked onto first column name #7000

kaimikael · 2025-01-21T16:11:04Z

What's wrong?

It turns out that Excel insists on putting a byte order mark (BOM) at the start of the CSV files it generates. The BOM is generally invisible, so one has to use a tool like od to check whether a file has one. If the BOM is not removed, the File widget treats it as part of the first string in the file, typically a column header. This is mostly not a problem until you try to write Python scripts. You will then find that you cannot access a column named \ufeffcolumnname as an attribute; see the last entry in this list:
dir(in_data[:, -1].columns)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '\ufeffcountry']

This results in:
in_data[:, -1].columns.country
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'Columns' object has no attribute 'country'

And:
in_data[:, -1].columns.\ufeffcountry
  File "<console>", line 1
    in_data[:, -1].columns.\ufeffcountry
                            ^
SyntaxError: unexpected character after line continuation character
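For anyone stuck with an already-loaded table, one possible workaround is to look the attribute up by string with `getattr`, since `\ufeff` is a valid escape inside a string literal even though it cannot appear in an identifier. The sketch below uses a small stand-in `Columns` class and a helper `get_column`, both hypothetical and written only for illustration; it assumes, based on the `dir()` output above, that the real object stores each column under a plain attribute of the same name.

```python
class Columns:
    """Hypothetical stand-in: exposes each column name as an attribute."""

    def __init__(self, names):
        for name in names:
            setattr(self, name, f"<variable {name!r}>")


def get_column(columns, name):
    """Return the attribute `name`, falling back to a BOM-prefixed spelling."""
    try:
        return getattr(columns, name)
    except AttributeError:
        # "\ufeff" is U+FEFF, the character a BOM decodes to
        return getattr(columns, "\ufeff" + name)


# Made-up column names; the first one carries the stray BOM
cols = Columns(["\ufeffcountry", "value"])

direct = getattr(cols, "\ufeffcountry")   # works: the name is passed as a string
via_helper = get_column(cols, "country")  # same object, without typing the BOM
```

This only hides the symptom; the column name itself still contains the BOM.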

I admit this is probably not a huge problem for most and is most easily solved by just nuking the BOM in the file before reading it, but in case anyone else runs into this, it’s perhaps good to have it documented.
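A minimal sketch of nuking the BOM before loading, assuming the file is UTF-8 encoded (other encodings use different BOM bytes). The function name `strip_utf8_bom` and the sample contents are made up for illustration; `codecs.BOM_UTF8` is the standard library constant for the bytes EF BB BF.

```python
import codecs
import tempfile
from pathlib import Path


def strip_utf8_bom(path):
    """Rewrite `path` without a leading UTF-8 BOM; return True if one was removed."""
    data = Path(path).read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    Path(path).write_bytes(data[len(codecs.BOM_UTF8):])
    return True


# Demo on a throwaway file with made-up contents
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes(codecs.BOM_UTF8 + b"country,value\nSweden,1\n")
    removed = strip_utf8_bom(sample)        # BOM present, file is rewritten
    cleaned = sample.read_bytes()
    removed_again = strip_utf8_bom(sample)  # nothing left to remove
```

The function reads the whole file into memory, which is fine for small CSV files but not ideal for very large ones.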

How can we reproduce the problem?

I attach a sample file that you can play with.
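If the attachment is not at hand, the effect can be illustrated with the standard library alone. This is only a sketch with made-up data, and it does not go through Orange's File widget: it shows that decoding BOM-prefixed bytes as plain `utf-8` leaves U+FEFF glued to the first header, while the `utf-8-sig` codec drops it. The helper `read_header` is hypothetical.

```python
import codecs
import csv
import io

# Made-up CSV bytes with a UTF-8 BOM in front, as Excel would write them
raw = codecs.BOM_UTF8 + b"country,value\nSweden,1\n"


def read_header(data, encoding):
    """Decode `data` with `encoding` and return the first CSV row."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text))


has_bom = raw.startswith(codecs.BOM_UTF8)    # the check one would do with od
plain_header = read_header(raw, "utf-8")     # BOM survives as U+FEFF
clean_header = read_header(raw, "utf-8-sig") # BOM is stripped by the codec

# ascii() makes the otherwise invisible character visible
print([ascii(name) for name in plain_header])
print(clean_header)
```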

What's your environment?

Operating system: macOS 15.2
Orange version: 3.38.1
How you installed Orange: From the installer at https://download.biolab.si/download/files/Orange3-3.38.1-Python3.10.11-x86_64.dmg

mincpcap_cppp.csv

kaimikael added the "bug report" label (Bug is reported by user, not yet confirmed by the core team) on Jan 21, 2025

janezd added the "bug" (A bug confirmed by the core team) and "snack" (This will take an hour or two) labels and removed the "bug report" label on Jan 24, 2025
