[Python] csv does not work with a cedilla delimiter #39287

Closed
rkarthik29 opened this issue Dec 18, 2023 · 4 comments

Comments

@rkarthik29

Describe the bug, including details regarding any error messages, version, and platform.

We are trying to parse a CSV file that has a cedilla (\x03) delimiter. When passing this to pyarrow.csv.read_csv, the parser complains that the delimiter is not a valid ASCII character. Is there a way around this issue?

File ~/miniconda3/envs/sandbox/lib/python3.11/site-packages/pyarrow/_csv.pyx:422, in pyarrow._csv.ParseOptions.__init__()

File ~/miniconda3/envs/sandbox/lib/python3.11/site-packages/pyarrow/_csv.pyx:445, in pyarrow._csv.ParseOptions.delimiter.set()

File ~/miniconda3/envs/sandbox/lib/python3.11/site-packages/pyarrow/_csv.pyx:46, in pyarrow._csv._single_char()

ValueError: Expecting an ASCII character

I get the above error if I use the character "Ç" as the delimiter.

But if I use \x03, the data is read, but it is not parsed into columns.

I also tried "Ç".encode(ascii) and got an error that the ASCII code \xc7 is not valid. I also tried chr(199). Nothing worked. Could anybody point me in the right direction here? I don't want to use pandas.

Component(s)

Python

@AlenkaF
Member

AlenkaF commented Dec 19, 2023

I think the issue here is that the letter C with cedilla is not part of the standard ASCII table, and I am not sure how to work around it.

I am not sure pandas will work here either.
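
To illustrate the ASCII point above, here is a small sketch using only the Python standard library. It shows that "Ç" lies outside the ASCII range, while "\x03" is a different character altogether (the ASCII control character ETX). If the file really uses "Ç" as its separator, that would presumably explain why "\x03" is accepted as a delimiter but never splits any columns; that last part is an assumption about the file, not something confirmed in this thread.

```python
# Illustration only: why "Ç" is rejected while "\x03" is accepted.
delim = "Ç"

code_point = ord(delim)                  # 199 (U+00C7), outside the ASCII range 0-127
is_ascii = delim.isascii()               # False
utf8_bytes = delim.encode("utf-8")       # b'\xc3\x87', two bytes in a UTF-8 file
latin1_bytes = delim.encode("latin-1")   # b'\xc7', one byte in a Latin-1 file

etx = "\x03"                             # ASCII control character ETX, code point 3
etx_is_ascii = etx.isascii()             # True, so it passes an ASCII-only check

print(code_point, is_ascii, utf8_bytes, latin1_bytes, etx_is_ascii)
```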

@rkarthik29
Author

@AlenkaF It works with pandas if we use the Unicode value \x03 for the cedilla as our separator.

@rkarthik29
Author

reader = pd.read_csv(
    pfs.open("xyz.dat", "rb"), sep=r"\Ç", header=None, names=names,
    chunksize=100, encoding="utf-8", quotechar='"',
)

This works with pandas.

@kou changed the title from "pyarrow csv does not work with a cedilla delimiter" to "[Python] csv does not work with a cedilla delimiter" on Jan 4, 2024
@jorisvandenbossche
Member

There is currently no way around this: Arrow only supports a single ASCII character as the separator.

Related issues: #32432, #26411

Going to close this as a duplicate of those other issues.
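
Outside of Arrow itself, one possible stopgap is to rewrite the delimiter bytes before handing the data to pyarrow. The following is a minimal sketch, not an Arrow feature. It assumes the file is UTF-8 encoded, fits in memory, never contains the replacement byte "\x1f", and never uses "Ç" inside a field value (any such occurrence would also be replaced). The helper names `replace_delimiter` and `read_cedilla_csv` are hypothetical.

```python
import io

CEDILLA_UTF8 = "Ç".encode("utf-8")  # b"\xc3\x87"
ASCII_DELIM = "\x1f"                # ASCII unit separator; assumed absent from the data


def replace_delimiter(raw: bytes,
                      old: bytes = CEDILLA_UTF8,
                      new: bytes = ASCII_DELIM.encode("ascii")) -> bytes:
    """Swap the multi-byte delimiter for a single ASCII byte."""
    if new in raw:
        # Refuse to continue rather than silently corrupt the column layout.
        raise ValueError("replacement delimiter already occurs in the data")
    return raw.replace(old, new)


def read_cedilla_csv(path, column_names):
    """Hypothetical helper: read a Ç-delimited, header-less file with pyarrow."""
    from pyarrow import csv  # imported lazily so the helper above has no dependencies

    with open(path, "rb") as f:
        data = replace_delimiter(f.read())
    return csv.read_csv(
        io.BytesIO(data),
        read_options=csv.ReadOptions(column_names=column_names),
        parse_options=csv.ParseOptions(delimiter=ASCII_DELIM),
    )
```

For files too large to hold in memory, the same replacement would need to be done in a streaming fashion, taking care not to split the two-byte sequence across chunk boundaries.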
