[C++] CSV reader: support for multi-character / whitespace delimiter? #26411

Open
asfimport opened this issue Oct 30, 2020 · 3 comments

Comments

@asfimport
Collaborator

I don't know how useful general "multi-character" delimiter support is, but one specific type of it that seems useful is "whitespace delimited", meaning any whitespace (possibly multiple / different whitespace characters).

In pandas you can achieve this either by passing delimiter="\s+" or by specifying delim_whitespace=True. Both are equivalent: pandas special-cases delimiter="\s+", because any other multi-character delimiter is interpreted as an actual regex and triggers the slower Python engine instead of the default C engine.
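
For reference, a minimal sketch of the pandas behaviour described above, assuming pandas is installed. The sample text is made up for illustration and mixes runs of spaces with tabs; only the "\s+" spelling is shown here.

```python
import io

import pandas as pd

# Made-up sample: columns separated by runs of spaces and/or tabs.
text = "a  b\tc\n1   2  3\n4\t5   6\n"

# sep=r"\s+" treats any run of whitespace as a single delimiter.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

print(df)
```

The result should have three columns (a, b, c) and two rows, regardless of how many whitespace characters separate the fields.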

cc @pitrou @nealrichardson

Reporter: Joris Van den Bossche / @jorisvandenbossche

Note: This issue was originally created as ARROW-10432. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Antoine Pitrou / @pitrou:
Is that actually useful in the real world?

@asfimport
Collaborator Author

Wes McKinney / @wesm:
Yes, it is. 

@asfimport
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
I don't really know about "general multi-character" support, but the specific case of a "whitespace delimiter" certainly is useful. For example, files that use multiple spaces to align columns in plain text are not uncommon, I think.
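
To make the aligned-columns case concrete, here is a rough sketch in plain Python (not the Arrow reader) of the semantics being asked for. The helper name split_whitespace_rows and the sample data are hypothetical.

```python
def split_whitespace_rows(text):
    """Split each non-blank line on runs of whitespace.

    str.split() with no argument collapses consecutive whitespace
    characters (spaces, tabs) and ignores leading/trailing whitespace.
    """
    return [line.split() for line in text.splitlines() if line.strip()]


# Hypothetical file content: spaces are used only to align the columns.
aligned = (
    "name      count   ratio\n"
    "alpha        12    0.50\n"
    "beta          3    0.25\n"
)

rows = split_whitespace_rows(aligned)
for row in rows:
    print(row)
```

One consequence of these semantics is that empty fields, and values that themselves contain spaces, cannot be represented without some quoting rule.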
