Switch from CSV to NimbleCSV #3

Open
zachdaniel opened this issue Dec 19, 2020 · 5 comments
Labels
bug Something isn't working enhancement New feature or request good first issue Good for newcomers

Comments

@zachdaniel
Contributor

Under the hood, the CSV library uses parallel_stream, which (I suspect this is the source) is emitting a bunch of exit messages. I suspect NimbleCSV will perform better anyway, since we can build the parsers at compile time.

@zachdaniel zachdaniel added bug Something isn't working enhancement New feature or request good first issue Good for newcomers labels Dec 19, 2020
@kevinam99

kevinam99 commented Mar 28, 2023

Is anyone working on this at the moment? Haven't used NimbleCSV ever but I suppose this would be a good way to start.

@zachdaniel
Contributor Author

Pretty sure no one is, would you like me to assign it to you?

@kevinam99

Sure. I'd like to have a go at it. Could you give me a brief overview of how this is intended to work?

@zachdaniel
Contributor Author

Honestly, it's been a long time since I looked at the code here. Ultimately, changing it to use NimbleCSV likely involves something like:

  1. a Spark.Dsl.Transformer that creates the parser for a resource at compile time (i.e. does something like NimbleCSV.define(MyParser, separator: "\t", escape: "\""), but embedding the resource fields in some way)
  2. swap out all of the CSV operations to use NimbleCSV via the new parser
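
For anyone picking this up, here is a minimal sketch of what the two steps might look like in isolation, outside of any Spark transformer. It only uses the NimbleCSV.define call quoted above plus NimbleCSV's parse_string/1 and dump_to_iodata/1. The Demo module, the @columns list, and the map-based records are hypothetical stand-ins for the resource fields, not anything from this repository, and the transformer wiring itself is not shown.

```elixir
# Standalone script (run with `elixir sketch.exs`); assumes network access
# so that Mix.install can fetch nimble_csv.
Mix.install([{:nimble_csv, "~> 1.2"}])

# Step 1: generate the parser module once, up front.
# MyParser and the options are taken from the comment above.
NimbleCSV.define(MyParser, separator: "\t", escape: "\"")

defmodule Demo do
  # Hypothetical column list standing in for a resource's fields.
  @columns [:name, :age]

  # Step 2 (read side): parse the text and zip each row with the columns.
  # parse_string/1 skips the header row by default.
  def decode(text) do
    text
    |> MyParser.parse_string()
    |> Enum.map(fn row -> @columns |> Enum.zip(row) |> Map.new() end)
  end

  # Step 2 (write side): turn records back into rows and dump them,
  # writing the header row explicitly.
  def encode(records) do
    header = Enum.map(@columns, &Atom.to_string/1)

    rows =
      Enum.map(records, fn record ->
        Enum.map(@columns, &Map.fetch!(record, &1))
      end)

    [header | rows]
    |> MyParser.dump_to_iodata()
    |> IO.iodata_to_binary()
  end
end
```

In the real change, the column list would presumably come from the resource definition at compile time rather than a module attribute, which is the part the transformer would be responsible for.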

@kevinam99

Okay, cool. Let me take some time to go through it and then maybe I'll create a draft PR.
