Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(encoding/csv): restructure proposal #2291

Closed
4 of 6 tasks
timreichen opened this issue May 31, 2022 · 10 comments
Closed
4 of 6 tasks

feat(encoding/csv): restructure proposal #2291

timreichen opened this issue May 31, 2022 · 10 comments

Comments

@timreichen
Copy link
Contributor

timreichen commented May 31, 2022

Is your feature request related to a problem? Please describe.
The encoding/csv API is scattered and messy over multiple files and a csv subdir, some files are browser compatible, some are not.

Describe the solution you'd like
This proposal includes a restructure and changes as following:

@kt3k
Copy link
Member

kt3k commented May 31, 2022

I think BufReader input of csv.parse can be deprecated or removed now in favor of stream API.

@crowlKats What do you think?

@crowlKats
Copy link
Member

i'd say deprecating it would be ok, but not removing it; most people havent switched to using the web streams based APIs

@kt3k
Copy link
Member

kt3k commented May 31, 2022

i'd say deprecating it would be ok, but not removing it; most people havent switched to using the web streams based APIs

Sounds reasonable to me. Maybe let's first deprecate BufReader for a while and then start the above sequence of changes?

@crowlKats
Copy link
Member

sounds good

@timreichen
Copy link
Contributor Author

Working on the restructure:

Do we need ColumnOptions for the columns option next to string[]? I don't really see the point, in which cases would this be used?
https://github.com/denoland/deno_std/blob/f415da4c62cdacce880c3c7e492ceee184c3be4a/encoding/csv.ts#L148-L153

@kt3k
Copy link
Member

kt3k commented Jun 6, 2022

ColumnOptions used to have parse property, but that was removed in #1724. Looks like they are now useless. I think it's ok to remove it from the input typing.

@timreichen
Copy link
Contributor Author

timreichen commented Aug 19, 2022

@kt3k Do we really need this column.fn as part of the api? https://github.com/denoland/deno_std/blob/6bd0f514b63c68c57b1625aa38f015d6b2a02676/encoding/csv.ts#L63
One can always loop over the values and manipulate them before the stringification process.
If we remove it, stringify can be sync as parse is now.

@luk3skyw4lker
Copy link
Contributor

luk3skyw4lker commented Sep 2, 2022

I think that your suggestion makes a lot of sense @timreichen

The transformation process always can be done before the stringification process

@ayame113
Copy link
Contributor

ayame113 commented Feb 10, 2023

Below are some additional refactoring suggestions from me.

  • There are currently three optional type definitions for parsers (ReadOptions, ParseOptions, CsvStreamOptions), but I think they can be simplified to one.

    • trimLeadingSpace / lazyQuotes: As far as I read the implementation, it looks like it's already available for CsvStream. It should be added to the type definition of CsvStreamOption.

    • skipFirstRow / columns: Added to CsvStream in #3184.

    • fieldsPerRecord: It doesn't seem to be available for CsvStream yet. Want to add?

      // _io.ts
      interface ReadOptions {
        separator?: string;
        comment?: string;
        trimLeadingSpace?: boolean;
        lazyQuotes?: boolean;
        fieldsPerRecord?: number;
      }
      // csv.ts
      interface ParseOptions extends ReadOptions {
        skipFirstRow?: boolean;
        columns?: string[];
      }
      // stream.ts
      interface CsvStreamOptions {
        separator?: string;
        comment?: string;
      }
  • How about clarifying the file name?

    • _parser.ts seems to be mainly used in csv.ts. Would you like to change it to _sync_parser.ts?
    • _io.ts seems to be mainly used in stream.ts. Would you like to change it to _streaming_parser.ts?
  • StreamLineReader was probably added during the transition from Go-based streams, but is now just a thin wrapper around TextLineStream. Why not remove it and change it to use TextLineStream directly inside?

  • It seems that Parser.prorotype.#parseRecord and parseRecord function have a lot in common. Why not share these implementations?

  • readRecord in _io.ts doesn't seem to be used at all. Should it be removed?

  • csv_test.ts has two identical test cases. I think we can probably organize these better.
    https://github.com/denoland/deno_std/blob/3e9b8943fdd3a0729f1e34e8668eec56f991c81d/encoding/csv_test.ts#L693-L708

  • I think the csv_test.ts and stream_test.ts test cases can be combined into one. How about putting eg testcase1.csv and testcase1_output.json in the testdata directory?

  • How about changing Record<string, unknown>[] to Record<string, string | undefined>[] in return type of parse function?

@iuioiua
Copy link
Contributor

iuioiua commented Dec 8, 2023

I'd like to close this issue as most of the tasks have been completed. @timreichen, does that sound good to you?

I've created #3765 which we can use to guide further work on making std/csv better. @ayame113, would you like to review, refine and copy your suggestions to that issue for further discussion?

@kt3k kt3k closed this as completed Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants