-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Large files being split in output #144
Labels
Comments
Any movement on this issue? I got bit by this bug when upgrading from 2.0.1 to 3.0.1. There is also and issue with the limit on the number of active streams. |
Jason3S
added a commit
to streetsidesoftware/cspell
that referenced
this issue
Nov 28, 2023
The project seem dead: smonn/globcat#144
@Jason3S Thanks for this very detailed issue! I'll dive into it and get it sorted out. |
smonn
added a commit
that referenced
this issue
Dec 15, 2023
* fix: Large files being split in output #144 version bumps etc * add coverage dep * 3.1.0-0 * 3.1.0 * fix coverage dep * reverse tag temporarily * fix some async tests * fix pnpm ci usage * 3.1.0-1
Fixed in 3.1.0, let me know if it's still causing troubles. Thanks! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hi, this is a really useful library, so thank you!
I've hit an issue when concatenating large files, whereby the file contents get split up and mixed in with the content of other files.
Issue reproduction
I've created a codesandbox to reproduce the issue:
https://codesandbox.io/p/sandbox/clever-heisenberg-fkfxrt
The two input files in
./data
area.txt
(containing thousands of lines of "a") andb.txt
(containing a single "b"). There's abuild
script which runsglobcat
and outputs to./dist/output.txt
.Expected outcome - I expect
output.txt
to contain all the content froma.txt
followed by the content fromb.txt
.Actual outcome - The content from
b.txt
is in the middle if the output, with thousands of "a"s before and after it.Possible cause
I have limited experience with Node streams but I think the issue might lie here: https://github.com/smonn/globcat/blob/main/src/lib/files-to-stream.ts. Specifically this bit (shortened for clarity):
As I understand it, because this happens within a loop, it means that files are being streamed to
passthrough
simultaneously, so their order is not guaranteed. I imagine this is fine as long as each file is below thehighWaterMark
(64KiB by default) and is therefore processed as a single chunk.However, files larger than this will be broken into chunks, and those chunks may be delayed and processed after chunks from a different file, making it possible for file content to be jumbled up as in my reproduction.
Possible fix
I've fixed this locally by writing chunks synchronously. As I say, I'm no expert on Node streams so I don't know whether there are any issues with this solution, but I've included it in case it's helpful:
Thanks again,
Pete
The text was updated successfully, but these errors were encountered: