Better handling of big json files #21
Comments
That sounds like a reasonable guess. I'll do some profiling and see what crops up. Would you be able to share the source of your big JSON file so I can get a reasonable comparison?
I can't share the one I originally ran, but here's a Python script I made to create a file
That makes a 31MB file that did even worse:
Great, thanks!
@srwilson I'm not done yet, but I've made some changes in 2e2114b that should help you a bit. There's a couple of really minor speedups here and there, but the two main things are:
Using a JSON file generated from your python script:
The I've had a think about using a streaming JSON parser, but you could only use it when There's still more to be done to make things better so I'm not going to close this issue right now. In the meantime I've tagged and released what I've done so far as 0.3.4. Thanks again!
@srwilson nothing's tagged yet, but I thought you might be interested to know I've made some pretty big changes to gron's inner workings to make the sort more efficient (ec6e312). The outcome is that worst-case performance (colors and sorting enabled) is now around 5 times better. The slightly unfortunate thing is that the best-case performance (monochrome, no sorting) is slightly worse - mostly because of an increased number of allocations. Thankfully the massive refactor opens up new avenues for meaningful optimisation now that the sorting doesn't dominate quite so much. Here's the same tests from above repeated with a build from
A few commits later and I've made some more improvements. I removed some unnecessary copies and forced monochrome mode when the output is not a terminal:
That puts worst case when stdout is redirected at about 9 times better, and best case (i.e. with I'm going to consider the issue 'fixed', although I will continue to make things faster. I've released all the changes as 0.3.6. @srwilson thanks again for your input!
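For readers unfamiliar with the "monochrome when the output is not a terminal" idea, here is a minimal Python sketch of the general technique. It is not gron's code (gron is written in Go); the names `use_color`, `paint` and `force_monochrome` are made up for illustration.

```python
import sys
from typing import TextIO


def use_color(stream: TextIO, force_monochrome: bool = False) -> bool:
    """Decide whether to emit ANSI colors for the given output stream.

    `force_monochrome` is a hypothetical stand-in for a user-supplied flag.
    """
    if force_monochrome:
        return False
    # isatty() is False when output is piped or redirected to a file,
    # so colorizing work is skipped entirely in that case.
    return stream.isatty()


def paint(text: str, code: int, enabled: bool) -> str:
    """Wrap text in an ANSI color escape sequence only when colors are enabled."""
    return f"\x1b[{code}m{text}\x1b[0m" if enabled else text


if __name__ == "__main__":
    enabled = use_color(sys.stdout)
    print(paint("json.name", 34, enabled), "=", paint('"gron"', 33, enabled))
```

Skipping the escape sequences when output is redirected avoids both the extra string work and stray control characters in files or pipes.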
Currently, running gron on large JSON files is very slow. For example, a 40MB file takes over a minute:
My guess is it's in the sorting phase. Would it be possible to avoid sorting altogether? Maybe doing a streaming decode of the JSON would be helpful too.
At the very least it should be possible to disable sorting via command line option.
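To make the request concrete, here is a rough Python sketch of the idea: flatten a JSON document into assignment-style lines, with sorting as an optional step. It is an illustration only, not gron's implementation (gron is written in Go). The function names and the `sort` parameter are hypothetical, and it uses bracket notation for every key for simplicity.

```python
import json
from typing import Any, Iterator, List


def flatten(value: Any, path: str = "json") -> Iterator[str]:
    """Yield one assignment-style line per node, in document order."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            yield from flatten(child, f"{path}[{json.dumps(key)}]")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield f"{path} = {json.dumps(value)};"


def statements(text: str, sort: bool = True) -> List[str]:
    """Return the flattened lines; `sort` is a hypothetical opt-out switch."""
    lines = list(flatten(json.loads(text)))
    # Sorting requires every line to be in memory and costs O(n log n)
    # string comparisons; skipping it keeps document order.
    return sorted(lines) if sort else lines


if __name__ == "__main__":
    for line in statements('{"b": 1, "a": [true, null]}', sort=False):
        print(line)
```

With sorting disabled the lines could in principle be written out as they are produced, which is also what would make a streaming decoder worthwhile.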