CSVSTAT Memory limits? #581
Comments
Same problem here. I tried to run csvstat on a ~5 GB file on my 4 GB machine, as I read that csvkit computes "line-by-line"?
Not all utilities work line-by-line: calculating statistics requires holding much more of the file in memory.
Cool, thanks for the quick response 👍 I found a way to circumvent it.
What was your solution?
To be honest, I just switched to vanilla Python and read the CSV line-by-line there. Looping through 2m lines and 9 GB took a few seconds for simple stats.
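For anyone landing here with the same problem, this is a minimal sketch of the line-by-line approach described above, not the commenter's actual code. It uses only the standard-library `csv` module; the function name `stream_column_stats` and the `price` column are hypothetical, chosen for illustration. It keeps a handful of running values per column, so memory use does not grow with file size. It only covers statistics that can be updated incrementally (count, min, max, mean); things like the median or the number of unique values still need more of the data in memory, which matches the maintainer's point above.

```python
import csv
import io
import math


def stream_column_stats(rows, column):
    """Compute count, min, max, and mean of one numeric column in a single pass.

    `rows` is any iterable of dicts, e.g. a csv.DictReader over an open file.
    Empty and non-numeric cells are skipped (an assumption for this sketch).
    """
    count = 0
    total = 0.0
    minimum = math.inf
    maximum = -math.inf
    for row in rows:
        raw = row.get(column)
        if raw is None or raw.strip() == "":
            continue  # missing or empty cell
        try:
            value = float(raw)
        except ValueError:
            continue  # non-numeric cell
        count += 1
        total += value
        minimum = min(minimum, value)
        maximum = max(maximum, value)
    if count == 0:
        return {"count": 0, "min": None, "max": None, "mean": None}
    return {"count": count, "min": minimum, "max": maximum, "mean": total / count}


# Small in-memory sample standing in for a large file (hypothetical data).
sample = io.StringIO("id,price\n1,10.5\n2,\n3,4.5\n4,n/a\n")
stats = stream_column_stats(csv.DictReader(sample), "price")
print(stats)
```

For a real file, replace the `io.StringIO` sample with `open(path, newline="")` inside a `with` block and pass the resulting `csv.DictReader` to the function; only one row is held in memory at a time.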
Yeah, a generic tool like
Closing.
I tried to run csvstat on a 1.9 GB file, about 7m rows x 74 columns (mixed and sparse), and after a long time just got "killed". I'm on an 8 GB machine with Linux 14.04 and Python 3+. Is there a way to approximate the limits that can be used? Or can I get a more informative error?