Memory Usage and ProcessExitedException() #148
Comments
That's strange. In your data, do you have a lot of strings? Would like to see
I also ran a histogram on the Int64 variable
Okay, the issue here is that we currently can't lazy-load string data, so any operation that needs to look at all the chunks (like reduce) loads all of it, even when it operates on a non-string column. We have been fixing this in queryverse/TextParse.jl#48. Stay tuned, we'll be releasing this soon.
I am running JuliaDB 0.7.2 and OnlineStats 0.16.0. I load a previously saved distributed table with 111496701 rows in 112 chunks, with a maximum chunk size of 220MB:
addprocs()
using JuliaDB
table = load("C:/table_saved")
When I run a reduce on an Int64 field that has maybe 10 different numbers:
reduce(CountMap(Int64), table; select = :FIELD)
it takes some time, all four CPU cores are busy, and Julia allocates 12GB of RAM and keeps it allocated even after the call has finished. Is that expected?
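For reference, what a count-map reduce over a chunked column computes can be sketched in plain Julia, without JuliaDB or OnlineStats. This is only an illustration of the idea under stated assumptions, not how JuliaDB implements it: `count_chunk`, `merge_counts`, and the toy `chunks` data are hypothetical names introduced here, and the sketch assumes Julia 1.x.

```julia
# Hypothetical helper: count occurrences of each value within one chunk.
function count_chunk(values::Vector{Int64})
    counts = Dict{Int64,Int}()
    for v in values
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

# Hypothetical helper: merge two partial count maps by summing counts per key.
merge_counts(a, b) = merge(+, a, b)

# Toy stand-in for the chunks of one Int64 column of a distributed table.
chunks = [Int64[1, 2, 2], Int64[2, 3], Int64[1, 1]]

partials = map(count_chunk, chunks)      # one small Dict per chunk
total = reduce(merge_counts, partials)   # combined counts over all chunks
```

With only about 10 distinct values, each partial result is tiny, so the result of the reduction itself should not account for memory on the order of gigabytes.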
Now, the real issue is that when I want to do the same with an Int64 field that has a couple of hundred different values and also missing values:
reduce(CountMap(DataValues.DataValue{Int64}), table; select = :FIELD2)
a worker reports:
Worker 2 terminated.
ERROR (unhandled task failure): read: connection reset by peer (ECONNRESET)
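As a point of comparison, counting a field that contains missing values can be sketched in plain Julia as below. Note the assumptions: this uses Base `missing` from Julia 1.x rather than `DataValues.DataValue` as in the call above, and `count_with_missing` and `field2` are hypothetical names with made-up data, so it shows the expected result of such a count rather than reproducing the failing call.

```julia
# Hypothetical helper: count occurrences, treating `missing` as its own key.
# Assumption: Base `missing` (Julia 1.x) stands in for DataValues.DataValue.
function count_with_missing(values)
    counts = Dict{Union{Missing,Int64},Int}()
    for v in values
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

# Made-up sample of a column with missing entries.
field2 = Union{Missing,Int64}[10, missing, 10, 20, missing]
counts = count_with_missing(field2)
```

Dictionary keys are compared with `isequal`, so all `missing` entries land under a single key.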