Memory Usage and ProcessExitedException() #148

MaximilianJHuber · 2018-03-09T00:17:37Z

I am running JuliaDB 0.7.2 and OnlineStats 0.16.0. I load a previously saved distributed table with 111496701 rows in 112 chunks, with a maximum chunk size of 220 MB:

addprocs()
using JuliaDB
table = load("C:/table_saved")

When I run a reduce on an Int64 field that has maybe 10 distinct values:
reduce(CountMap(Int64), table; select = :FIELD)
it takes some time, all four CPU cores are busy, and Julia allocates 12 GB of RAM and keeps it allocated even after the call has finished. Is that expected?

Now, the real issue is that when I try the same with an Int64 field that has a couple of hundred distinct values as well as missing values:
reduce(CountMap(DataValues.DataValue{Int64}), table; select = :FIELD2)
a worker reports:
Worker 2 terminated.
ERROR (unhandled task failure): read: connection reset by peer (ECONNRESET)

shashi · 2018-03-09T05:33:22Z

That's strange. Do you have a lot of strings in your data? I'd like to see eltype(table).

MaximilianJHuber · 2018-03-09T15:31:59Z

eltype(table) yields 12 Int64, 3 Date, 16 String, 5 DataValues.DataValue{Int64}. The key is (Int64, Date, Date).
BTW: the function keytype is not implemented for NextTable, only for NDSparse. Is that a mistake?

MaximilianJHuber · 2018-03-09T15:51:02Z

I also ran a histogram on the Int64 variable, plot(reduce(Hist(100), table; select = :FIELD)), both with and without calling addprocs() beforehand. Same picture: RAM fills up and the system has to write to disk. This should not happen with OnlineStats, right?

shashi · 2018-03-12T05:13:59Z

Okay, the issue here is that we currently can't lazy-load string data, so any operation that needs to look at all the chunks (like reduce) loads all of it, even when it operates on a non-string column. We have been fixing this in queryverse/TextParse.jl#48. Stay tuned, we'll be releasing this soon.
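
To illustrate why the reduction itself should need very little memory, here is a minimal sketch in plain Julia. It is not JuliaDB's or OnlineStats' implementation; the helper names count_chunk and merge_counts and the toy chunks data are hypothetical. Each chunk is reduced to a small Dict, and the partial results are merged, so the state kept by the reduction grows with the number of distinct values rather than with the number of rows. Whatever memory goes beyond that comes from loading the chunks, not from the count map.

```julia
# Reduce one chunk to a count map. The state is a single small Dict,
# regardless of how many rows the chunk holds.
function count_chunk(values)
    counts = Dict{Int,Int}()
    for v in values
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

# Combine two partial count maps by summing the counts of shared keys.
merge_counts(a, b) = merge(+, a, b)

# Hypothetical stand-in for one column's values, split across three chunks.
chunks = [[1, 2, 2], [2, 3], [1, 1]]

# Reduce every chunk independently, then merge the partial results.
total = mapreduce(count_chunk, merge_counts, chunks)
```

Here total maps each distinct value to its overall count across all chunks.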

MaximilianJHuber closed this as completed Jun 27, 2019
