Memory Usage and ProcessExitedException() #148

Closed
MaximilianJHuber opened this issue Mar 9, 2018 · 4 comments

Comments

@MaximilianJHuber

MaximilianJHuber commented Mar 9, 2018

I am running JuliaDB 0.7.2 and OnlineStats 0.16.0. I load a previously saved distributed table with 111496701 rows in 112 chunks, with a maximum chunk size of 220 MB:

addprocs()
using JuliaDB
table = load("C:/table_saved")

When I run a reduce on an Int64 field that has maybe 10 different numbers:
reduce(CountMap(Int64), table; select = :FIELD)
it takes some time, all four CPU cores are busy, and Julia allocates 12 GB of RAM and keeps it allocated even after the call has finished. Is that alright?

Now, the real issue is that when I want to do the same with an Int64 field that has a couple of hundred different values but also missing values:
reduce(CountMap(DataValues.DataValue{Int64}), table; select = :FIELD2)
a worker reports:
Worker 2 terminated.
ERROR (unhandled task failure): read: connection reset by peer (ECONNRESET)

@shashi
Collaborator

shashi commented Mar 9, 2018

That's strange. Do you have a lot of strings in your data? I would like to see the output of eltype(table).

@MaximilianJHuber
Author

eltype(table) yields 12 Int64, 3 Date, 16 String, and 5 DataValues.DataValue{Int64} columns. The key is (Int64, Date, Date).
BTW: the function keytype is not implemented for NextTable, only for NDSparse. Is that a mistake?

@MaximilianJHuber
Author

I also ran a histogram on the Int64 variable, plot(reduce(Hist(100), table; select = :FIELD)), both with and without calling addprocs() beforehand. Same picture: RAM fills up and the system has to write to disk. This should not happen with OnlineStats, right?

@shashi
Collaborator

shashi commented Mar 12, 2018

Okay, the issue here is that we currently can't lazy-load string data, so any operation that needs to look at all the chunks (like reduce) loads all of it, even when it is operating on a non-string column. We have been fixing this in queryverse/TextParse.jl#48. Stay tuned, we'll be releasing this soon.
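
To illustrate why the reducer itself should not be what fills memory, here is a minimal sketch in plain Julia. It is not JuliaDB or OnlineStats code: `update_counts!`, `countmap_chunks`, and the toy chunks are hypothetical names used only for illustration. It folds chunks into a count dictionary one at a time, so the running state grows with the number of distinct values rather than the number of rows, assuming each chunk can be released once it has been processed.

```julia
# Plain-Julia sketch; `update_counts!` and `countmap_chunks` are hypothetical
# helpers, not part of the JuliaDB or OnlineStats API.

# Fold one chunk of values into the running count dictionary.
function update_counts!(counts::Dict{Int64,Int}, chunk)
    for v in chunk
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

# Reduce over the chunks one at a time; only `counts` survives between chunks.
function countmap_chunks(chunks)
    counts = Dict{Int64,Int}()
    for chunk in chunks
        update_counts!(counts, chunk)
    end
    return counts
end

# Three tiny "chunks" standing in for the chunks of one selected column.
chunks = ([1, 2, 2], [3, 1], [2, 2, 3])
counts = countmap_chunks(chunks)
```

In this sketch `counts` ends up with one entry per distinct value (three here), no matter how many rows the chunks hold. If the large memory footprint reported in this thread comes from chunks being fully loaded, as the explanation above suggests, it would not come from state like `counts`.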
