Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incompatibility with Array{Bool} #59

Closed
RenatoGeh opened this issue Mar 2, 2021 · 3 comments
Closed

Incompatibility with Array{Bool} #59

RenatoGeh opened this issue Mar 2, 2021 · 3 comments

Comments

@RenatoGeh
Copy link
Contributor

Hi,

Juice only accepts DataFrames constructed from BitArrays. As far as I know, this is not documented anywhere. Here's a minimal reproducible example:

using ProbabilisticCircuits, DataFrames
D = DataFrame(convert(BitArray, rand(Bool, 100, 4)))
pc = learn_circuit(D; maxiter = 2) # Learns fine with a BitArray
T = DataFrame(rand(Bool, 20, 4))
learn_circuit(T; maxiter = 2) # Crashes
EVI(pc, T) # This also is incompatible with Juice

This is because of how Juice optimizes code. Here's the backtrace:

ERROR: MethodError: no method matching num_chunks(::Array{Bool,1})
Closest candidates are:
  num_chunks(::DataFrame) at /home/renatogeh/.julia/packages/LogicCircuits/OwAjF/src/Utils/data.jl:99
  num_chunks(::Union{BitArray{1}, LogicCircuits.Utils.CuBitVector}) at /home/renatogeh/.julia/packages/LogicCircuits/OwAjF/src/Utils/data.jl:100
Stacktrace:
 [1] num_chunks(::DataFrame) at /home/renatogeh/.julia/packages/LogicCircuits/OwAjF/src/Utils/data.jl:99
 [2] init_satisfies(::DataFrame, ::Nothing, ::Int64) at /home/renatogeh/.julia/packages/LogicCircuits/OwAjF/src/satisfies_flow.jl:57
 [3] satisfies_all(::LogicCircuits.BitCircuit{Array{Int32,1},Array{Int32,2}}, ::DataFrame, ::Nothing) at /home/renatogeh/.julia/packages/LogicCircuits/OwAjF/src/satisfies_flow.jl:48
 [4] satisfies_flows(::LogicCircuits.BitCircuit{Array{Int32,1},Array{Int32,2}}, ::DataFrame, ::Nothing, ::Nothing; on_node::Function, on_edge::Function, weights::Nothing) at /home/renatogeh/.julia/packages/LogicCircuits/OwAjF/src/satisfies_flow.jl:209
 [5] (::ProbabilisticCircuits.var"#139#153"{LogicCircuits.BitCircuit{Array{Int32,1},Array{Int32,2}}})(::DataFrame) at /home/renatogeh/.julia/packages/ProbabilisticCircuits/I8vDU/src/parameters.jl:217
 [6] iterate at ./generator.jl:47 [inlined]
 [7] _collect(::Array{DataFrame,1}, ::Base.Generator{Array{DataFrame,1},ProbabilisticCircuits.var"#139#153"{LogicCircuits.BitCircuit{Array{Int32,1},Array{Int32,2}}}}, ::Base.EltypeUnknown, ::Base.HasShape{1}) at ./array.jl:699
 [8] collect_similar at ./array.jl:628 [inlined]
 [9] map at ./abstractarray.jl:2162 [inlined]
 [10] estimate_parameters_cpu(::LogicCircuits.BitCircuit{Array{Int32,1},Array{Int32,2}}, ::DataFrame, ::Float64; weights::Nothing, entropy_reg::Float64) at /home/renatogeh/.julia/packages/ProbabilisticCircuits/I8vDU/src/parameters.jl:216
 [11] estimate_single_circuit_parameters(::StructSumNode, ::DataFrame; pseudocount::Float64, use_sample_weights::Bool, use_gpu::Bool, component_idx::Int64, entropy_reg::Float64) at /home/renatogeh/.julia/packages/ProbabilisticCircuits/I8vDU/src/parameters.jl:55
 [12] #estimate_parameters#117 at /home/renatogeh/.julia/packages/ProbabilisticCircuits/I8vDU/src/parameters.jl:19 [inlined]
 [13] learn_chow_liu_tree_circuit(::DataFrame; pseudocount::Float64, algo_kwargs::NamedTuple{(:α, :clt_root),Tuple{Float64,String}}, vtree_kwargs::NamedTuple{(:vtree_mode,),Tuple{String}}) at /home/renatogeh/.julia/packages/ProbabilisticCircuits/I8vDU/src/structurelearner/init.jl:14
 [14] learn_chow_liu_tree_circuit at /home/renatogeh/.julia/packages/ProbabilisticCircuits/I8vDU/src/structurelearner/init.jl:10 [inlined]
 [15] learn_circuit(::DataFrame; pick_edge::String, pick_var::String, depth::Int64, pseudocount::Float64, entropy_reg::Float64, sanity_check::Bool, maxiter::Int64, seed::Nothing, return_vtree::Bool) at /home/renatogeh/.julia/packages/ProbabilisticCircuits/I8vDU/src/structurelearner/learner.jl:18
 [16] top-level scope at REPL[6]:1
 [17] run_repl(::REPL.AbstractREPL, ::Any) at /build/julia/src/julia-1.5.3/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:288

Since Array{Bool} does not deal with chunks, Juice crashes. Instead, Juice should either accept Array{Bool} or at least convert the offending data into a BitArray. If we implement #58, we might also want to extend data to Array{Int} as well.

@khosravipasha
Copy link
Contributor

The main issue with Array{Bool} is they are much slower in our use cases of computing flows, etc (about 4x or sometimes more). I guess one solution is to do the conversion behind the scene for the users.

@guyvdbroeck
Copy link
Member

Currently these are the assumptions about the data representation:

1/ Complete data

  • Bool data:
    • DataFrame of (Cu)BitVector
  • Float data:
    • DataFrame of (Cu)Vector{Float}

2/ Incomplete data:

  • Bool data:
    • DataFrame of Vector{Union{Bool,Missing}}
    • DataFrame of CuVector{Int8} where typemax represents missing
  • Float data:
    • DataFrame of Vector{Union{Float,Missing}}
    • DataFrame of CuVector{Float} where typmax represents missing

It would make sense to take other data formats and convert them to these standards automatically, so that we don't need to support too many cases.

@khosravipasha
Copy link
Contributor

In v0.4 will be using Matrix{Union{....}} instead of dataframes, will update the docs when that is out.
Then in that case Matrix{Bool} works for queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants