Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle undefined values in arrays #8

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Handle undefined values in arrays #8

wants to merge 1 commit into from

Conversation

MikeInnes
Copy link
Collaborator

@MikeInnes MikeInnes commented Jun 25, 2018

For now this just writes undefineds as nothing (which works because we don't preserve non-isbits-eltypes anyway). Probably better to use a different sentinel and preserve undefined-ness through a roundtrip.

See #6, cc @DilumAluthge

@DilumAluthge
Copy link
Member

Does this mess up the Array type? I.e. instead of being of type Array{T} it would turn into Array{Union{T, Void}}?

@DilumAluthge
Copy link
Member

DilumAluthge commented Jul 1, 2018

Yeah so right now this doesn't work for saving Dicts.

import BSON
x = Dict("a" => :a)
BSON.@save "test.bson" x

Now quit Julia and re-open it.

import BSON
BSON.@load "test.bson" x

You get the following error:

ERROR: MethodError: Cannot `convert` an object of type Void to an object of type String
This may have arisen from a call to the constructor String(...),
since type constructors fall back to convert methods.
Stacktrace:
 [1] copy!(::IndexLinear, ::Array{String,1}, ::IndexLinear, ::Array{Any,1}) at ./abstractarray.jl:656
 [2] convert(::Type{Array{String,1}}, ::Array{Any,1}) at ./array.jl:368
 [3] newstruct!(::Dict{String,Symbol}, ::Array{UInt8,1}, ::Array{Any,1}, ::Array{Any,1}, ::Int64, ::Int64, ::UInt64, ::Int64, ::Int64) at /Users/dilum/.julia/v0.6/BSON/src/extensions.jl:92
 [4] newstruct_raw(::ObjectIdDict, ::Type{T} where T, ::Dict{Symbol,Any}) at /Users/dilum/.julia/v0.6/BSON/src/extensions.jl:110
 [5] (::BSON.##53#54)(::Dict{Symbol,Any}, ::ObjectIdDict) at /Users/dilum/.julia/v0.6/BSON/src/extensions.jl:125
 [6] raise_recursive(::Dict{Symbol,Any}, ::ObjectIdDict) at /Users/dilum/.julia/v0.6/BSON/src/read.jl:80
 [7] (::BSON.##24#26{ObjectIdDict})(::Dict{Symbol,Any}) at /Users/dilum/.julia/v0.6/BSON/src/read.jl:74
 [8] applychildren!(::BSON.##24#26{ObjectIdDict}, ::Dict{Symbol,Any}) at /Users/dilum/.julia/v0.6/BSON/src/BSON.jl:18
 [9] _raise_recursive(::Dict{Symbol,Any}, ::ObjectIdDict) at /Users/dilum/.julia/v0.6/BSON/src/read.jl:74
 [10] raise_recursive(::Dict{Symbol,Any}, ::ObjectIdDict) at /Users/dilum/.julia/v0.6/BSON/src/read.jl:81
 [11] load(::String) at /Users/dilum/.julia/v0.6/BSON/src/read.jl:96

@DilumAluthge
Copy link
Member

I think that when we parse a doc, we should skip the nothings.

@MikeInnes
Copy link
Collaborator Author

I think we should just use BSON's undefined as a sentinel. It's technically deprecated but I'm not sure we can really expect this case to have good interop with other readers anyway.

@bkamins
Copy link

bkamins commented Sep 22, 2018

Would it also handle missing? The test case is:

df = DataFrame(x=1:3, b = [:a,missing,:c], c = rand(3), d = categorical(["a","b",missing]))

Trying to save it currently crashes Julia.

@DilumAluthge
Copy link
Member

@MikeInnes Should we revisit this? It would be really nice to be able to write arrays with undefined values.

I agree that we should probably just use the BSON undefined as our sentinel.

@DilumAluthge
Copy link
Member

Would it also handle missing? The test case is:

df = DataFrame(x=1:3, b = [:a,missing,:c], c = rand(3), d = categorical(["a","b",missing]))

Trying to save it currently crashes Julia.

@bkamins Saving/loading a DataFrame with missing values works now.

Here is the example code for saving: (using the same test case as in your comment)

using Pkg
Pkg.add("BSON")
Pkg.add("CategoricalArrays")
Pkg.add("DataFrames")
using BSON, CategoricalArrays, DataFrames
df = DataFrame(x=1:3, b = [:a,missing,:c], c = rand(3), d = categorical(["a","b",missing]))
bson("test.bson", Dict(:df => df))

Now quit and restart Julia, and then run the example code for loading:

using BSON, CategoricalArrays, DataFrames
data = BSON.load("test.bson")
data[:df]

And the DataFrame has been loaded correctly, missing values and all.

The output of Pkg.status():

julia> Pkg.status()
    Status `~/.julia/environments/v1.1/Project.toml`
  [fbb218c0] BSON v0.2.3
  [324d7699] CategoricalArrays v0.5.2
  [a93c6f00] DataFrames v0.18.3

And the output of versioninfo(verbose=true):

julia> versioninfo(verbose=true)
Julia Version 1.1.1
Commit 55e36cc (2019-05-16 04:10 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  uname: Darwin 18.6.0 Darwin Kernel Version 18.6.0: Thu Apr 25 23:16:27 PDT 2019; root:xnu-4903.261.4~2/RELEASE_X86_64 x86_64 i386
  CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz:
              speed         user         nice          sys         idle          irq
       #1  2600 MHz      83985 s          0 s      33784 s     256810 s          0 s
       #2  2600 MHz      39330 s          0 s      13825 s     321413 s          0 s
       #3  2600 MHz      95457 s          0 s      29412 s     249699 s          0 s
       #4  2600 MHz      38662 s          0 s      12898 s     323007 s          0 s

  Memory: 8.0 GB (571.66796875 MB free)
  Uptime: 97398.0 sec
  Load Avg:  2.6552734375  2.51123046875  2.33056640625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
  JULIA_DEBUG = all
  GEM_HOME = /Users/dilum/.rvm/gems/ruby-2.4.1
  TERM = xterm-256color
  MY_RUBY_HOME = /Users/dilum/.rvm/rubies/ruby-2.4.1
  PATH = /Users/dilum/.rvm/gems/ruby-2.4.1/bin:/Users/dilum/.rvm/gems/ruby-2.4.1@global/bin:/Users/dilum/.rvm/rubies/ruby-2.4.1/bin:/Users/dilum/.rvm/bin:/Users/dilum/.cargo/bin:/Applications/Julia-1.1.app/Contents/Resources/julia/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/usr/local/go/bin:/usr/local/MacGPG2/bin:/opt/X11/bin
  XPC_FLAGS = 0x0
  HOME = /Users/dilum
  GEM_PATH = /Users/dilum/.rvm/gems/ruby-2.4.1:/Users/dilum/.rvm/gems/ruby-2.4.1@global

@DilumAluthge
Copy link
Member

@MikeInnes

Also, I couldn't find an issue tracking this bug specifically (undefined values in arrays), so I opened the following issue: #43

@Codyk12
Copy link

Codyk12 commented Jun 11, 2019

In collect_any, Vector{Any}(length(xs)) isn't a valid constructor..., anymore? Im not sure if it used to be or if I am missing something, but I get an error when I run this. I'm not sure there is a way around it.

Screen Shot 2019-06-11 at 12 39 32 PM

Also what happens if the position of the data matters in an array. It seems like
collect_all([#undef, #undef, #undef, "foo"]) -> ["foo"]
Which is bad if I want foo to remain in the 4th spot of the array.
Am I correct in assessing this?

I think all this to say that reworking the splat operator for #undef might be needed and working with the undef sentinel is probably the best way to go.

@DilumAluthge
Copy link
Member

There is a small change to how we call the constructor, but apart from that the collect_any function works fine. Simply replace Vector{Any}(length(xs)) with Vector{Any}(undef, length(xs)).

function collect_any(xs)
    ys = Vector{Any}(undef, length(xs))
    for i = 1:length(xs)
        isassigned(xs, i) && (ys[i] = xs[i])
    end
    return ys
end

And then these examples work exactly as expected:

julia> function collect_any(xs)
           ys = Vector{Any}(undef, length(xs))
           for i = 1:length(xs)
               isassigned(xs, i) && (ys[i] = xs[i])
           end
           return ys
       end
collect_any (generic function with 1 method)

julia> a = [1,2,3,4,5]
5-element Array{Int64,1}:
 1
 2
 3
 4
 5

julia> collect_any(a)
5-element Array{Any,1}:
 1
 2
 3
 4
 5

julia> b = Array{String}(undef, 5)
5-element Array{String,1}:
 #undef
 #undef
 #undef
 #undef
 #undef

julia> b[2] = "hello"
"hello"

julia> b[4] = "world"
"world"

julia> b
5-element Array{String,1}:
 #undef
    "hello"
 #undef
    "world"
 #undef 

julia> collect_any(b)
5-element Array{Any,1}:
 #undef
    "hello"
 #undef
    "world"
 #undef

@DilumAluthge
Copy link
Member

I just ran this code on Julia 1.1.1 and it worked as expected.

@Codyk12
Copy link

Codyk12 commented Jun 11, 2019

I just ran this code on Julia 1.1.1 and it worked as expected.

K great, that's what I was getting at, the undef part was missing. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants