Bitinformation of masked arrays #30

Merged
merged 9 commits into from
Mar 23, 2022

milankl
Owner

@milankl milankl commented Mar 21, 2022

No description provided.

@aaronspring

Great. I can test this PR next week.

@milankl
Owner Author

milankl commented Mar 22, 2022

The last commit 5a7b6fb adds a simple test that, for a given array and a mask with all entries unmasked, the bitwise real information is identical to the result without a mask. Before I merge this, @aaronspring could you install this branch via ] add https://github.com/milankl/BitInformation.jl#mk/masked and report back in #29 how your results change with and without a mask provided?

@milankl milankl merged commit 05bd9ef into main Mar 23, 2022
@milankl milankl deleted the mk/masked branch March 23, 2022 12:55
in adjacent entries in A along dimension `dim` (optional keyword). Array `A` is masked through
trues in entries of the mask `mask`. Masked elements are ignored in the bitwise information calculation."""
function bitinformation(A::AbstractArray{T},
mask::BitArray;

@aaronspring aaronspring Mar 23, 2022


@milankl Would it be possible for bitinformation to guess the mask based on a masking value, such as -9.e+33?

Or I'd appreciate a short hint on how to create the mask. What a new Julia user tries, and fails with, is mask = ncfile.vars[varnames[1]][:,:,:,:] == -9.e+33

Owner Author


While it would be possible to guess the mask, I don't want to do that, as for the moment the number format T can be any bitstype. Hence making the assumption that -9e33 is a mask may work well in one format (e.g. Float64) but not necessarily in others (integer, posits, etc., or a NaN in Float16 isn't necessarily a NaN in BFloat16 etc.). We could define bitinformation(A::AbstractArray{T},mask::T) which creates a mask based on the bit pattern in the scalar mask if that would be helpful. In general, Julia creates a BitArray for any broadcasted comparison, e.g.

julia> A = rand(3,3)
3×3 Matrix{Float64}:
 0.119336  0.127636  0.808542
 0.439229  0.388266  0.52312
 0.899243  0.992992  0.549393

julia> A .< 0.5
3×3 BitMatrix:
 1  1  0
 1  1  0
 0  0  0

What you are missing is the dot, i.e. .== instead of ==. Broadcasting in Julia is a bit more conservative than in Python: element-wise operations have to be requested explicitly.
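To make the difference between == and .== concrete, here is a minimal sketch using only Base Julia. The array `A` and the name `fill_value` are made up for illustration; the fill value is converted to the element type of the array first, on the assumption that the data are stored as Float32 and that a Float64 literal would in general not compare equal to the Float32-rounded fill value.

```julia
# Hypothetical data: a small Float32 array with two entries set to a fill value
fill_value = Float32(-9e33)              # convert the fill value to the array's eltype
A = Float32[0.1 fill_value; fill_value 0.3]

# Without the dot, the whole array is compared to a scalar: a single Bool
not_a_mask = A == fill_value             # false, not an array

# With the dot, the comparison is broadcast element-wise: a BitMatrix
mask = A .== fill_value

println(typeof(not_a_mask))              # Bool
println(typeof(mask))                    # BitMatrix, a 2-dimensional BitArray
println(count(mask), " of ", length(mask), " entries are masked")
```

The resulting `mask` is a BitArray of the same size as `A`, which is the type the new method expects as its second argument.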

Owner Author


bitinformation(::Array{T},masked_value::T) is now defined in #33, such that you can do bitinformation(A,-9f33) and a mask is created internally for all values that are floating-point identical to the second argument.
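As a rough illustration of why "floating-point identical" can differ from ==, the sketch below compares the two on a toy array using only Base Julia. It assumes that "identical" means bitwise identity, which is what === checks for floating-point numbers; how the mask is built internally in #33 is not shown in this thread, so this is an illustration and not the package's code.

```julia
# Hypothetical data with a NaN fill value and a negative zero
A = Float32[0.5 NaN32; -0.0 1.5]

# == follows IEEE semantics: NaN never equals NaN, and -0.0 equals 0.0
mask_eq_nan  = A .== NaN32                 # all false, NaN is never matched
mask_eq_zero = A .== 0f0                   # matches the -0.0 entry

# === compares bit patterns: NaN32 matches NaN32, -0.0 does not match 0.0
mask_id_nan  = broadcast(===, A, NaN32)
mask_id_zero = broadcast(===, A, 0f0)

println(count(mask_eq_nan), " vs ", count(mask_id_nan), " entries match NaN32")
println(count(mask_eq_zero), " vs ", count(mask_id_zero), " entries match 0f0")
```

For an ordinary fill value such as -9f33 both comparisons give the same mask; they only diverge for NaNs and signed zeros.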

@aaronspring

@milankl I find differences depending on whether the mask is used or not, especially in the first bits and in the bits within the 99-100% information range:

dim: 4 time
image from https://gist.github.com/aaronspring/5de0bc6be5a8d547f3503ff8b1aef8c6

dim: 1 x
image from https://gist.github.com/aaronspring/7b0675e36467e5c647f2fe3f546d4bf5

dim analysis:
image from https://gist.github.com/aaronspring/52662dd885eebfd6ce4b88939be016c6

@milankl
Owner Author

milankl commented Mar 28, 2022

Thanks Aaron, that looks awesome. Good to see that all these information patches in the exponent bits disappear! For dissicos could you convert your arrays to signed exponent bits? It looks like there's a bunch of exponent bits that flip over simultaneously (which happens when your data covers a range across the floating-point value 2, where the biased exponent changes from 01111111 to 10000000)

julia> A = rand(Float32,3,3)
3×3 Matrix{Float32}:
 0.408687   0.922863  0.0622634
 0.838222   0.947521  0.141393
 0.0944574  0.991588  0.576514

julia> signed_exponent(A)
3×3 Matrix{Float32}:
 13.078    7.3829   127.515
  6.70577  7.58017   18.0983
 48.3622   7.93271    4.61211

It looks wrong, because Julia will still interpret the exponent bits as being biased (the information that the exponent bits are now to be interpreted differently is not stored), but you can always check that nothing went wrong by applying the inverse, biased_exponent. If you use signed_exponent as a preprocessing step, note that the value you use to build your mask will change too

julia> signed_exponent([-9e-33])
1-element Vector{Float64}:
 -4.739053125085073e32
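To illustrate what happens to the exponent bits, here is a minimal Base-Julia sketch for finite Float32 values. The helpers `to_signed_exponent`, `to_biased_exponent` and `exponent_bits` are hypothetical names written for this illustration, and the sign-magnitude encoding is an assumption; this is not the implementation of signed_exponent in BitInformation.jl.

```julia
const EXP_MASK = 0x7f800000                 # the 8 exponent bits of a Float32

# Sketch only: re-encode the biased exponent as sign-magnitude (finite values only)
function to_signed_exponent(x::Float32)
    ui = reinterpret(UInt32, x)
    e = Int((ui >> 23) & 0xff) - 127        # unbiased exponent
    e_sm = e < 0 ? UInt32(0x80 | -e) : UInt32(e)   # sign bit + magnitude
    return reinterpret(Float32, (ui & ~EXP_MASK) | (e_sm << 23))
end

# Inverse of the sketch above: sign-magnitude back to biased exponent
function to_biased_exponent(x::Float32)
    ui = reinterpret(UInt32, x)
    e_sm = Int((ui >> 23) & 0xff)
    e = (e_sm & 0x80) != 0 ? -(e_sm & 0x7f) : e_sm
    return reinterpret(Float32, (ui & ~EXP_MASK) | (UInt32(e + 127) << 23))
end

exponent_bits(x::Float32) = bitstring(x)[2:9]

# Crossing 2.0 flips all 8 biased exponent bits, but only 1 bit after re-encoding
println(exponent_bits(1.9f0), " -> ", exponent_bits(2.0f0))
println(exponent_bits(to_signed_exponent(1.9f0)), " -> ",
        exponent_bits(to_signed_exponent(2.0f0)))

# Round trip: applying the inverse recovers the original values
A = Float32[0.408687, 0.838222, 0.0944574]
println(to_biased_exponent.(to_signed_exponent.(A)) == A)
```

Under these assumptions the sketch maps 0.408687f0 to roughly 13.078, in line with the output shown above. Any fill value used to build a mask would have to go through the same transformation before being compared against the converted array.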

@aaronspring

Adding signed_exponent makes the black bar in dissicos disappear
image

@milankl
Owner Author

milankl commented Mar 28, 2022

Can we move this discussion to #29 ?

2 participants