use of bitinformation(dim) #31

Closed
aaronspring opened this issue Mar 24, 2022 · 5 comments

aaronspring commented Mar 24, 2022

I don't quite understand the dim argument in bitinformation and its implications. Can I just ignore it and use the default dim=1?

```julia
@test bi1[i] ≈ bi2[i] atol=1e-3
@test bi2[i] ≈ bi3[i] atol=1e-3
@test bi1[i] ≈ bi3[i] atol=1e-3
```
It seems like dim only matters for sorted dimensions, i.e. dim doesn't matter on raw data.

Your example plots in https://doi.org/10.24433/CO.8682392.v1 use dim=1, meaning longitude. I have data along the dimensions longitude, latitude and time, and would intuitively run the analysis along time.

milankl (Owner) commented Mar 24, 2022

That test is indeed confusing. As the array A is not sorted, every entry is independent of its neighbours, hence all those tests just check that the information is zero.

```julia
julia> using BitInformation
julia> A = rand(Float32,30,40,50);
julia> bi1 = bitinformation(A,dim=1);
julia> bi2 = bitinformation(A,dim=2);
julia> bi3 = bitinformation(A,dim=3);
julia> hcat(bi1,bi2,bi3)
32×3 Matrix{Float64}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 ⋮
```

However, if you sort the array in a given dimension, then you artificially introduce some information, which is highest in that dimension:

```julia
julia> sort!(A,dims=1);
julia> bi1 = bitinformation(A,dim=1);
julia> bi2 = bitinformation(A,dim=2);
julia> bi3 = bitinformation(A,dim=3);
julia> hcat(bi1,bi2,bi3)
32×3 Matrix{Float64}:
 0.0          0.0          0.0
 0.0          0.0          0.0
 0.0          0.0          0.0
 0.0          0.0          0.0
 0.0          0.0          0.0
 0.0067747    0.00508132   0.00538892
 0.292094     0.182393     0.187531
 0.550684     0.265361     0.271625
 0.371526     0.114251     0.118072
 0.237596     0.0441321    0.0441709
 ⋮                          
 0.0          0.0          9.3149e-5
 0.0          0.0          0.0
 0.0          0.0          0.0
 0.0          0.0          0.0
 0.0          0.0          0.000280003
 0.000749177  0.000946589  0.000850585
 0.00515332   0.00430802   0.00508684
 0.0233246    0.0177343    0.0185884
 0.061388     0.0458432    0.0484664
```

bi1 has the highest information in the bits around the exponent/mantissa boundary, but sorting along one dimension also influences the others (though with less information). The information in the last mantissa bits is due to the poor sampling of rand (see the randfloat function in JuliaRandom/RandomNumbers.jl for an alternative).
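To make the role of dim concrete, here is a minimal Base-only sketch, assuming (as in the paper's definition of bitwise real information) that the information of a bit position is the mutual information between that bit in neighbouring elements. dim only decides which elements count as neighbours. The helper bitpair_mi is hypothetical and is not BitInformation.jl's implementation: it handles one bit position at a time and applies no significance threshold, so values for unsorted data come out tiny rather than exactly zero.

```julia
using Random

# Hypothetical helper (not part of BitInformation.jl): mutual information, in bits,
# between bit `b` of A[.., j, ..] and of its neighbour A[.., j+1, ..] along `dim`.
# Bit 1 is the sign bit, bits 2-9 the exponent, bits 10-32 the mantissa of a Float32.
# No significance test is applied, unlike bitinformation.
function bitpair_mi(A::AbstractArray{Float32}, b::Integer; dim::Integer=1)
    n = size(A, dim)
    X = selectdim(A, dim, 1:n-1)   # elements j
    Y = selectdim(A, dim, 2:n)     # their neighbours j+1 along `dim`
    shift = 32 - b                 # bit b counted from the most significant bit
    counts = zeros(Int, 2, 2)      # joint histogram of (bit in X, bit in Y)
    for (x, y) in zip(X, Y)        # same shape, so zip pairs each element with its neighbour
        bx = (reinterpret(UInt32, x) >> shift) & 0x1
        by = (reinterpret(UInt32, y) >> shift) & 0x1
        counts[bx+1, by+1] += 1
    end
    p = counts ./ sum(counts)      # joint probabilities
    px = sum(p, dims=2)            # marginal of the bit in X
    py = sum(p, dims=1)            # marginal of the bit in Y
    mi = 0.0
    for i in 1:2, j in 1:2
        if p[i, j] > 0
            mi += p[i, j] * log2(p[i, j] / (px[i] * py[j]))
        end
    end
    return mi
end

Random.seed!(1)
A = rand(Float32, 30, 40, 50)
raw = [bitpair_mi(A, 8; dim=d) for d in 1:3]     # near zero along every dimension
sort!(A, dims=1)
sorted = [bitpair_mi(A, 8; dim=d) for d in 1:3]  # largest along dim=1
```

Bit 8 (the second-to-last exponent bit) is picked only because it carries a lot of information in the sorted example above; any bit position can be passed.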

milankl (Owner) commented Mar 24, 2022

> I have data along dimensions longitude, latitude and time and somehow intuitively would run the analysis along time.

You can run the analysis along any dimension you like. You can also add the information. The first dimension is usually just the default because that's also how the data is laid out in memory/on disk. The information can differ between dimensions, depending on the resolution. Check the supplement of our paper for some examples.

aaronspring (Author) commented

Is it also possible to run bitinformation on all dimensions, and does that make sense?

milankl (Owner) commented Mar 24, 2022

Yes, that's the same as running it in all dimensions separately and averaging the information. As it's an arithmetic mean, if the information is high in one dimension but low in another, you may cut off too many bits for the high-information dimension. So what I often went for is using longitude alone. A rule of thumb I found in our data is that information is highest in longitude/time, then latitude, then vertical, then ensemble. But that obviously depends on the spatio-temporal resolution...
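A sketch of the averaging described above, done by hand with the bitinformation(A, dim=...) call used earlier in this thread. The sorted random array is only an illustration standing in for a real lon × lat × time field; the point is that the mean sits below the information along the high-information dimension.

```julia
using BitInformation

# Illustrative data; sorting along dimension 1 makes dim=1 the high-information dimension
A = rand(Float32, 30, 40, 50)
sort!(A, dims=1)

# One vector of per-bit information (32 entries for Float32) per dimension
bis = [bitinformation(A, dim=d) for d in 1:ndims(A)]

# Arithmetic mean across dimensions
bi_mean = sum(bis) ./ length(bis)

# Side by side: the mean underestimates the information along dim=1,
# so bits chosen from bi_mean can be too few for that dimension
hcat(bis[1], bi_mean)
```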

aaronspring (Author) commented

Thank you.
