Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transposed rasters #37

Closed
visr opened this issue Apr 18, 2018 · 16 comments · Fixed by #93
Closed

transposed rasters #37

visr opened this issue Apr 18, 2018 · 16 comments · Fixed by #93

Comments

@visr
Copy link
Collaborator

visr commented Apr 18, 2018

If a complete band is read into an array, the x and y dimensions are flipped:

using ArchGDAL

filepath = "dem.tif"

arr = ArchGDAL.registerdrivers() do
    ArchGDAL.read(filepath) do dataset
        ArchGDAL.read(dataset, 1)
    end
end
summary(arr) # => 97×33 Array{UInt8,2}

run(`gdalinfo $filepath`)

Size is 97, 33 # this is ncol, nrow
Band 1 Block=97x1 Type=Byte, ColorInterp=Undefined # blocks are single rows

This is because GDAL is row-based and Julia column-based.
Some options regarding this:

  • Keep as is, warn in the docs that users must take this into account
  • Call permutedims (docs) before returning the array. Makes a copy of the whole dataset.
  • Return a PermutedDimsArray (docs). This does not create a copy, but a permuted view.

I haven't really worked with PermutedDimsArray yet, but to me it seems like the best choice right now. Thoughts?

@yeesian
Copy link
Owner

yeesian commented Apr 18, 2018

I'm also in preference of having the third option as the default. Should we

  • rename read to read_colbase
  • add read_rowbase(...) = PermutedDimsArray(read_colbase(...), (2,1))
  • add the alias read = read_rowbase

?

@visr
Copy link
Collaborator Author

visr commented Apr 18, 2018

I'm not sure whether it is worth the extra code/maintenance/complexity to have a read_rowbase and read_colbase distinction.

Other options are read defaults to PermutedDimsArray and a keyword option to get permutedims, or the other way around. I just saw JuliaIO/ImageMagick.jl#103, where they opted for the latter.

But for geospatial rasters small arrays may not be as common? And if we want to write the results back to GDAL again, there is no need to copy a second time if we use PermutedDimsArray.

@yeesian
Copy link
Owner

yeesian commented Apr 18, 2018

Let's just go with PermutedDimsArray for now then. Can always add in more options later when there are requests for it.

@visr visr mentioned this issue Jul 17, 2018
3 tasks
@evetion
Copy link
Collaborator

evetion commented Sep 26, 2018

I agree with PermutedDimsArray as long as there's a keyword for disabling this. I assume that the imread functionality will not return a view, especially because the color dimension is often the first dimension, instead of the last in GDAL.

@yeesian
Copy link
Owner

yeesian commented Sep 26, 2018

Sorry I'm a little overwhelmed by work at the moment. But I'll be happy to take PRs

@visr
Copy link
Collaborator Author

visr commented Sep 26, 2018

Yeah no pressure! Was just the result of a little private discussion we had about the pros and cons of doing this :)

@yeesian
Copy link
Owner

yeesian commented Sep 26, 2018

Was just the result of a little private discussion we had about the pros and cons of doing this :)

Anything we should note? Might this be related to the discussion on AbstractRasters in GeoInterface?

@visr
Copy link
Collaborator Author

visr commented Sep 27, 2018

No I don't think it directly relates to that discussion. See also #57 (comment)

@evetion
Copy link
Collaborator

evetion commented Oct 3, 2018

After a discussion in #57, @yeesian, summarizing my viewpoint:

That sounds reasonable to me. To state what I understand of your comment: either the user

(a) deals with the "raw data" (through the current read()/write() methods) and does what they want with it (we can document permuteddims view for them), or
(b) views it as an "image" object similar to how it might be dealt with in the JuliaImages organization (through ongoing work in the images branch to support imread()/imwrite()). This handles issues like permuting the array, normalizing the values, and identifying the color channels.

Being stuck in the middle ground between (a) and (b) sounds like an easy way to cause confusion for users.

I closed the PR. I would opt for a new PR, implementing the documentation in discussed in (a) and leave it at that.

We can leave this issue open for a week or two(? @visr), so others can state their views.

@visr
Copy link
Collaborator Author

visr commented Oct 3, 2018

Documenting the order of the dimensions is definitely a good idea. I also agree a kind of in-between solution might also cause more confusion, it's best to have a consistent and easy to explain interface.

It still bothers me a bit that with (col, row, band) we will have a different order than users may expect based on most other julia packages, but also don't see an easy way out, other than permuting on every io. If we don't permute, users will likely have to do it themselves depending on what they are doing. One way out, but not quite there yet is explicitly naming the axes NamedIndexing.jl or AxisArrays.jl. That way we can at least always know what the order is. Knowing the order will probably be important for JuliaGeo/GeoInterface.jl#16.

No direct rush for closing this issue, though perhaps good not to hold things up based on this, and just register ArchGDAL. If we change our mind about permuting we can do a breaking release. But that's probably still better than breaking on master of an unregistered package that quite a few are already using, so people cannot pin it easily.

@yeesian
Copy link
Owner

yeesian commented Oct 3, 2018

If we change our mind about permuting we can do a breaking release. But that's probably still better than breaking on master of an unregistered package that quite a few are already using, so people cannot pin it easily.

It might be a good idea to have a package (building upon GDAL/ArchGDAL) that plays with all these ideas (including the images stuff) too. If we converge on ideas worth emulating, we can then bring them upstream.

@visr
Copy link
Collaborator Author

visr commented Oct 3, 2018

Yeah indeed. Perhaps we can use https://github.com/evetion/GeoRasters.jl for that.

@evetion
Copy link
Collaborator

evetion commented Oct 7, 2018

Added documentation in #60

@yeesian
Copy link
Owner

yeesian commented Oct 7, 2018

Closed by #60

@AmebaBrain
Copy link

@evetion you closed this issue with comment below in rasters.md

The array returned by read has (rows, cols, bands) dimensions.

But actual return size seems to be still (col, row)

import ArchGDAL
const AG = ArchGDAL

filepath = "../data/geo/world.tif"

AG.registerdrivers() do
    AG.read(filepath) do dataset

        band = AG.getband(dataset, 1)
        println( "height (rows): " * repr(AG.height(band)))
        println("width (cols): " * repr(AG.width(band)))

        # returned shape is (col, row) and not (rows, cols, bands) as stated in docs
        data = AG.read(band)
        println("data size by reading band: " * repr(size(data)))

        data = AG.read(dataset, 1)
        println("data size by reading dataset: " * repr(size(data)))
    end
end

output

height (rows): 1024
width (cols): 2048
data size by reading band: (2048, 1024)
data size by reading dataset: (2048, 1024)

@evetion
Copy link
Collaborator

evetion commented Oct 21, 2019

Thanks for checking and the feedback. I've just checked now and you're right. I'll make a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants