design & structure #1

Closed
dcooley opened this issue Apr 27, 2020 · 11 comments

Comments

@dcooley
Owner

dcooley commented Apr 27, 2020

  • C++ headers
  • independent of R classes and attributes
  • manipulate data frames & matrices into geometric structures
  • convert geometric structures into data.frames

Conventional Geometric Structures

  • POINT - vector
  • MULTIPOINT - Matrix
  • LINESTRING - Matrix
  • MULTILINESTRING - List[ Matrix ]
  • POLYGON - List[ Matrix ]
  • MULTIPOLYGON - List[ List[ Matrix ] ]

But this library does not limit which geometry you assign to which structure.
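
As a rough illustration of the nesting listed above, here is a sketch in plain C++ with `std::vector` standing in for R vectors, matrices and lists. The alias names and `n_coordinates()` are hypothetical and are not part of this library; they only show how each level wraps the one below it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the R structures (not the library's types).
using Point = std::vector<double>;                  // POINT: one coordinate
using Matrix = std::vector<Point>;                  // MULTIPOINT / LINESTRING: rows of coordinates
using MatrixList = std::vector<Matrix>;             // MULTILINESTRING / POLYGON
using NestedMatrixList = std::vector<MatrixList>;   // MULTIPOLYGON

// Base case: a non-empty point counts as one coordinate.
inline std::size_t n_coordinates(const Point& p) { return p.empty() ? 0 : 1; }

// Recursive case: sum over the elements of any deeper nesting level.
template <typename T>
std::size_t n_coordinates(const std::vector<T>& x) {
  std::size_t n = 0;
  for (const auto& element : x) n += n_coordinates(element);
  return n;
}
```

The same recursive function works at every depth, which is the sense in which the structure, rather than a geometry label, carries the meaning.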

@paleolimbot

I'd like to generate some of these in {wk} from WKB and WKT...I think you'll need a tiny bit of attribute info. The required info (from the WKB/WKT perspective) is:

  • simpleGeometryType (preferably as an integer so nobody has to strcmp()...these integers are pretty universal)
  • hasZ
  • hasM
  • hasSRID
  • The SRID (if there is one).

I know this botches the "no attributes" thing, but if you can't represent this info then reading EWKB/EWKT will require modifying your spec, which I think you don't want.
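
To make the list above concrete, here is a minimal sketch of pulling those fields out of an EWKB geometry type word. `GeometryMeta` and `decode_ewkb_type()` are hypothetical names, not part of {wk} or this library; the flag bits are the ones PostGIS-style EWKB uses, and ISO WKB type codes (such as 1001 for a Z point) are not handled.

```cpp
#include <cstdint>

// Hypothetical container for the attribute info listed above.
struct GeometryMeta {
  std::uint32_t simple_geometry_type;  // 1 = Point, 2 = LineString, 3 = Polygon,
                                       // 4 = MultiPoint, 5 = MultiLineString,
                                       // 6 = MultiPolygon, 7 = GeometryCollection
  bool has_z;
  bool has_m;
  bool has_srid;
  std::uint32_t srid;                  // only meaningful when has_srid is true
};

// EWKB packs three flags into the high bits of the 32-bit type word.
constexpr std::uint32_t EWKB_Z_BIT = 0x80000000u;
constexpr std::uint32_t EWKB_M_BIT = 0x40000000u;
constexpr std::uint32_t EWKB_SRID_BIT = 0x20000000u;

// `srid` is the value that follows the type word in the stream when the
// SRID flag is set; the caller is assumed to have read it already.
inline GeometryMeta decode_ewkb_type(std::uint32_t type_word, std::uint32_t srid = 0) {
  GeometryMeta meta;
  meta.has_z = (type_word & EWKB_Z_BIT) != 0;
  meta.has_m = (type_word & EWKB_M_BIT) != 0;
  meta.has_srid = (type_word & EWKB_SRID_BIT) != 0;
  // Clear the three flag bits to recover the plain geometry type.
  meta.simple_geometry_type = type_word & ~(EWKB_Z_BIT | EWKB_M_BIT | EWKB_SRID_BIT);
  meta.srid = meta.has_srid ? srid : 0;
  return meta;
}
```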

A few other thoughts from the WKX perspective:

  • sf uses a vector for POINT, but a one-row matrix would be easier to deal with (and has the added benefit that you can represent an empty point with a zero-row matrix)
  • Consider representing a MULTIPOINT as a list() of POINT instead of a matrix. Less efficient, but way easier to parse (in WKB and WKT, multi* geometries and collections are represented the same: basically a list() containing the simple geometries).

@dcooley
Owner Author

dcooley commented Apr 27, 2020

> sf uses a vector for POINT, but a one-row matrix would be easier to deal with (and has the added benefit that you can represent an empty point with a zero-row matrix)

> Consider representing a MULTIPOINT as a list() of POINT instead of a matrix. Less efficient, but way easier to parse (in WKB and WKT, multi* geometries and collections are represented the same: basically a list() containing the simple geometries).

I think these points highlight my intention for this library. I don't want to constrain anyone to use a matrix for a linestring, or a list of matrices for a polygon. I only want to provide tools to build these structures, then each user can define what they mean.

So if you want to use a List of one-row matrices for a multipoint you can. Someone else might want to represent this as a completely different type.

I've listed the "conventional" structures as a starting point only.

@dcooley
Owner Author

dcooley commented Apr 27, 2020

maybe "geometries" isn't the right name for the package.

@mdsumner

mdsumner commented Apr 28, 2020

I came back to add that "this seems like a super powered split()" - and your discussion above highlights that for me ;)

split() is a bottleneck in silicate (I'm still surprised by this, but it's faster to split a matrix by a flat vector and then restructure as matrices, so what I see here is both an improvement for that - the utility of split, sped up and generalized - and the need for me to revisit what I've done in silicate almost entirely)

The two key steps in silicate are 1) splitting a data frame (into paths or segments or triangles) 2) densifying the coordinates (finding duplicates in x/y, and identifying them by splitting other columns out - this is a part of nest(), what I call unjoin(), the same as and a bit slower than dm::decompose_table() - so you could speed up nest() and unnest() with this too).

I see split for a data frame to list of matrices here, and I'd want to add split data frame or matrix to list of df as well (I reckon you've got that on the todo?).

@dcooley
Owner Author

dcooley commented Apr 28, 2020

> I'd want to add split data frame or matrix to list of df as well (I reckon you've got that on the todo?).

It's not currently a TODO. What is the requirement/benefit/use-case for having a data.frame inside the lists as opposed to matrices?

@mdsumner

I'll add - the use for split() in silicate is not to put geometry in list-matrices, it puts indexes-of-vertices into those. That's kind of the whole deal, the "geometry" is in one table (and maybe not in memory).
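
A rough sketch of that idea, for anyone following along: unique x/y pairs go into one vertex table, and every input row keeps only an index into it. This is not silicate's code; `VertexPool` and `unique_vertices()` are hypothetical names, and exact floating-point equality is assumed to be an acceptable test for duplicates.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical result type: one table of unique vertices plus, for each
// input row, the position of its vertex in that table.
struct VertexPool {
  std::vector<std::pair<double, double>> vertices;  // unique x/y, first-seen order
  std::vector<std::size_t> index;                   // one entry per input row
};

inline VertexPool unique_vertices(const std::vector<double>& x,
                                  const std::vector<double>& y) {
  assert(x.size() == y.size());
  VertexPool pool;
  std::map<std::pair<double, double>, std::size_t> seen;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const auto key = std::make_pair(x[i], y[i]);
    auto it = seen.find(key);
    if (it == seen.end()) {
      // First time this coordinate appears: append it to the vertex table.
      it = seen.emplace(key, pool.vertices.size()).first;
      pool.vertices.push_back(key);
    }
    pool.index.push_back(it->second);
  }
  return pool;
}
```

Splitting `index` by path or segment id then gives lists of vertex indexes rather than lists of coordinates.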

@mdsumner

> > I'd want to add split data frame or matrix to list of df as well (I reckon you've got that on the todo?).

> It's not currently a TODO. What is the requirement/benefit/use-case for having a data.frame inside the lists as opposed to matrices?

A fast split.data.frame(). You can store mixed types. I'm on the fence I guess, it's a possible generalization.

@dcooley dcooley mentioned this issue Apr 29, 2020
2 tasks
@dcooley
Owner Author

dcooley commented Jul 14, 2020

Regarding 'split', I've made make_geometries() accept vectors, matrices and data.frames (lists). It will split the data based on any number of "id" vectors.
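
For readers who want the general shape of splitting by more than one id, here is a standalone sketch using two id columns. It is not the code behind make_geometries(): `runs()`, `select_columns()` and `split_two_levels()` are hypothetical helpers, the data is a plain row-major `std::vector`, and rows sharing an id are assumed to be contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::vector<double>;
using Matrix = std::vector<Row>;

// Row ranges [first, second) within [begin, end) over which column `col`
// holds a constant value. Assumes equal ids are stored contiguously.
inline std::vector<std::pair<std::size_t, std::size_t>> runs(
    const Matrix& data, std::size_t col, std::size_t begin, std::size_t end) {
  std::vector<std::pair<std::size_t, std::size_t>> out;
  std::size_t start = begin;
  for (std::size_t i = begin + 1; i <= end; ++i) {
    if (i == end || data[i][col] != data[start][col]) {
      out.emplace_back(start, i);
      start = i;
    }
  }
  return out;
}

// Copy rows [begin, end), keeping only the requested columns.
inline Matrix select_columns(const Matrix& data, std::size_t begin, std::size_t end,
                             const std::vector<std::size_t>& cols) {
  Matrix out;
  for (std::size_t i = begin; i < end; ++i) {
    Row row;
    for (std::size_t c : cols) {
      assert(c < data[i].size());
      row.push_back(data[i][c]);
    }
    out.push_back(row);
  }
  return out;
}

// Two id columns give a list of lists of matrices; the id columns are
// dropped unless they are also named in `geometry_cols`.
inline std::vector<std::vector<Matrix>> split_two_levels(
    const Matrix& data, std::size_t outer_id, std::size_t inner_id,
    const std::vector<std::size_t>& geometry_cols) {
  std::vector<std::vector<Matrix>> out;
  for (const auto& outer : runs(data, outer_id, 0, data.size())) {
    std::vector<Matrix> group;
    for (const auto& inner : runs(data, inner_id, outer.first, outer.second)) {
      group.push_back(select_columns(data, inner.first, inner.second, geometry_cols));
    }
    out.push_back(group);
  }
  return out;
}
```

Each additional id column would add one more level of nesting in the same way.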

@mdsumner

thanks! Getting into this now - the geometries.Rmd needs an update fwiw, do you want micro feedback like this?

lines 112:122

```r
cppFunction(
  depends = "geometries"
  , includes = '#include "geometries/geometries.hpp"'
  , code = '
    SEXP my_shape( DataFrame x, IntegerVector ids, IntegerVector geometry_cols, List attributes) {
      return geometries::make_geometries( x, ids, geometry_cols, attributes);
    }
  '
)

my_shape( df, c(0,1), c(2,3), list())
```

@dcooley
Owner Author

dcooley commented Jul 15, 2020

oh yeah - all the .Rmds are now out of date. I need to re-write them all. But thanks for pointing it out.

@mdsumner

sorry for that - I was a bit lost, out of the C++ groove for a while.

Gee this is nice, split() but blazing fast - it's nice being able to keep or discard identifying or "geometry" columns completely independently. Super general!

3 participants