compatibility with sf library #2273
Reproducible example please. |
#1310 is related |
Reproducible Example
Now when we try setting the
|
For the record, I've also created an issue in the sf github page #428 because the compatibility between the two libraries might request a bit of collaboration from both sides. |
If you lose the
library(sf)
#> Linking to GEOS 3.5.1, GDAL 2.1.3, proj.4 4.9.2, lwgeom 2.2.1 r14555
library(data.table)
nc <- st_read(system.file("shape/nc.shp", package = "sf"))
#> Reading layer `nc' from data source `/usr/local/lib/R/site-library/sf/shape/nc.shp' (...)
#> Simple feature collection with 100 features and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID): 4267
#> proj4string: +proj=longlat +datum=NAD27 +no_defs
nc <- setDT(nc)
nc[AREA > 0.1, st_area(geometry)]
#> Units: m^2
#> [1] 1137388604 1423489919 1520740530 1179347051 1232769242 1136287383
#> [...]
#> [61] 1264050755 2288682896 2181174167 2450241815 2165843695
nc[AREA > 0.1, sum(st_area(st_union(geometry))), by = SID74]
#> SID74 V1
#> 1: 1 3598847644 m^2
#> 2: 5 11299175600 m^2
#> [...]
#> 20: 15 3850824500 m^2
#> 21: 29 1978619669 m^2
#> 22: 31 2439553215 m^2
#> SID74 V1
But returning geometries is a problem:
nc[, st_union(geometry), by = SID74]
#> SID74
#> 1: 1
#> [...]
#> 83: 31
#> SID74
#> V1
#> 1: <list>
#> 2: <list>
#> [...]
#> 83: -78.86451,-78.91947,-78.95074,-78.97536,-79.00224,-79.00642, 34.47720, 34.45364, 34.44938, 34.39917, 34.38804, 34.36627,
#> V1
Another path is to |
A quick note that @SymbolixAU has been working on a project creating a |
FYI, the reason I'm extending the more info/background on what I'm doing |
The problem @etiennebr describes arises because aggregating geometries can yield two different kinds of geometry (POLYGON and MULTIPOLYGON in this case). Try the following:
library(sf)
library(data.table)
library(dplyr) # provides %>%, group_by() and summarise()
nc <- st_read(system.file("shape/nc.shp", package = "sf"))
nc_DT <- as.data.table(nc)
nc %>% group_by(SID74) %>% summarise(geom = st_union(geometry))
nc_DT[, st_union(geometry), by = SID74][, table(SID74)] # indices where frequency > 1 are multipolygon geometries!
I still don't understand why data.table shows 83 rows for |
I have realized my error and I know that I have to do
nc %>% group_by(SID74) %>% summarise(geom=st_union(geometry)) %>% plot # works!
plot(nc_DT[, .(st_union(geometry)), by=.(SID74)]) # does not work...some weird error! |
@vlulla try setDT(nc)[, .(st_union(geometry)), by=.(SID74)] %>% sf::st_as_sf() %>% plot |
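The extra rows seen above are consistent with how data.table reads the value of `j`: a bare list is taken as a list of columns, so a length-1 list whose single element is itself a list gets spread over several rows, whereas wrapping it in `.()` keeps one cell per group. A minimal sketch of that behaviour without sf, where `fake_union()` is a hypothetical stand-in for `st_union()` and the nested list is an assumption standing in for a geometry:

```r
library(data.table)

# Toy data: two groups, three rows.
DT <- data.table(grp = c("a", "a", "b"), val = 1:3)

# Hypothetical helper (not part of sf or data.table): returns a length-1 list
# whose single element is a list with one "part" per input value, loosely
# mimicking a length-1 sfc holding one nested geometry.
fake_union <- function(v) {
  list(as.list(v))
}

# Bare list in j: the single element becomes column V1, and its parts are
# spread over rows (two rows for "a", one row for "b").
spread <- DT[, fake_union(val), by = grp]

# Wrapped in .(): the length-1 list is the column itself, so each group
# yields one row whose cell holds the whole nested object.
kept <- DT[, .(fake_union(val)), by = grp]

nrow(spread)
nrow(kept)
```

With real geometries the number of rows per group would depend on how many parts each unioned geometry has, which may explain why POLYGON and MULTIPOLYGON results look different.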
Am I right in thinking it would still be nice if
|
I just tried
setDT(nc)
attr(nc, "class") <- c(attr(nc, "class"), "sf")
and it seems to cause more issues, so not sure if there's any immediate benefit in retaining the
At the moment I'm happy with workarounds. There's lots which can be done inside the |
Instead of putting |
Where in https://github.com/r-spatial/sf/blob/master/R/sf.R#L299-L335 should this
At one time (when
#### print.sf <- function(x, ..., n = ifelse(options("max.print")[[1]] == 99999, 20, options("max.print")[[1]])) {
#### geoms = which(vapply(x,function(col) inherits(col,"sfc"), TRUE))
#### nf = length(x) - length(geoms)
#### app = paste("and", nf, ifelse(nf == 1, "field", "fields"))
#### if (any(!is.na(st_agr(x))))
#### app = paste0(app,"\n","Attribute-geometry relationship: ", sf:::summarize_agr(x))
#### if (length(geoms) > 1)
#### app = paste0(app,"\n","Active geometry column: ", attr(x, "sf_column"))
#### print(st_geometry(x),n=0,what="Simple feature collection with",append=app)
#### if(is.data.table(x)) {
#### ## data.table:::print.data.table(x, ...)
#### NextMethod()
#### } else {
#### print.data.frame(x, ...)
#### }
#### ## print.data.frame(x, ...)
#### invisible(x)
#### } |
The lines around 318 in
class(x) = setdiff(class(x), "sf") # one step down
x = if (missing(j)) {
if (nargs == 2) # `[`(x,i)
x[i] # do sth else for tbl?
else
x[i, , drop = drop]
} else
x[i, j, drop = drop]
You don't need to remove |
There has been no feedback since Matt's last reply almost 2 years ago. Also, the previously mentioned issues were addressed in the comments, either by workarounds or by suggestions on how to improve integration on |
I personally am a massive user of |
Maybe this thread helps paleolimbot/wk#12. It seems that (1) |
As Matt noted above:
With @MichaelChirico's recent PR to enable custom printing for list column types (#3414), I'm wondering whether we could turn some special cases like
library(data.table)
library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.3.1, PROJ 8.0.1
nc = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
setDT(nc)
format_list_item.sfg = function(x, ...) format(x)
nc[1:5, .(NAME, geometry)]
#> NAME geometry
#> 1: Ashe MULTIPOLYGON (((-81.47276 3...
#> 2: Alleghany MULTIPOLYGON (((-81.23989 3...
#> 3: Surry MULTIPOLYGON (((-80.45634 3...
#> 4: Currituck MULTIPOLYGON (((-76.00897 3...
#> 5: Northampton MULTIPOLYGON (((-77.21767 3...
Created on 2021-08-30 by the reprex package (v2.0.1) |
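The same mechanism can be tried without sf. The sketch below assumes a data.table version that exports the `format_list_item()` generic from #3414, and defines a method in the global environment just as the example above does; the class name `reading` and the " degC" suffix are made up for illustration:

```r
library(data.table)

# Hypothetical class "reading"; this method is only picked up by
# print.data.table() if the installed data.table provides the
# format_list_item() generic.
format_list_item.reading <- function(x, ...) {
  paste0(format(unclass(x)), " degC")
}

# A list column whose cells carry the hypothetical class.
DT <- data.table(
  id   = 1:2,
  temp = list(structure(21.5, class = "reading"),
              structure(19,   class = "reading"))
)

# Each cell of `temp` is formatted by the method above, assuming the generic exists.
print(DT)
```

On older data.table versions the method is simply ignored and the list column prints in the default way.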
@grantmcdermott Based on your nice suggestion, I've prepared a PR for the sf package that will produce pretty-printed |
Great, thanks @JoshOBrien! (Aside: shouldn't the PR be on the data.table side, though?) |
@grantmcdermott That's a great question, and I honestly don't know the answer. FTR, here is the commit I would make into a PR if I do this via sf. Following that commit, an
library(sf)
library(data.table)
nc <- st_read(system.file("shape/nc.shp", package = "sf"))
DT <- data.table(nc)
DT[1:3, .(NAME, FIPS, geometry)]
## NAME FIPS geometry
## 1: Ashe 37009 MULTIPOLYGON (((-81.47276 3...
## 2: Alleghany 37005 MULTIPOLYGON (((-81.23989 3...
## 3: Surry 37171 MULTIPOLYGON (((-80.45634 3...
@jangorecki Do you have any opinion about whether I should submit a PR adding a |
I think it may fit better to sf package, then it may be reused in other packages as well. |
@jangorecki Sounds good. I'll drop another note here once data.table v1.14.4 is out if my PR then gets pulled into the sf package. |
Certainly my intention when writing the PR was that downstream packages would define methods for their own classes. If they are unwilling to merge we could do so here, since IINM we don't need to add any dependency, just to create an I had a look at your PR, it looks quite elaborate for just adding a method! I am assuming because of the invisible-line dependency issue. But the meat of your PR is very simple:
Maybe I'm not great with S3 registration stuff, but if this sounds easy to you, a PR would be great. |
Hmmm. Looks like I'm in the minority here, thinking that an internal
format_list_item.sfg = function(x, ...) format(x)
Similarly, I don't think this method would change the behaviour for any other data frame(ish) object on the
Final argument for keeping this on the data.table side: Converting an sf object to a data.table must not only be done explicitly, but also necessitates a slight change in workflow. (As documented in this thread, we need to refer to the |
Now that I've thought about this a bit more, I think I'm with @grantmcdermott on this one. It'd be really nice to be able to sidestep the elaborate hoops I go through in that sf-side PR just to be able to register the method without adding data.table to sf as an Imports or Depends. As both @grantmcdermott and @MichaelChirico note, adding the method on the data.table side would require just three lines of code and no additional dependencies. FTR, I looked into @MichaelChirico's suggestion that we could possibly modify the code of
## Alternative implementation, that uses each list element's class to dispatch a format method
format_list_item.default <- function (x, ...)
{
exists_format_method <- function(x) {
!(is.null(getS3method("format", class = x, optional = TRUE)))
}
if (is.null(x))
""
else if (is.atomic(x) || inherits(x, "formula"))
paste(c(format(head(x, 6L), ...), if (length(x) > 6L) "..."),
collapse = ",")
else if (any(sapply(class(x), exists_format_method)))
format(x)
else
paste0("<", class(x)[1L], paste_dims(x), ">")
} |
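The class-based check in that alternative can be tried on its own in base R. A minimal sketch, where `has_format_method()` is a hypothetical helper name and `no_such_class_xyz` is a made-up class assumed to have no format method:

```r
# Hypothetical helper: TRUE if any class of `x` has a registered or visible
# format() S3 method, using the same getS3method(..., optional = TRUE) test
# as the alternative implementation above.
has_format_method <- function(x) {
  any(vapply(
    class(x),
    function(cl) !is.null(utils::getS3method("format", cl, optional = TRUE)),
    logical(1)
  ))
}

has_format_method(Sys.Date())                                      # base R defines format.Date
has_format_method(structure(list(), class = "no_such_class_xyz"))  # made-up class, no method
```

Note that this only tells us a method exists, not that its output is short enough to sit well in a printed table cell, which is the maintenance concern raised in the thread.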
Thanks for investigating, Josh. Point taken... I am only trying to avoid a maintenance headache if dozens of formatters are eventually requested in data.table. The most scalable thing seems to me to be keeping a list of classes where the format method plays well, but I think that can be kicked down the road a bit. I would be fine with a simple PR to data.table as described (with a NEWS item and a mention in the .Rd). |
By adding `format_list_item.sfg()`, a method for the recently added `format_list_item()` S3 generic, we can ensure that any simple feature geometry columns (with elements of class `"sfg"`) in a data.table will be pretty printed using the **sf** package's `format.sfg()`. To see what that looks like, see the example below:

```r
library(sf)
library(data.table)
nc <- st_read(system.file("shape/nc.shp", package = "sf"))
DT <- data.table(nc)
DT[1:3, .(NAME, FIPS, geometry)]
##         NAME  FIPS                       geometry
## 1:      Ashe 37009 MULTIPOLYGON (((-81.47276 3...
## 2: Alleghany 37005 MULTIPOLYGON (((-81.23989 3...
## 3:     Surry 37171 MULTIPOLYGON (((-80.45634 3...
```

FR Rdatatable#2273, starting [here](Rdatatable#2273 (comment)), includes a discussion of the pros and cons of adding this method, for a data type defined in the **sf** package, to **data.table**.
@MichaelChirico That makes total sense, and I like your idea of, in the longer term, using a list to record classes with format methods that produce reasonable In the PR I just pushed, I added this as a new item in NEWS, but could also see folding it into NEWS item 17, which announced the Also, I didn't add an example to the |
I wonder if it would be possible to make `data.table` compatible with the new `sf` library. The `sf` library promises to be a game changer for spatial analysis in `R`, so it sounds like a good idea to bring together the power of both libraries.
Currently, the class of an `sf` object is `"sf" "data.frame"` and it brings a column named `geometry` of class `"sfc_MULTIPOLYGON" "sfc"`.
Right now, I think the main incompatibility between dt and sf is that when we convert an `sf` data.frame into a `data.table` using `setDT()`, the geometry column is ruined and the object is no longer recognised as being of class `sf`.
I'm sure there are other points to take into account when bringing these two great libraries together, so I just wanted to get the ball rolling if no one has done this before.