Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GNIP - Add table join functionality to GeoNode #1915

Closed
jj0hns0n opened this issue Jan 12, 2015 · 14 comments
Closed

GNIP - Add table join functionality to GeoNode #1915

jj0hns0n opened this issue Jan 12, 2015 · 14 comments
Labels
gnip A GeoNodeImprovementProcess Issue
Milestone

Comments

@jj0hns0n
Copy link
Contributor

Overview

I'd like to propose a new set of features for GeoNode 2.5+ that allow a user to upload tabular data to a GeoNode instance and have that data table joined with an existing GeoNode Layer to create a new layer. This work will be funded by @cga-harvard and I plan to work on it over the next few weeks before the 2.4 final sprint and hope to merge it into master after the 2.4 release.

The basic idea is to leverage PostgreSQL and PostGIS to provide this functionality and create materialized views and use GeoServer only to configure the materialized views as GeoServer layers using gsconfig.

GeoNode currently allows users to upload CSV files when the GeoServer importer is used, but this is only for CSV files that contain latitude and longitude fields.

This proposed functionality would be initially used to allow users to map things like statistics that are gathered at the zip code, fips code or census block level.

Proposed By:

@jj0hns0n (to be funded by @cga-harvard)

Assigned to release:

2.5+ (Should be merged to master after 2.4 final)

State:

For Initial Discussion

Motivation:

@cga-harvard has a requirement to allow users to upload tabular data into a GeoNode instance and Join these uploaded datasets to existing layers already in the GeoNode using matching fields.

This kind of functionality has long been available in Desktop GIS software and is a very basic and key GIS use case. Some links are included below that describe how this works in Desktop GIS.

Proposal:

It is proposed to create a new contrib module (potentially geonode.contrib.data) to provide this functionality. This module would contain the following:

  • A set of models
    • geonode.contrib.data.models.DataTable - Uploaded data tables - fields and datatypes
    • geonode.contrib.data.models.JoinLayer - Layers and Attributes available for joining - FK to geonode.layers.model.Layer and geonode.layers.models.Attributes
    • geonode.contrib.data.models.Join - Joins between the uploaded data tables and existing geonode layers, statistics about join etc
  • A set of views providing API endpoints
    • /data/ingest - Import/ingest a data table copying to postgres
    • /data/list-layers - Return list of layers and attributes available for joining and the
    • /data/join - Join an uploaded DataTable to an existing Layer
    • /data/stats - Return a set of statistics about the join (number of joined/unjoined records and potentially truncated list of unjoined fields)
    • /data/replace - Replace an existing DataTable with a new version updating the Joined Layer
    • /data/list - List joined tables/layers and their properties
  • A thorough set of integration tests to exercise this functionality with various different types and sets of data. This test suite should grow as the functionality is tested and issues found.
  • A set of basic templates and client code that can minimally exercise this functionality. It is expected that this will be minimal initially but extensible based on new use cases described by the user community and other stakeholders.

It is proposed to develop this functionality by using PostgreSQL and PostGIS directly to create the joined tables rather than relying on GeoServer for the data import. The basic technical workflow would be as follows:

Use cases:

  • As a user, I want to be able to upload a tabular data file in csv or excel format that contains a column with data that represents a geometry in an existing GeoNode Layer (i.e. zip code, fips code, admin boundary code etc) and I want a new GeoNode layer created that contains the geometry and attributes from the existing layer and the attributes from my tabular data.
  • As a user who already created a GeoNode layer by uploading a tabular data file and joining it to an existing layer, I want to be able to replace or update that data table with a new version and have the layer updated. If the table containing the joined geometries is updated, I want to be able to update my joined layer.
  • As a GeoNode administrator, I want to be able to curate a set of layers and attributes available for my users to join their tabular data to.
  • As an external developer, I want to be able to leverage this functionality via an API such that I can 'map' tabular data contained in another system using a GeoNode instance.

Issues:

  • Performance
  • Moving large datasets over the wire
  • Refreshing Data from External Systems

Testing:

As part of the implementation, a thorough and rigorous test suite will be developed that exercises this functionality using various sets of tabular data and geometry layers. It is expected that this test suite will be used to demonstrate bugs encountered with new datasets and to verify their fix. The rigour of this test suite will be the key to the success or failure of this module when used in the real world.

Alternatives

Do nothing. GeoNode does not currently provide this functionality. Users are currently encouraged to create joined layers using a desktop GIS and to upload them to a GeoNode. This is problematic insofar as it requires extra tools and steps that may be too complicated for a user who simply wants to upload and map some tabular data and also creates a duplicate copy of the geometry data which is both inefficient from a storage standpoint and also implies that these derived data layers will not reflect changes to the original geometry data.

Feedback

@jj0hns0n jj0hns0n added gnip A GeoNodeImprovementProcess Issue feature A new feature to be added to the codebase enhancement labels Jan 12, 2015
@jj0hns0n jj0hns0n added this to the 2.5.x milestone Jan 12, 2015
@jj0hns0n jj0hns0n changed the title Add table join functionality to GeoNode GNIP - Add table join functionality to GeoNode Jan 12, 2015
@simod
Copy link
Member

simod commented Jan 13, 2015

+1

@capooti
Copy link
Member

capooti commented Jan 13, 2015

+1, this would be really useful

This was referenced Jan 13, 2015
@state-hiu
Copy link

+1, this would be really great for geocoding global "by country" datasets, such as indicators, foreign assistance, programs, etc, against country polygons.

Also, I would add a UI for identifying unmatched rows somewhere even if the errors need to be corrected offline.

We should also look into leveraging Foreign Data Wrappers for OGR (https://github.com/pramsey/pgsql-ogr-fdw) to allow users to link a local stats table to a remote polygon file (or vice versa).

@jj0hns0n
Copy link
Contributor Author

The plan is to have the status endpoint return a potentially truncated list
of unmatched rows for inspection offline.

On Tue, Jan 13, 2015 at 9:27 AM, Humanitarian Information Unit <
[email protected]> wrote:

+1, this would be really great for geocoding global "by country" datasets,
such as indicators, foreign assistance, programs, etc, against country
polygons. Also, I would add a UI for identifying unmatched rows somewhere
even if the errors need to be corrected offline.


Reply to this email directly or view it on GitHub
#1915 (comment).

@JJediny
Copy link
Member

JJediny commented Jan 24, 2015

+1 on functionality, as this opens the door to alot of use cases:

  • Appending geocoded data
  • Integrating data from remote/external systems using standard export queries with manual & scheduled, or externally triggered event refresh
  • Saving space by reusing geometry for multiple sets of attributes (Demographics being an obvious ex)

Main concern

The concern as @state-hiu mentioned would be how failed ref (no matches) or conflicts (ex 1 geometry 3 rows of attributes) get resolved as clean one-to-one joins by attribute will be the rare case if this is to be worth its weight in code. Would it support a user in session manually relating fields?
The Middleware UI could make or break the feature and needs to be the focus as this is a feature that could snowball, keeping it simple while meeting multiple use cases is going to be the hardest part to implement/support.

Smaller concerns

  • Unicode / lost of attribute type (string/date/booleen/float/int) could cause headache for styling and things like column length for shapefile on export.

Overall think this approach could solve alot of the two way data binding needs projects have for "linking/syncing" existing remote systems/services as it could simplify approach to simple export/ingest set to a scheduled refresh vs more complex/dynamic syncing approaches (i.e. via REST API/Services)

@jj0hns0n
Copy link
Contributor Author

@capooti do you think we can work together to get this ready for the first 2.5 in about a month?

@capooti
Copy link
Member

capooti commented Jun 15, 2016

I am pretty scheduled but I will try to work on this.

On Wed, Jun 15, 2016 at 1:42 PM, Jeffrey Johnson [email protected]
wrote:

@capooti https://github.com/capooti do you think we can work together
to get this ready for the first 2.5 in about a month?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1915 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AAJDnLZEMYJUam7wQ1lIYkVZ64VkqRYPks5qMDmNgaJpZM4DRWS5
.

Paolo Corti
Geospatial software developer
web: http://www.paolocorti.net
twitter: @capooti
skype: capooti

@jj0hns0n jj0hns0n modified the milestones: 2.7, 2.5 Aug 21, 2016
@jondoig
Copy link
Contributor

jondoig commented Sep 1, 2016

+1 this is exactly what we need for uploading ABS stats and joining to existing geometry.

Suggest initially it handles join errors simply by reporting them (with row number and key value) as they are encountered, but continues on regardless (using only the first row of 'conflicts' -- multiple matches). Show option to cancel the join while in progress, and save or delete the resulting layer on completion. Present error list in a table so it can be easily scraped back into Excel etc for remediation.

@jj0hns0n
Copy link
Contributor Author

jj0hns0n commented Sep 1, 2016

Jonathan, you can try out my branch, we didn't get it in for 2.5. It still
needs a lot of work. I'll pick it back up someday soon.

On Wednesday, August 31, 2016, Jonathan Doig [email protected]
wrote:

+1 this is exactly what we need for uploading ABS http://abs.gov.au
stats and joining to existing geometry.

Suggest initially it handles join errors simply by reporting them (with
row number and key value) as they are encountered, but continues on
regardless (using only the first row of 'conflicts' -- multiple matches).
Show option to cancel the join while in progress, and save or delete the
resulting layer on completion. Present error list in a table so it can be
easily scraped back into Excel etc for remediation.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1915 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAe43q9XtM1Hw52vpHZ98O8JIUg7bRCZks5qlirngaJpZM4DRWS5
.

@cristianzamar
Copy link
Contributor

+1 this would be an epic feature!! :)

@t-book
Copy link
Contributor

t-book commented Aug 22, 2017

Jonathan, you can try out my branch, we didn't get it in for 2.5. It still
needs a lot of work. I'll pick it back up someday soon.

@jj0hns0n: where can I find your current branch ?

@mattmalcher
Copy link

+1, would really appreciate this feature :)

@jondoig
Copy link
Contributor

jondoig commented Nov 10, 2018

@afabiani why closed?

@afabiani
Copy link
Member

4 years old, no real interest expressed and the proponent abandoned the project since almost 2 years. No one will work on this.
This is not a GNIP, this could be a feature request at most.

As PSC we decided to remote older issues and obsolete GNIPs.

@afabiani afabiani removed the feature A new feature to be added to the codebase label Aug 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gnip A GeoNodeImprovementProcess Issue
Projects
None yet
Development

No branches or pull requests

9 participants