Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/#1113 #1344

Merged
merged 5 commits into from
Mar 30, 2017
Merged

Bugfix/#1113 #1344

merged 5 commits into from
Mar 30, 2017

Conversation

alukach
Copy link
Contributor

@alukach alukach commented Mar 24, 2017

Proposed changes in this pull request

The core idea of this PR was to resolve #1113, supporting the upload of empty spatial data. We already support the idea of SpatialUnit instances with geometry values of None, however our system throws an error when WKT representing empty AKA none data is uploaded (e.g. POLYGON EMPTY).

The initial fix is simple: don't run reassign_spatial_geometry() on the SpatialUnit instance in the pre_save signal handler if the instance has an empty geometry (checked via geometry.empty == True).

However, when making this so I ran into a bug where libgeos fails to serialize WKT of POLYGON EMPTY into the correct EWKB format (used by PostGIS). This bug has been fixed in libgeos version 3.6.1, however we're using 3.4.2 (installed automatically via apt-get as a dependency of libgdal-dev). Django 1.11 will have a fix to work around this issue, however that version is still in release-candidate state and we're using 1.10.4. To work around this, I've put in a hacky solution of overwriting an empty Polygon instance with None in the SpatialUnit pre_save handler (mimicking the logic used in the 1.11 work-around).

We could store any empty geometries as None, however I prefer to keep data as close to its initially uploaded data as possible. If others disagree with this rationale, feel free to let me know. Running the following code, I was able to determine which WKT had supported empty values:

from django.contrib.gis.geos import GEOSGeometry


types = (
    "Geometry",
    "Point",
    "LineString",
    "Polygon",
    "MultiPoint",
    "MultiLineString",
    "MultiPolygon",
    "GeometryCollection",
    "CircularString",
    "CompoundCurve",
    "CurvePolygon",
    "MultiCurve",
    "MultiSurface",
    "Curve",
    "Surface",
    "PolyhedralSurface",
    "TIN",
    "Triangle",
)

SUCCESS = []
FAIL = []
p_id = Project.objects.first().id
for t in types:
    empty_str = '{} EMPTY'.format(t.upper())
    try:
        geo = GEOSGeometry(empty_str)
    except Exception as e:
        FAIL.append("{}: Failed to instantiate".format(empty_str))
        continue

    try:
        su = SpatialUnit.objects.create(geometry=geo, project_id=p_id)
        SUCCESS.append("{}: {}".format(empty_str, su.geometry))
    except Exception as e:
        FAIL.append("{}: {}".format(empty_str, e))

print("Supported:")
for t in SUCCESS:
    print("  {}".format(t))

print("Not Supported:")
for t in FAIL:
    print("  {}".format(t))
:--
Supported:
  POINT EMPTY: SRID=4326;POINT EMPTY
  LINESTRING EMPTY: SRID=4326;LINESTRING EMPTY
  POLYGON EMPTY: None  # <- Well, this is supported in WKT but not by PostGIS, as explained above
  MULTIPOINT EMPTY: SRID=4326;MULTIPOINT EMPTY
  MULTILINESTRING EMPTY: SRID=4326;MULTILINESTRING EMPTY
  MULTIPOLYGON EMPTY: SRID=4326;MULTIPOLYGON EMPTY
  GEOMETRYCOLLECTION EMPTY: SRID=4326;GEOMETRYCOLLECTION EMPTY
Not Supported:
  GEOMETRY EMPTY: Failed to instantiate
  CIRCULARSTRING EMPTY: Failed to instantiate
  COMPOUNDCURVE EMPTY: Failed to instantiate
  CURVEPOLYGON EMPTY: Failed to instantiate
  MULTICURVE EMPTY: Failed to instantiate
  MULTISURFACE EMPTY: Failed to instantiate
  CURVE EMPTY: Failed to instantiate
  SURFACE EMPTY: Failed to instantiate
  POLYHEDRALSURFACE EMPTY: Failed to instantiate
  TIN EMPTY: Failed to instantiate
  TRIANGLE EMPTY: Failed to instantiate

I've written a test to validate the supported datatypes and another to validate the expected behaviour of the POLYGON EMPTY work-around. I don't know if this list of supported empty WKT types should be recorded anywhere beyond the test suite.

Finally, while I was in organization.importers.validators.validate_row() I noticed that the geometry data could either be returned as a GEOSGeometry instance or a WKT string. While Python allows this, it feels a bit wrong as it detracts from the predictability of the function. For this reason, I changed the code to return the geometry data as either None or a GEOSGeometry instance. Also, while in the function I attempted to remove repeated code by adding the get_field_value() function.

When should this PR be merged

Any time is good.

Risks

The only risk I can see is if storing empty geometries is not desired behaviour (i.e. we should actually be throwing a ValidationError), however #1113 seems to suggest that it is desired behaviour.

Follow-up actions

N/A

Checklist (for reviewing)

General

  • Is this PR explained thoroughly? All code changes must be accounted for in the PR description.
  • Is the PR labeled correctly? It should have the migration label if a new migration is added.
  • Is the risk level assessment sufficient? The risks section should contain all risks that might be introduced with the PR and which actions we need to take to mitigate these risks. Possible risks are database migrations, new libraries that need to be installed or changes to deployment scripts.

Functionality

  • Are all requirements met? Compare implemented functionality with the requirements specification.
  • Does the UI work as expected? There should be no Javascript errors in the console; all resources should load. There should be no unexpected errors. Deliberately try to break the feature to find out if there are corner cases that are not handled.

Code

  • Do you fully understand the introduced changes to the code? If not ask for clarification, it might uncover ways to solve a problem in a more elegant and efficient way.
  • Does the PR introduce any inefficient database requests? Use the debug server to check for duplicate requests.
  • Are all necessary strings marked for translation? All strings that are exposed to users via the UI must be marked for translation.

Tests

  • Are there sufficient test cases? Ensure that all components are tested individually; models, forms, and serializers should be tested in isolation even if a test for a view covers these components.
  • If this is a bug fix, are tests for the issue in place? There must be a test case for the bug to ensure the issue won’t regress. Make sure that the tests break without the new code to fix the issue.
  • If this is a new feature or a significant change to an existing feature? has the manual testing spreadsheet been updated with instructions for manual testing?

Security

  • Confirm this PR doesn't commit any keys, passwords, tokens, usernames, or other secrets.
  • Are all UI and API inputs run through forms or serializers?
  • Are all external inputs validated and sanitized appropriately?
  • Does all branching logic have a default case?
  • Does this solution handle outliers and edge cases gracefully?
  • Are all external communications secured and restricted to SSL?

Documentation

  • Are changes to the UI documented in the platform docs? If this PR introduces new platform site functionality or changes existing ones, the changes must be documented in the Cadasta Platform Documentation.
  • Are changes to the API documented in the API docs? If this PR introduces new API functionality or changes existing ones, the changes must be documented in the API docs.
  • Are reusable components documented? If this PR introduces components that are relevant to other developers (for instance a mixin for a view or a generic form) they should be documented in the Wiki.

@alukach
Copy link
Contributor Author

alukach commented Mar 25, 2017

Based on @seav's comments in #1254 pointing out that there is currently code expecting that all SpatialUnit instances have a non-null geometry attribute, I've closed this PR. Will reopen once bugs relating to SpatialUnit.geometry == None are resolved.

@alukach alukach reopened this Mar 27, 2017
@alukach
Copy link
Contributor Author

alukach commented Mar 27, 2017

I've modified the organization.download.base.Exporter.get_values() method to assign None to any attribute (or sub attribute, such as geometry.wkt) that is not present on a model. I'm not totally confident in this idea as I think it could allow for errors to go unnoticed. However, I'm not really sure what would be better.

We could only throw errors when a base attribute is not present on a model, but return None when a sub-attribute is missing; however I don't really know if this is much better. And of course, we could only handle situations where the attr == 'geometry', however this feels like it would be a bit of a kludge.

An aside, rather than returning None for missing attributes we could alternatively simply ignore attributes if they're not present rather (not including them in the output of Exporter.get_values()). Also not sure if this is an improvement or not.

Note: Waiting to hear resolution on this before I rebase this into a single commit.

Copy link
Member

@oliverroick oliverroick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one small issue; see the comment below.

Regarding your question, @bjohare might be more qualified to answer than I am.

except InvalidODKGeometryError:
raise ValidationError(_("Invalid geometry."))
geometry = GEOSGeometry(coords)
except:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an explicit exception we can catch here? I usually try to avoid generally catching all exceptions because that might lead to silent failures which are hard to track down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is probably a good call. I was imitating the logic already written but tightening up our exception handling seems pretty reasonable.

Looking at the __init__ method of GEOSGeometry, a ValueError will be thrown if the input is a string but isn't recognized as either WKT, EWKT, or HEXEWKB. This seems like a sensible error to catch. A TypeError and GEOSException can also be thrown but neither of those should apply in this situation and it would likely be best to allow those to just raise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@alukach alukach force-pushed the bugfix/#1113 branch 2 times, most recently from 7f0ed4e to 4a12c41 Compare March 27, 2017 17:32
@bjohare
Copy link
Contributor

bjohare commented Mar 28, 2017

@alukach, @oliverroick I don't see any elegant solution to this regardless of the approach we take. For the exports I think an explicit check for a missing geometry is preferable. All the other model attributes are required and are therefore guaranteed to be present during exporting. I'm leaning (slightly) towards providing a default value for geometries to avoid this issue in the export process. The fact that we can't store POLYGON EMPTY is pretty annoying, but I guess converting these to POINT EMPTY is an even worse kludge!?

@alukach
Copy link
Contributor Author

alukach commented Mar 28, 2017

@bjohare Update PR to only permit fields name geometry to be empty on exports. As you say, it's definitely not an elegant solution, but at least it will permit data that is valid in our DB to be exported without error

I'm leaning (slightly) towards providing a default value for geometries to avoid this issue in the export process.

I'm still a bit adverse to this, I think we've gotten to a point where we could do this but don't necessarily need to do this. If we were to store data as POINT EMPTY, I'd recommend removing the null=True option from the field and doing the migration as mentioned before. Casting null data to POINT EMPTY when we don't really have to do so feels like an making-an-assumption/taking-a-liberty with the data that we aren't required to do.

Copy link
Contributor

@bjohare bjohare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Can confirm imports, exports and re-imports all work as expected when geometries are missing.

@amplifi amplifi merged commit b49d9ea into master Mar 30, 2017
@amplifi amplifi deleted the bugfix/#1113 branch March 30, 2017 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Importing these Files Exceptions Is Not Caught
4 participants