Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple Layer Datasets for netCDF CF-1.8, and Other Updates #1740

Merged
merged 113 commits into from
Aug 8, 2019
Merged

Multiple Layer Datasets for netCDF CF-1.8, and Other Updates #1740

merged 113 commits into from
Aug 8, 2019

Conversation

wchen329
Copy link
Contributor

@wchen329 wchen329 commented Jul 24, 2019

What does this PR do?

This PR contains the following updates:

  • Introduces the capability to write CF-1.8 datasets consisting of multiple layers.
  • Enabled "casting" between related (and unrelated) OGR types when writing to- among other reasons- support the -nlt flag and improve compatibility with datasets which may mix different feature types.
  • Refactor the storage of layers to be managed, specifically by shared_ptrs (something like this was originally requested before)
  • Allow measured feature types (with a warning that explains how writing measured types will lose the Measure).
  • Further integrated CF-1.8 implementation of Point writing into the Geometry Scribe (which was and is where other CF-1.8 geometry writing is done) to benefit from buffering, and allow multiple layers in NC3. Also: combined record dimensions with existing dimensions for CF-1.8 datasets.
  • netCDFVirtual and Scribe- write system refactoring (including constant time dimension resize). See more below.
  • Resilience improvements including shutting down reading when a negative value of part_node_count or node_count is encountered (or interior ring is not 0 or 1)
  • Multiple datum writing mode for non String types (could implement String types in the future maybe?)
  • Default buffer size increase to 20%.
  • Fixed issue with writing empty Multipolygons.
  • Documentation updates.

What are related issues/pull requests?

Issue #1287

Writing System Rewrite

Previously, the CF-1.8 writer used a rather simple writing scheme- a state based system that would persist only between layer translation invocations and reset between each write. Unfortunately, the problem with this system is that it would use in-place dim resizing (which involved a lot of extra copying, with perhaps worse than linear complexity and especially slow for very large datasets with multiple layers).

The new CF-1.8 writer introduces something called "netCDFVirtualID." Essentially what this does is present new- but similar- interfaces to dim and var definition in a netCDF file, but does not write the variable to the dataset immediately. Instead, dim and var metadata is held in memory allowing dims to be resized at constant complexity. Until the dim is actually written through the new nc_vsync call.

To accomplish this, a small subset of netCDF API functions were brought to the netCDFVirtual layer; defining a variable or dimension in the netCDF virtual layer gives virtual netCDF IDs which are not guaranteed to be the same as the real netCDF ID. Attribute writing is also available (done through templating), so long as the variable exists ONLY in the virtual layer.

Using the virtual layer is specific to this use case of dimension resizing- in cases where no dimensions exist or are known ahead of time, using the real netCDF APIs directly is generally recommended (as the virtual layer would only slightly slow things down).

GeometryScribe has also been separated into two classes: a "ncScribe" class which manages general transaction handling to the netCDF file and a "Metadata" class which manages the layer's IDs in the netCDFVirtual layer, and the geometry container ID in the real netCDF file.

Tasklist

  • Implement Multiple Layer netCDF-3 and netCDF-4, netCDF-4C Writing
  • Implement unified buffer restriction for Field and Geometry Buffers
  • Enable side-casting between OGR simple geometry types (writing only)
  • CF-1.8 Writing Infrastructure Rewrite
  • Logging
  • Add test case(s)
  • Review
  • Adjust for comments
  • All CI builds and checks have passed

Is this Ready for Review?

👍 hopefully

Environment

Provide environment details, if relevant:

  • OS: openSUSE Tumbleweed and Windows 7 Professional AMD64
  • Compiler: GCC 9.1.1 (SUSE) and MSVC 14.0 / 2015 (Windows)

wchen329 added 30 commits July 2, 2019 19:43
…if CPLRealloc bug (probably not) or GCC bug (maybe not) or something else).
@wchen329
Copy link
Contributor Author

wchen329 commented Aug 5, 2019

@rouault While I'm chipping away at the CI stuff what else would you like to have changed in the PR?

EDIT: Macintosh build randomly did its thing again i.e. segfaulted in some place irrelevant to PR

gdal/frmts/netcdf/netcdfdataset.cpp Outdated Show resolved Hide resolved
gdal/frmts/netcdf/netcdfdataset.cpp Outdated Show resolved Hide resolved
gdal/frmts/netcdf/netcdfdataset.cpp Outdated Show resolved Hide resolved
gdal/frmts/netcdf/netcdfdataset.cpp Outdated Show resolved Hide resolved
gdal/frmts/netcdf/netcdfdataset.cpp Outdated Show resolved Hide resolved
gdal/frmts/netcdf/netcdfvirtual.h Outdated Show resolved Hide resolved
gdal/frmts/netcdf/netcdfvirtual.h Outdated Show resolved Hide resolved
gdal/frmts/netcdf/netcdfvirtual.h Outdated Show resolved Hide resolved
gdal/frmts/netcdf/netcdfvirtual.h Outdated Show resolved Hide resolved
gdal/frmts/netcdf/netcdfvirtual.h Outdated Show resolved Hide resolved
@wchen329
Copy link
Contributor Author

wchen329 commented Aug 7, 2019

@rouault Thanks for the thorough review. I've went through and made necessary changes for your feedback (except the one that's left unresolved, but I don't think it's not too much to ask to not have the netCDFDataset handle inside the netCDFVID, since a VID can represent a group or a dataset, it's just, well, kind of awkward, and might be error prone).

Now time to wait for CI...

Any other things you would like addressed?

@rouault
Copy link
Member

rouault commented Aug 7, 2019

frmts/netcdf/netcdflayer.cpp:2481,style,unreadVariable,Variable 'status' is assigned a value that is never used.
frmts/netcdf/netcdfvirtual.cpp:67,style,unreadVariable,Variable 'err' is assigned a value that is never used.
frmts/netcdf/netcdfvirtual.cpp:98,style,unreadVariable,Variable 'err' is assigned a value that is never used.
frmts/netcdf/netcdfvirtual.cpp:262,style,unreadVariable,Variable 'err' is assigned a value that is never used.
frmts/netcdf/netcdfvirtual.cpp:277,style,unreadVariable,Variable 'err' is assigned a value that is never used.
frmts/netcdf/netcdfvirtual.cpp:292,style,unreadVariable,Variable 'err' is assigned a value that is never used.
frmts/netcdf/netcdfvirtual.cpp:307,style,unreadVariable,Variable 'err' is assigned a value that is never used.
frmts/netcdf/netcdfvirtual.cpp:322,style,unreadVariable,Variable 'err' is assigned a value that is never used.

@wchen329
Copy link
Contributor Author

wchen329 commented Aug 7, 2019

@rouault Trusty and Macintosh builds failures due to failures unrelated to PR:

Unrelated seg. fault on PROJ test:

osr/osr_ct_proj.py::test_proj[GoogleMercator(#3136)] Segmentation fault (core dumped) (Trusty)

and OS X had a network failure apparently.

BatchFetch request 127.0.0.1:8080/rda_api/tile/foo/bar/0/3.tif failed: Failed to connect to 127.0.0.1 port 8080: Connection reset by peer

[aside: loopback failure? interesting...]

Could we possibly restart these specific ones? Also I don't know why but the Macintosh build keeps seg. faulting at the end for no reason with 4 xfailed, it seems to skip all the netCDF tests, however?

@wchen329
Copy link
Contributor Author

wchen329 commented Aug 7, 2019

Looks like the Macintosh build went dark for no good reason this time :/.

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself. Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received

Could you restart the build again, thanks. Sorry to have you keep doing this...

@rouault
Copy link
Member

rouault commented Aug 8, 2019

Can you merge master in your branch ? I've commited a workaround to avoid a frequent stall in the Mac build

Fix Issues with OS X Upstream CI
@wchen329
Copy link
Contributor Author

wchen329 commented Aug 8, 2019

Can you merge master in your branch ? I've commited a workaround to avoid a frequent stall in the Mac build

Alright, merge has been done! Here's hoping it all works out-

@rouault rouault merged commit bf81a06 into OSGeo:master Aug 8, 2019
@rouault
Copy link
Member

rouault commented Aug 8, 2019

Squashed and merged

@wchen329
Copy link
Contributor Author

Oh so the Macintosh build finally stopped randomly dying huh (at least stopped dying enough)?

Thanks for your efforts on getting the build to behave @rouault ! 👍 Glad this is finally merged... took a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants