Finish Point Cloud Tile spec #22

Closed
mpgerlek opened this issue Sep 25, 2015 · 29 comments

@mpgerlek

The following changes to the points README would help reduce potential for misunderstanding:

  • the magic bytes should be set to pnts
  • the version should be set to 1
  • are the positions data to be in lat/lon, radians, or..?
  • are the positions data to be expressed as XYZXYZXYZ or XXXYYYZZZ?
  • are the colors data to be expressed as RGBRGBRGB or RRRGGGBBB?
  • as a sanity check, note that the file size should be equal to 16 + numpoints * 15
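The size check in the last bullet can be sketched in Python. This is only an illustration: it assumes a 16-byte header made of four 4-byte fields (the names `magic`, `version`, `byteLength`, and `pointsLength` are assumptions), float32 XYZ positions, and uint8 RGB colors. The interleaved ordering used here is one possible answer to the layout questions above, not a confirmed one.

```python
import struct

HEADER_BYTES = 16         # assumed: magic, version, byteLength, pointsLength (4 bytes each)
BYTES_PER_POINT = 12 + 3  # assumed: three float32 coordinates plus three uint8 color channels


def expected_pnts_size(num_points: int) -> int:
    """Return the file size implied by 16 + numpoints * 15."""
    return HEADER_BYTES + num_points * BYTES_PER_POINT


def build_tile(points) -> bytes:
    """Serialize [((x, y, z), (r, g, b)), ...] using the assumed layout."""
    positions = b"".join(struct.pack("<3f", *xyz) for xyz, _ in points)
    colors = b"".join(struct.pack("<3B", *rgb) for _, rgb in points)
    header = struct.pack(
        "<4sIII", b"pnts", 1, expected_pnts_size(len(points)), len(points)
    )
    return header + positions + colors


tile = build_tile([((0.0, 0.0, 0.0), (255, 0, 0)), ((1.0, 2.0, 3.0), (0, 255, 0))])
assert len(tile) == expected_pnts_size(2)
```

A reader could run the same check against a real file by comparing its length on disk with `expected_pnts_size` of the point count stored in the header.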
@hobu

hobu commented Apr 14, 2016

Hobu, Inc. has been working on organization and streaming of large-scale point clouds, and we have been working with the Potree project. See http://speck.ly and http://potree.entwine.io for examples. We have developed two open source software projects in support of this effort: Greyhound (http://github.com/hobu/greyhound) and Entwine (http://github.com/connormanning/entwine). Greyhound is the HTTP server that answers LoD requests for a dynamic point cloud schema. Entwine is software that organizes massive point clouds (100+ billion pts) to be served through Greyhound.

Entwine is like PotreeConverter if you are familiar with that software, except more sophisticated in that it losslessly processes the data and does so in a parallel way. PotreeConverter answers a fixed schema, whereas Entwine/Greyhound is dynamic (a basic assumption of XYZ of course). Greyhound answers requests as either uncompressed ArrayBuffer or LASzip-style compressed fields, which works in both native C/C++ and JavaScript code. We have found this LASzip-style approach to be efficient and data preserving from source all the way to client.

3D Tiles doesn't have to use the approaches we've developed, but we have been working through the problem for quite a while, and it may learn from our experience. Here are some highlights:

  1. You only have to look at PCL's header files to see the consequences of fixed-field dimension types for point cloud data. There just end up being way too many. Everyone wants their own special fields, and accommodating them all in a fixed way ends up spreading lots of pain around. We developed Greyhound to be able to answer dynamically (ask for the dimension list, ask for what you want in the format you want) to make it easier for clients. In my opinion, a static tile point cloud format must have the ability to dynamically define the content beyond the required XYZ. It doesn't have to be fancily specified, but I think it needs to be there.
  2. Lossy data structures are a non-starter for many. There is a strong desire for a data organization that serves both the need of a servable LoD structure and one that does archival, at-rest storage.
  3. Point cloud data, especially LiDAR-based data, often have wildly varying densities, sometimes within the same dataset. Some data sets are so dense you will quickly see quantization issues in WebGL with floats.
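The float precision concern in item 3 can be illustrated with a small Python sketch. The coordinate value and the tile center below are made-up numbers chosen to be of Earth-radius magnitude in meters; storing positions relative to a tile center is one common mitigation and is shown here as an assumption, not as what Greyhound or 3D Tiles does.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a Python float (64-bit) through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Hypothetical coordinate in meters, at roughly Earth-radius magnitude.
x = 6378137.123

# Stored directly as float32, the representable values are 0.5 m apart here,
# so the fractional part is lost.
absolute_error = abs(to_float32(x) - x)

# Stored relative to a hypothetical tile center, the offset is small and
# float32 keeps sub-millimeter precision.
tile_center = 6378000.0
relative_error = abs((tile_center + to_float32(x - tile_center)) - x)

print(f"absolute storage error: {absolute_error:.6f} m")
print(f"relative storage error: {relative_error:.6f} m")
```

The gap between the two errors is why dense datasets show visible banding when large coordinates go straight into 32-bit vertex attributes.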

Let us know how we can help, and I look forward to seeing you at FOSS4GNA.

@pjcozzi
Contributor

pjcozzi commented Apr 18, 2016

@hobu cool stuff, thanks for the comments.

  1. Agreed, 3D Tiles uses a batch table for this for the model tile formats. I suspect it - or something very similar - will work for point cloud tiles.
  2. Agreed, we would not use a lossy tile format, at the leaf tiles (for replacement refinement) or any tile (for additive refinement).
  3. Agreed, Cesium and 3D Tiles are built for this from the ground up.

Also, we are very interested in point cloud compression that can be decompressed quickly. How fast is LASzip decompression in JavaScript? We talked about it quite a while ago and I thought it had pretty significant overhead.

@pjcozzi
Contributor

pjcozzi commented Jul 10, 2016

Notes for updates to the .pnts tile format

colors are first remapped to YCoCg-R color space [MS03] to reduce correlation and then mapped to 5 bits for the luminance and 6 bits for each chroma component.

Also see The Compact YCoCg Frame Buffer. Given that the base tile format cannot require lossy colors, we could also move color to the batch table and have a raw and compressed version; this would also be optimal for point clouds without color.
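For reference, the reversible YCoCg-R transform mentioned in the quote can be sketched with integer lifting steps. This Python sketch covers only the lossless color-space change; the reduction to 5 and 6 bits described in the quote is a separate lossy step and is left out. The function names are illustrative.

```python
def rgb_to_ycocg_r(r: int, g: int, b: int):
    """Forward YCoCg-R transform using integer lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y: int, co: int, cg: int):
    """Inverse transform; undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# Round trip is exact for 8-bit inputs.
assert ycocg_r_to_rgb(*rgb_to_ycocg_r(255, 0, 0)) == (255, 0, 0)
assert ycocg_r_to_rgb(*rgb_to_ycocg_r(12, 200, 77)) == (12, 200, 77)
```

For 8-bit inputs, Y stays within 0 to 255 while Co and Cg are signed and need one extra bit each, which is worth keeping in mind when choosing a storage type for a raw (uncompressed) version.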

  • The Cesium API needs to be able to generate vertex shaders that map values in the batch table to visual properties, e.g., show points with intensity > 0.5 or map intensity to this color ramp using 3D Tiles Declarative Styling.
    • In addition to show/color/alpha, add the ability to map Styling expressions to point size (gl_PointSize)
    • This should be a general backend for 3D Tiles Styling that generates GLSL snippets for a given style. Please do not overbuild this; we may need to add limitations like not allowing strings to keep the scope reasonable. The 3D Tiles Styling implementation that uses JavaScript, not GLSL, is in Cesium3DTileStyle.js.
    • Also see Add option to Cesium3DTileset to set custom appearance cesium#3639
  • The Cesium API should also support point size attenuation and vertex shader culling
  • Cesium API: edge-preserving blur (Game Engine Gems 3)
  • TODO: initial set of batch table semantics, e.g., NORMAL (e.g., for hidden surface removal), various COLORs. Not sure if anything else needs to be well-known from the engine's perspective since 3D Tiles Styling can generate shading on the fly based on the user's styles.
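As a rough idea of what "generate GLSL snippets for a given style" could look like for the `intensity > 0.5` example, here is a deliberately tiny Python sketch. Everything in it is hypothetical: the function name, the `a_` attribute prefix, and the trick of moving hidden vertices outside the clip volume are illustration choices, not Cesium's implementation or the 3D Tiles Styling grammar.

```python
import math

ALLOWED_OPERATORS = {"<", "<=", ">", ">=", "==", "!="}


def glsl_show_snippet(attribute: str, operator: str, threshold: float) -> str:
    """Emit a GLSL fragment that hides points failing `attribute operator threshold`."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    if not attribute.isidentifier():
        raise ValueError(f"invalid attribute name: {attribute}")
    if not math.isfinite(threshold):
        raise ValueError("threshold must be finite")
    literal = repr(float(threshold))  # always contains '.' or an exponent
    return "\n".join([
        f"attribute float a_{attribute};",
        "",
        "void applyShowCondition() {",
        f"    if (!(a_{attribute} {operator} {literal})) {{",
        "        // Push the vertex outside the clip volume so the point is not drawn.",
        "        gl_Position = vec4(0.0, 0.0, 2.0, 1.0);",
        "    }",
        "}",
    ])


snippet = glsl_show_snippet("intensity", ">", 0.5)
print(snippet)
```

Restricting the operator set and rejecting non-numeric input mirrors the note above about limiting scope, for example by not allowing strings.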

@pjcozzi
Contributor

pjcozzi commented Jul 11, 2016

Discussed with @lilleyse offline:

  • To efficiently store colors in the batch table, we'll need Batch Table changes #32. @bagnell this is likely the same thing you'll need for vector data
  • Hold on picking for now; later, we might need per-facade id, e.g., from the batch id, to identify larger features in point clouds
  • The Cesium API should not expose each point as a feature; the overhead is too high
  • In addition to hidden surface removal, the NORMAL semantic will, of course, be useful for lighting. We can expose this option in the Cesium API

@pjcozzi
Contributor

pjcozzi commented Jul 11, 2016

Discussed offline with @lilleyse @bagnell

  • For Batch Table changes #32, looking into copying glTF's accessor, and adding doubles and evaluating if byteStride/min/max are needed. Today's [batchTable] will likely be [batchTableJSON][batchTableBinary]. Vector tiles (Vector Tile Roadmap #25) will likely have a similar setup for features.

@pjcozzi
Contributor

pjcozzi commented Jul 12, 2016

For #32, looking into copying glTF's accessor...

Discussion moved to #32

@pjcozzi
Contributor

pjcozzi commented Jul 13, 2016

Spec update:

  • Include default color when per-point color is not provided, TILES3D_COLOR

@pjcozzi
Contributor

pjcozzi commented Jul 13, 2016

Cesium implementation of color batch table semantics by @lilleyse - CesiumGS/cesium#4112

Still a lot more work to go.

@pjcozzi
Contributor

pjcozzi commented Jul 13, 2016

Spec update:

  • If both TILES3D_RGB and TILES3D_RGBA semantics are present, use RGBA or disable this?

@pjcozzi changed the title from "Improvements to the points README" to "Finish Point Cloud Tile spec" on Jul 14, 2016
@lilleyse
Contributor

If both TILES3D_RGB and TILES3D_RGBA semantics are present, use RGBA or disable this?

I'm leaning towards using RGBA.

@pjcozzi
Contributor

pjcozzi commented Jul 14, 2016

I'm leaning towards using RGBA.

OK.

@pjcozzi
Contributor

pjcozzi commented Jul 14, 2016

Cesium implementation of color batch table semantics by @lilleyse - CesiumGS/cesium#4112

Still a lot more work to go.

Spec notes in #104.

@pjcozzi
Contributor

pjcozzi commented Aug 8, 2016

@nosy-b just wanted to loop you into the plans for the point cloud tile format in 3D Tiles to see if you have any feedback. See #22 (comment) and #22 (comment)

@pjcozzi
Contributor

pjcozzi commented Aug 9, 2016

First round of Cesium changes by @lilleyse: CesiumGS/cesium#4183

@pjcozzi mentioned this issue on Aug 19, 2016
@lilleyse
Contributor

lilleyse commented Aug 22, 2016

CC #118 @pjcozzi @lasalvavida

I wonder if we may need to support componentType in addition to byteOffset for certain feature table properties like BATCH_ID. For example when distinguishing different features in a point cloud, it's pretty unlikely that there are more than 256, so UNSIGNED_BYTE is usually okay. But in some cases each point might be a feature, and we would easily see more than 256 batch ids, so a componentType of UNSIGNED_SHORT is required.

Then there's an alternative idea that instead of storing a batchId per point, the feature table would contain a global property called BATCHES that is an array of offsets which define the contiguous regions of points that make up each batch. Then the points wouldn't have to store batchIds and the tile size would be much smaller.
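The BATCHES idea can be sketched as follows, assuming the points are stored so that each batch occupies one contiguous run and that the array holds the index of the first point of each batch. The function names and the 4-byte offset size are assumptions for illustration.

```python
from bisect import bisect_right


def batch_of_point(batch_offsets, point_index: int) -> int:
    """Look up the batch that contains a point.

    batch_offsets[i] is the index of the first point in batch i; the list is
    sorted and starts at 0.
    """
    return bisect_right(batch_offsets, point_index) - 1


def per_point_id_bytes(num_points: int, bytes_per_id: int) -> int:
    """Storage cost of one batch id per point."""
    return num_points * bytes_per_id


def offsets_bytes(num_batches: int, bytes_per_offset: int = 4) -> int:
    """Storage cost of one offset per batch."""
    return num_batches * bytes_per_offset


# Three batches covering points 0-3, 4-8, and 9 onward.
offsets = [0, 4, 9]
assert [batch_of_point(offsets, i) for i in (0, 3, 4, 8, 9, 20)] == [0, 0, 1, 1, 2, 2]

# One million points in 100 batches: 2,000,000 bytes of uint16 ids
# versus 400 bytes of uint32 offsets.
print(per_point_id_bytes(1_000_000, 2), offsets_bytes(100))
```

The saving depends on the contiguity assumption; if points from different batches are interleaved, a per-point id is still needed.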

@lasalvavida
Contributor

lasalvavida commented Aug 22, 2016

BATCH_ID is actually defined as UNSIGNED_SHORT already, giving you 65535 batch ids. I don't think that componentType is necessary because the data type for a particular semantic is always defined and known.

In the event that you are exceeding the data type limitations for a single tile, the data can be split out onto multiple tiles as part of a composite tile.

@pjcozzi
Contributor

pjcozzi commented Aug 22, 2016

In glTF, the semantic defines the data type: https://github.com/KhronosGroup/glTF/blob/master/specification/README.md#semantics

However, ...

In the event that you are exceeding the data type limitations for a single tile, the data can be split out onto multiple tiles as part of a composite tile.

The downside here is it hurts client-side batching and increases the number of draw calls.

We're going to need componentType for the batch table, right? It would be reasonable for the spec to explicitly define the componentType or allowed component types, e.g., for the case of batch ids: unsigned byte, short, and int.

If the implementation for this is simple, I think it is a fine change.

Then there's an alternative idea that instead of storing a batchId per point, the feature table would contain a global property called BATCHES that is an array of offsets which define the contiguous regions of points that make up each batch. Then the points wouldn't have to store batchIds and the tile size would be much smaller.

Good thought for perhaps post 1.0 depending on community interest.

@lilleyse
Contributor

batchId is actually defined as UNSIGNED_SHORT already, giving you 65535 batch ids. I don't think that componentType is necessary because the data type for a particular semantic is always defined and known.

I think it will need to be more flexible, though, to accommodate UNSIGNED_BYTE, because storing a short per point is pretty costly when a byte is good enough.

@pjcozzi just to be clear, do you prefer the second method?

"BATCH_ID_8" : {
    "byteOffset" : 0
}

"BATCH_ID" : {
    "byteOffset" : 0,
    "componentType" : "UNSIGNED_BYTE" // or "UNSIGNED_SHORT", "UNSIGNED_INT" (defined in spec)
}

@lasalvavida
Contributor

Maybe define a default data type and allow it to be overwritten?

@pjcozzi
Contributor

pjcozzi commented Aug 22, 2016

Use the second method. The semantic would define the valid values for componentType and a default as @lasalvavida suggested.

For example:

"BATCH_ID" : {
    "byteOffset" : 0,
    "componentType" : "UNSIGNED_BYTE"
}

and

"BATCH_ID" : {
    "byteOffset" : 0
    // componentType defaults to UNSIGNED_SHORT
}
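A reader might handle the two forms above roughly like this. It is a minimal Python sketch, assuming little-endian data, a parsed feature table JSON dictionary, and a point count supplied by the caller; the function and constant names are hypothetical.

```python
import struct

# struct format characters for the component types named in this thread.
COMPONENT_FORMATS = {
    "UNSIGNED_BYTE": "B",
    "UNSIGNED_SHORT": "H",
    "UNSIGNED_INT": "I",
}
DEFAULT_BATCH_ID_TYPE = "UNSIGNED_SHORT"  # default proposed in the comment above


def read_batch_ids(feature_table_json: dict, binary: bytes, num_points: int):
    """Read per-point batch ids, honoring an optional componentType."""
    prop = feature_table_json["BATCH_ID"]
    component_type = prop.get("componentType", DEFAULT_BATCH_ID_TYPE)
    if component_type not in COMPONENT_FORMATS:
        raise ValueError(f"componentType not allowed for BATCH_ID: {component_type}")
    fmt = f"<{num_points}{COMPONENT_FORMATS[component_type]}"
    return list(struct.unpack_from(fmt, binary, prop["byteOffset"]))


# Explicit UNSIGNED_BYTE: one byte per point.
byte_table = {"BATCH_ID": {"byteOffset": 0, "componentType": "UNSIGNED_BYTE"}}
assert read_batch_ids(byte_table, bytes([1, 2, 3]), 3) == [1, 2, 3]

# No componentType: falls back to UNSIGNED_SHORT, two bytes per point.
short_table = {"BATCH_ID": {"byteOffset": 0}}
assert read_batch_ids(short_table, struct.pack("<3H", 1, 300, 65535), 3) == [1, 300, 65535]
```

Keeping the allowed types in a per-semantic table matches the suggestion that the semantic defines the valid values and the default.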

@lasalvavida
Contributor

lasalvavida commented Aug 22, 2016

@pjcozzi Is there a particular reason we are tied to v3 of the json-schema spec? I haven't found a way to define that POSITION is required unless POSITION_QUANTIZED is present and vice-versa in v3 of the spec, but it's pretty easy to do in v4 with the oneOf or not keywords. https://github.com/json-schema/json-schema/wiki/anyOf,-allOf,-oneOf,-not

Disregard, wrong thread.

@lilleyse
Contributor

Ignoring dynamic styling of point sizes, how do these properties sound for the feature table:

  • CONSTANT_POINT_SIZE - global point size in pixels for all points, default is application-specific
  • POINT_SIZE - per-point point size in pixels

@pjcozzi
Contributor

pjcozzi commented Aug 24, 2016

@lilleyse I don't think we need to include point size in the spec (at least not per-point or even per-tile); it does make sense for vctr, of course.

I alluded to this above:

The Cesium API should also support point size attenuation and vertex shader culling

They may exist, but I have never seen an input point cloud dataset with per-point sizes, so I would not focus on this now. Point size is usually a runtime concern where, for example, attenuation is based on screen-space error to try to approximate a closed surface.

@lilleyse
Contributor

Ok sounds good.

@connormanning

connormanning commented Sep 20, 2016

Hey all. I've recently added the ability to output 3D Tiles point cloud tilesets from Entwine. See connormanning/entwine#12. It's very much an initial prototype and does not implement the full specification (although the subset that it does support should be fully conformant), but maybe it can be useful for testing out the point cloud concepts being discussed for the 3D Tiles effort.

Some public resources are available to demonstrate this capability - so far the biggest tileset we've created was the 4.7 billion point set of New York City, which took two hours on an EC2 instance. Some smaller sets are Autzen and Red Rocks Amphitheatre. Comparisons with speck.ly and Potree versions of the same resources might be interesting, since these tilesets are written with the same chunking structure as the one used by Entwine.

The repository skeleton for these samples (minus the actual data) is here, which will mostly be uninteresting to you devs aside from perhaps the docker instructions for using Entwine to generate tilesets. Maybe it can be useful for you.

@pjcozzi
Contributor

pjcozzi commented Sep 21, 2016

Awesome work @connormanning! Thanks for sharing all this. I will take a closer look when I am back from travel. In the meantime, feel free to post your spec suggestions here.

@pjcozzi
Contributor

pjcozzi commented Jan 11, 2017

@lilleyse is there any more spec work to be done here for a minimum 1.0? If not, OK to close this?

There are a few nice runtime implementation features in #22 (comment) but I can move them to the Cesium roadmap.

@pjcozzi removed the draft 1.0 label on Jan 11, 2017
@lilleyse
Contributor

I can't think of anything at the moment. Ok to close.

@pjcozzi
Contributor

pjcozzi commented Jan 12, 2017

Fantastic!

@pjcozzi closed this as completed on Jan 12, 2017