Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create GeoJSON I/O project using new Sytem.Text.Json #44

Closed
FObermaier opened this issue Oct 22, 2019 · 20 comments
Closed

Create GeoJSON I/O project using new Sytem.Text.Json #44

FObermaier opened this issue Oct 22, 2019 · 20 comments
Assignees
Labels
enhancement STJ Applies to NetTopologySuite.IO.GeoJSON4STJ

Comments

@FObermaier
Copy link
Member

Resolving #42 it seemed worthwhile to create a NetTopologySuite.IO project that does not depend on Newtonsoft.Json but on the new System.Text.Json package to improve performance and avoid/minimize unnecessary package references.

@airbreather
Copy link
Member

I've been thinking about starting this for a while (pretty much ever since it was announced), I just never seem to have the time :-(

@FObermaier FObermaier self-assigned this Oct 23, 2019
FObermaier added a commit that referenced this issue Oct 23, 2019
Geometry converter is ready for review.

Relates to #44
@HarelM
Copy link
Contributor

HarelM commented Oct 24, 2019

@FObermaier you're fast...
I had a long ride today and had a different thought after digging into this whole JSON serialization stuff in .Net.
I have noticed there are faster JSON serializers than the new shiny STJ - mainly utf8json and jil.
I found that elasticsearch have moved to use utf8json for performance and thought I might use it too.
But instead of writing custom code to each one of them I think a more elegant solution would be to be able to just create a POCO from feature/featurecollection.
Some thing like

var feature = new Feature(...);
feature.ToPoco();

Creating a POCO would allow any JSON serializer to simply serialize the POCO reducing the complexity of writing serializers, as most (or all) support serializing a simple object.
Luckly, GeoJSON can be simplified to a POCO, I think.

I'm not sure if this should be a requirement to NTS or even JTS, but I think it can solve a lot of issues related to serialization.
The idea is already somewhat implemented in turfjs where data and operations are kept apart, but this is a whole different aspect and effort...

@airbreather
Copy link
Member

@FObermaier I noticed you've assigned this to yourself, have you gotten started on it yet? If not, I'd like to try my hand at doing this as sort-of a fresh start.

@FObermaier
Copy link
Member Author

FObermaier commented Feb 4, 2020

@airbreather
Copy link
Member

airbreather commented Feb 8, 2020

This is how far I've gotten:
wip (compare)

Maybe I can find a simpler way of handling coordinates... it's going to be at least a little complicated because JSON does not guarantee property order, but maybe I can at least make it a little better.

If nothing else, perhaps less allocation-heavy (System.Text.Json is attractive because of its relatively low memory allocation requirements).

@FObermaier
Copy link
Member Author

Perhaps you like 27485e0 better.
Unfortunatly Utf8JsonReader hides the underlying OriginalSequence or OriginalSpan, so we cannot copy directly, only with reflection.

@airbreather
Copy link
Member

I think what I'd ideally like to do is have it return object, and then depending on the "type" that we read, we assert that the object we read is one of a few types:

Tokens at the beginning of "coordinates" What we create for it What "type" values it works for
StartArray, Number Point "Point"
StartArray, StartArray, Number CoordinateSequence "LineString", "MultiPoint"
StartArray, StartArray, StartArray, Number CoordinateSequence[] "Polygon", "MultiLineString"
StartArray, StartArray, StartArray, StartArray, Number MultiPolygon "MultiPolygon"

The idea being that if we see just [x,y], then we know it's only allowed to be a Point, and likewise if we see all the way down to [[[[x,y], ..., then we know it's only allowed to be a MultiPolygon, so in those two cases, we can build the final Geometry right away.

Otherwise, we can still build CoordinateSequence objects that can be used to create the final Geometry once we know what "type" it is, so we might as well go that far... the only type where that's weird is "MultiPoint", because GeometryFactory.CreateMultiPoint(CoordinateSequence coordinates) has to deconstruct the CoordinateSequence, but it shouldn't be the end of the world.

@airbreather
Copy link
Member

So something like:

object? coordData = null;
string? type = null;

/* ... */
switch (propertyName)
{
	case "coordinates":
		coordData = ReadCoordData(ref reader, /* and others */);
		break;

	case "type":
		type = reader.GetString();
		break;

	/* ... */
}

/* ... */

return type switch
{
	"Point" when coordData is Point pt => pt,
	"LineString" when coordData is CoordinateSequence seq => _geometryFactory.CreateLineString(seq),
	/* ... */
	"MultiPolygon" when coordData is MultiPolygon mp => mp,
	_ => throw new JsonException("type / coordinates mismatch"),
};

@airbreather
Copy link
Member

Some feedback I have from going through that code:

  1. IMO the only public thing we expose should be an instance of JsonConverterFactory that can create whichever converters are needed to be able to deserialize all GeoJSON types. That lets the user only have to add one thing to their JsonSerializerOptions.Converters list (instead of however many), and it gives us the flexibility to shuffle around implementations.
  2. Instead of using our converter objects directly (e.g., for nested objects), we should probably favor using JsonSerializer.Deserialize / JsonSerializer.Serialize methods, same as the user is probably calling us with.
  3. IMO we can skip trying to do anything with comments. They're not valid JSON, the default behavior is to throw before we even see any comments, and even if there are comments in the file, the caller can set JsonCommentHandling.Skip to make them invisible to us anyway.
  4. IMO everything we do in StjEnvelopeConverter and StjAttributeTableConverter should be owned by the converters that use it, instead of having them stand alone; Envelope isn't a defined GeoJSON type, and a Feature's "properties" tag is actually defined as just any JSON object, so it might add confusion to let these stand on their own.
  5. StjAttributeTableConverter smells awful... I don't have an concrete alternative right now, but I feel like this could merit its own JSON-specific implementation of IAttributesTable backed by a JsonDocument. I haven't completely thought this through, and I do see that it would have some of the same smells that this has, so this is really half-baked right now.
    • For reference, it looks like [System.Text.Json] serialize/deserialize any object dotnet/runtime#30969 is why this has to be so complicated: we need to produce something that resembles "any arbitrary JSON object", which itself could contain nested arbitrary JSON objects as property values, yet the framework doesn't give us any facilities that we could use to automatically detect what CLR type to use based on the payload.
  6. The special-case of the "id" property really bothers me... it adds way more complexity (hidden and otherwise) than I think it's worth; let's add something in NetTopologySuite.Features to at least allow the concrete IFeature implementation to have its own "id" that's separate from Attributes?
    • e.g.:
    public interface IFeatureWithId : IFeature
    {
        object Id { get; set; }
    }
    
    public static class FeatureExtensions
    {
        public static object GetOptionalId(this IFeature feature)
        {
            return feature is IFeatureWithId featureWithId
                ? featureWithId.Id
                : feature?.Attributes?.GetOptionalValue("id");
        }
    }

@FObermaier
Copy link
Member Author

@airbreather, I have no objections. Instead of IFeatureWithId I'd prefer IUniqueFeature. perhaps split up in IUnique, IFeature and IUniqueFeature : IFeature, IUnique

@OskarKlintrot
Copy link

Is this still being worked on? I would love to be able to replace Newtonsoft with System.Text.Json!

@simon25608
Copy link

I am also waiting for the change!

Great job!!

@airbreather airbreather self-assigned this Jun 17, 2020
@airbreather
Copy link
Member

This has been released and published to NuGet: https://www.nuget.org/packages/NetTopologySuite.IO.GeoJSON4STJ

Release tag: https://github.com/NetTopologySuite/NetTopologySuite.IO.GeoJSON/releases/tag/stj-v2.0.3

Instructions on how to use it: https://github.com/NetTopologySuite/NetTopologySuite.IO.GeoJSON/wiki/Using-NTS.IO.GeoJSON4STJ-with-ASP.NET-Core-MVC

Thanks to @FObermaier for getting the ball rolling on this one!

@HarelM
Copy link
Contributor

HarelM commented Jun 18, 2020

This is super!!
Wanted to take this baby for a drive but found out the following issue:
I might be missing something very simple but the following throws an exception:

        public class Hit
        {
            public Feature source { get; set; }
        }

        [TestMethod]
        public void GetFeatureInsideAnObject()
        {
            var options = new JsonSerializerOptions();
            options.Converters.Add(new GeoJsonConverterFactory());
            JsonSerializer.Deserialize<Hit>(File.ReadAllText(@"D:\hit.geojson"), options);
        }

Content of hit.geojson:

{
	"source" : {
	  "type" : "Feature",
	  "geometry" : {
		"type" : "Point",
		"coordinates" : [
		  35.040589,
		  31.054653
		]
	  },
	  "properties" : {
		"poiId" : "OSM_node_1803047481"
	  }
	}
}

Exception I get when running this test:
Unable to cast object of type 'NetTopologySuite.IO.Converters.StjFeature' to type 'NetTopologySuite.Features.Feature'.'

Am I missing something obvious?

@airbreather
Copy link
Member

Exception I get when running this test:
Unable to cast object of type 'NetTopologySuite.IO.Converters.StjFeature' to type 'NetTopologySuite.Features.Feature'.'

It's got to be IFeature, not Feature.

@HarelM
Copy link
Contributor

HarelM commented Jun 18, 2020

Yea, I just figured this out. :-/
I must say, and I don't mean to be rude, that this doesn't make sense to me...
The migration from 1.7 to 2.0 removed most of the interfaces (or at least it did for geometries?).
I understood this as a road map to remove those interfaces and now most of my code doesn't use interfaces, not for geometries and not for features, attribute table, feature collection etc.
I don't mind replacing it back, but I don't think it makes much sense in terms of programming.
Does this mean that If I serialize an object of type Feature and deserialize it back I'll get a different type of object?
Does this mean I can't serialize Feature but only StjFeature - so I need change the entire creation code now?
I'm confused, honestly.
Don't get me wrong, I think you are doing a great job, I'm just not following here I guess...

@FObermaier
Copy link
Member Author

The argument over [IFeature]/[IAttributesTable] can be found here:
NetTopologySuite/NetTopologySuite.Features#8

@HarelM
Copy link
Contributor

HarelM commented Jun 19, 2020

I've read the thread twice, I still don't understand how to structure my code.
BTW here are some arguments against current implementation of IAttributeTable, I wish I was able to give my input then and not now after 2.0 is out.

  1. I keep hitting the "Key does not exists" exception when using the attribute table with indexer, the fact that it is almost like a dictionary but not the same make it very hard/annoying. This is the opposite of the pit of success in my opinion.
  2. GetNames as a method and not a property like Keys in dictionary is again confusing.
    Bottom line, AttributeTable is kinda dictionary but doesn't implement the interface keep causing me bugs.

But to our topic - I don't mind changing all my code that doesn't construct a Feature to use IFeature this is relatively easy. I don't understand which type should I use when I construct a feature, should I use Feature or StjFeature? How should I decide? Why am I forced to use another implementation of IFeature When all I'm doing is desrialization a serialization? This doesn't make sense to me...

I'd love to understand in order to build my code properly, changing a serialization layer (From Json.Net to STJ) should've made me change creation code in my personal opinion...

@airbreather
Copy link
Member

airbreather commented Jun 19, 2020

There's a lot to unpack and respond to here, so I'm going to divide this into sections. Sorry about the long comment, but there's a lot to go through here, and if everything were simple, then we wouldn't have these significant API clarity issues to begin with.


I understood this as a road map to remove those interfaces and now most of my code doesn't use interfaces, not for geometries and not for features, attribute table, feature collection etc.

IGeometry and friends were deadweight: there were so many members that it's extremely difficult to implement, and the implementations of those interfaces were already designed to be abstractions anyway. It wasn't adding value.

ICoordinateSequence corresponded to an interface to which JTS started adding default implementations. Because NTS cannot do the same without dropping support for .NET Framework consumers, it made sense to revamp that one too.

During this exercise, certain other interfaces were also called out as not really doing anything for anyone, so they were removed too.

I initially thought that IFeature / IAttributesTable were in the same boat, so I proposed removing them as well. However, @FObermaier pointed out some actual use cases for these interfaces in that thread, so they stayed.


Does this mean that If I serialize an object of type Feature and deserialize it back I'll get a different type of object?

NetTopologySuite.IO.GeoJSON4STJ lets you do two things:

  1. Serialize compatible objects to the GeoJSON format using the System.Text.Json library
  2. Deserialize compatible objects from the GeoJSON format using the System.Text.Json library

Each of the two activities has its own separate set of considerations.

Serializing things out is relatively easy and straightforward, since we have all the information available on-demand, in any order we want it, and System.Text.Json is happy to serialize nested objects of any arbitrary type.

Deserializing is a bigger challenge, because we have to support (or at least somewhat consider) valid GeoJSON inputs from any application, with properties in any order, using any combination of features. The two biggest challenges for this (for me, anyway) were:

  1. A GeoJSON "Feature" can have an "id" property value.
    • Not all IFeature implementations support a special Id.
    • NTS.IO.GeoJSON does this by sticking an "id" attribute onto the IAttributesTable that corresponds to the GeoJSON feature's "properties", clobbering whatever else is there.
    • NTS.IO.GeoJSON4STJ does this by using a separate implementation of IFeature that has its own CLR property for "Id".
    • An extension method lets you use "someFeature.GetOptionalId("id")", which will work for either of the two.
  2. A GeoJSON "Feature" has a "properties" property value that's defined as an arbitrary (unstructured) JSON object, which may have nested JSON objects.
    • We have IAttributesTable for this, with a pretty flexible standard implementation.
    • However, System.Text.Json doesn't let us deserialize nested instances of arbitrary (statically unknown) types like Newtonsoft.Json does, for security reasons.
    • So NTS.IO.GeoJSON4STJ also has its own implementation of IAttributesTable: you can use the standard APIs to get a nested IAttributesTable instance for sub-objects, but you can also use a special extension method if you have a better idea of what type to use.

Does this mean I can't serialize Feature but only StjFeature - so I need change the entire creation code now?

I don't understand which type should I use when I construct a feature, should I use Feature or StjFeature? How should I decide? Why am I forced to use another implementation of IFeature When all I'm doing is desrialization a serialization?

StjFeature is internal, so part of that decision has been made for you.

  • Deserialize into a property of type IFeature.
  • Serialize any implementation of IFeature you want, including Feature.
    • If you want to fill the GeoJSON "Feature"'s "id", then you can use either:
      1. The GeoJsonConverterFactory.DefaultIdPropertyName attribute (or whatever other name you put into the GeoJsonConverterFactory constructor), or
      2. The IUnique.Id value, if your IFeature implementation also implements IUnique.

I hesitate to add support for deserializing into instances of Feature, at least in its current form, because of the special "id" value on GeoJSON features. I'm sure there's a reasonable way to make it work if this is a significant blocker, but IFeature does the job for now, so I'm not inclined to try to figure that out in the near future.


  1. I keep hitting the "Key does not exists" exception when using the attribute table with indexer, the fact that it is almost like a dictionary but not the same make it very hard/annoying. This is the opposite of the pit of success in my opinion.
  2. GetNames as a method and not a property like Keys in dictionary is again confusing.
    Bottom line, AttributeTable is kinda dictionary but doesn't implement the interface keep causing me bugs.

I wholeheartedly agree and sympathize with everything you say here. I raised up basically these same issues in the thread that @FObermaier linked. I felt that it's silly to implement this nonstandard IAttributesTable type that looks almost exactly like a dictionary, without actually exposing the standard dictionary API that every .NET developer is expected to grow familiar with eventually.

From the standpoint of GeoJSON, it seemed better to just expose a Dictionary<string, object>. I proposed doing exactly that, as part of the other breaking changes that we were making in 2.0, since it would have been very easy and resolved exactly this kind of confusing situation.

However, there were examples of IAttributesTable used in contexts other than GeoJSON, where there is a statically known set of "attributes" that can be mapped to CLR properties of the object. If we were going to redesign this, then we have a responsibility to either provide alternatives for those use cases, or accept that we would leave them behind.


Not to derail too much, but in my personal opinion, I think NetTopologySuite.Features itself was a mistake (not just splitting it out, but the types themselves).

  • IMO, each I/O library should have been responsible for their own modeling from the beginning. All non-I/O-library use cases could be supported by translating between those formats and applications' domain models, and/or implemented more directly in the domain models themselves.
  • This would have given GeoJSON the flexibility that it would have needed without either weird extension methods or forcing this apparent flexibility onto stricter libraries like GPX.
    • You've seen, of course, how I worked around that with GPX: its IFeature support is there, but completely separate from the rest of the object model, almost as an afterthought.
  • But that's not where we're at right now: instead, ecosystems are already built around it, and things like ShapefileDataWriter.Write(IEnumerable<IFeature>) encourage other I/O libraries to follow suit so that they can interoperate with one another.
  • So I'm content with maintaining the status quo for the time being until we can design, implement, and deploy something that takes all the known use cases into consideration (even if that "consideration" is just recognizing them as they get left behind for the sake of everybody else, if it does eventually come to that).

@HarelM
Copy link
Contributor

HarelM commented Jun 19, 2020

Thanks for your detailed answer! I enjoyed reading it.
I'll try to be more practical in what I think can help improve the current situation.

  1. Change the default attribute table indexer myAttributeTable["somename"] = "someValue" to add or update instead of update or throw - I know this is a breaking change but I don't think it's a major one. I wrote an extension method that does exactly that because of this behavior and I'd be happy to throw away that code and avoid the need to "remember to use it" every time I add or update a property.
  2. I might have missed it so please point me to the right place in the code - Add a factory to create IFeature, IFeatureColleciton and IAttributeTable. and make sure this factory has an interface for DI.
    Explanation: If I'm not suppose to know the underlying implementation than I shouldn't call new hence I need factories. Moreover, If STJ is implementing a new factory that will return Stj Specific implementation due to the above requirements there should also be a factory interface so that I can inject it to every class that needs it to create these type of objects. This will allow to configure the factory in one place (hopefully) and allow me to truly not know what is the underlying implementation. This is not a breaking change.
    This will allow me to easily switch from the current implementation to STJ implementation with minimal changes in my code, assuming I wrote it properly, i.e. with factories. This will also allow me to refactor the code using the current implementation and only when the code is working and tested to switch to STJ, I can't do it right now as far as I understand.
    I did the same (almost fully) with geometry factory - I never call a new for geometries and I inject the geometry factory whenever I need to create a geometry. If there was a change in the underlying implementation I could relatively easily switch it.

That's my 2 cents. I'd be happy to land a hand progressing this forward if need be. More than happy to test it when it's available on NuGet like I tried with STJ implementation and migration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement STJ Applies to NetTopologySuite.IO.GeoJSON4STJ
Projects
None yet
Development

No branches or pull requests

5 participants