Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Self referencing loop detected with type 'NetTopologySuite.Features.AttributesTable'. Path '[257].properties'. #46

Closed
HarelM opened this issue Dec 20, 2019 · 39 comments

Comments

@HarelM
Copy link
Contributor

HarelM commented Dec 20, 2019

I'm not sure what causes this as the stack trace from my server is hot helping but since I upgraded to 2.0 I see a lot of the following error in the log file:
Self referencing loop detected with type 'NetTopologySuite.Features.AttributesTable'. Path '[x].properties'.
Any change you guys know what can cause this?
The number in the [x] is a changing number.
This happens when the server returns a geojson to the client.
The following is the stacktrace:

at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.CheckForCircularReference(JsonWriter writer, Object value, JsonProperty property, JsonContract contract, JsonContainerContract containerContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeConvertable(JsonWriter writer, JsonConverter converter, Object value, JsonContract contract, JsonContainerContract collectionContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.Serialize(JsonWriter jsonWriter, Object value, Type objectType)
   at NetTopologySuite.IO.Converters.AttributesTableConverter.WriteJson(JsonWriter writer, Object value, JsonSerializer serializer)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeConvertable(JsonWriter writer, JsonConverter converter, Object value, JsonContract contract, JsonContainerContract collectionContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.Serialize(JsonWriter jsonWriter, Object value, Type objectType)
   at NetTopologySuite.IO.Converters.FeatureConverter.WriteJson(JsonWriter writer, Object value, JsonSerializer serializer)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeConvertable(JsonWriter writer, JsonConverter converter, Object value, JsonContract contract, JsonContainerContract collectionContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeList(JsonWriter writer, IEnumerable values, JsonArrayContract contract, JsonProperty member, JsonContainerContract collectionContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.Serialize(JsonWriter jsonWriter, Object value, Type objectType)
   at Newtonsoft.Json.JsonSerializer.SerializeInternal(JsonWriter jsonWriter, Object value, Type objectType)
   at Microsoft.AspNetCore.Mvc.Formatters.NewtonsoftJsonOutputFormatter.WriteResponseBodyAsync(OutputFormatterWriteContext context, Encoding selectedEncoding)
   at Microsoft.AspNetCore.Mvc.Formatters.NewtonsoftJsonOutputFormatter.WriteResponseBodyAsync(OutputFormatterWriteContext context, Encoding selectedEncoding)
   at Microsoft.AspNetCore.Mvc.Formatters.NewtonsoftJsonOutputFormatter.WriteResponseBodyAsync(OutputFormatterWriteContext context, Encoding selectedEncoding)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeResultFilters>g__Awaited|27_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeFilterPipelineAsync>g__Awaited|19_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)
   at Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
   at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Server.IIS.Core.IISHttpContextOfT`1.ProcessRequestAsync()

Any help would be appreciated...

@airbreather
Copy link
Member

airbreather commented Jan 18, 2020

Without more details about the object being serialized, I can't say anything definitive.

Have you tried setting options.SerializerSettings.ReferenceLoopHandling to something else around here? The default is apparently ReferenceLoopHandling.Error. Some people have claimed that they got it to do what they wanted by just setting it to ReferenceLoopHandling.Ignore.

@HarelM
Copy link
Contributor Author

HarelM commented Jan 18, 2020

I have tried that, unfortunately it didn't help.
I was able to further debug the issue and see that the deserializer created an object that reference itself.
basically the original object had this hierarchy:

{
    type: "Point",
    properties: {
        ...
        languages: {
            "he": ["some", "items"], 
            "en": ["some-more", "ittems"]
        }
        ...
    },
    geometry: ...
}

The in memory object that some times was deserialized from the above object was the following:

{
    type: "Point",
    properties: {
        ...
        "he": ["some", "items"],
         "en": ["some-more", "ittems"],
        languages: <pointer to properties so that what created the loop>
        ...
    },
    geometry: ...
}

It didn't happen all the time so I'm not sure what caused this... Might be related to multithreading issue of some sort, not sure...
I couldn't create a simple reproduction scenario though, but I was able to reproduce it when I needed to...

@grickettsnz
Copy link

grickettsnz commented Jan 23, 2020

I am having a similar issue on 2.0.1. The AttributesTable seems to get messed up on geoJsonReader.Read and ends up self-referencing along with inserting lower level json properties at the top level of the attributesTable.

Repro here:

[TestMethod] 
public void test_deserialise_nested_geojson()
{
    string sampleGeoJsonNested =
    "{\"type\":\"FeatureCollection\",\"features\":[{\"type\":\"Feature\",\"id\":\"63a72ea5-45d6-4d4c-a77c-948e5c814317\",\"geometry\":{\"type\":\"Polygon\",\"coordinates\":[[[0,0],[1,0],[1,1],[0,0]]]},\"properties\":{\"a\":[],\"b\":[],\"c\":0.15403640270233154,\"d\":0.15403640270233154,\"e\":null,\"f\":null,\"g\":null,\"h\":null,\"i\":0,\"n\":\"2018-05-08T00:00:00\",\"o\":{\"a1\":\"2018-09-14T00:00:00\",\"b1\":86400,\"c1\":\"R6\",\"d1\":4,\"e1\":12.47,\"f1\":1563.25,\"g1\":129,\"h1\":1,\"i1\":0.23666,\"j1\":0.00056,\"k1\":0.8}}}]}";

    var geoJsonReader = new GeoJsonReader();
    var collection = geoJsonReader.Read<FeatureCollection>(sampleGeoJsonNested);

    Assert.AreEqual(11, collection[0].Attributes.Count);
}

image

@HarelM
Copy link
Contributor Author

HarelM commented Jan 23, 2020

Yay! I'm not crazy!
In any case, I truly think converting to POCO and using the default serializers/deserializers can reduce complexity and bugs, and I'm not sure it'll hurt performance significantly.
See my comment here:
#44 (comment)
@airbreather let me know if I can help in any way moving this forward, this is currently the only limitation I know of that prevent me to upgrade to 2.0.

@DGuidi
Copy link
Contributor

DGuidi commented Jan 23, 2020

yep, here's the problem

context should be invalidated when reading inner attributes

@DGuidi
Copy link
Contributor

DGuidi commented Jan 23, 2020

hope #47 can fix the problem

@HarelM
Copy link
Contributor Author

HarelM commented Jan 23, 2020

Is there a way I can test it? a temporary Nuget server I can connect to?
If not, please let me know when a new version is available.
Thanks for the quick fix!

airbreather pushed a commit that referenced this issue Jan 23, 2020
* #46 fixed self-referencing deserialization of attribute tables

* used the right attribute for tests
@airbreather
Copy link
Member

Is there a way I can test it? a temporary Nuget server I can connect to?

@HarelM the potential fix has been merged to develop. When this happens, that kicks off an automatic process that, assuming everything goes well (which it did), will publish the new package to our MyGet feed (here's the one for NetTopologySuite.IO.GeoJSON).

Version 2.0.2-pre.182840845+ci.travis.32 (or just 2.0.2-pre.182840845, if you prefer) is a prerelease of 2.0.2 that has this fix in it, so go ahead and try it out from there.

In theory, this command should be what you need to get it started:

dotnet add package NetTopologySuite.IO.GeoJSON --version 2.0.2-pre.182840845 --source https://www.myget.org/F/nettopologysuite/api/v3/index.json

@HarelM
Copy link
Contributor Author

HarelM commented Jan 26, 2020

@DGuidi Thanks for the fix! I can't reproduce the error when using 2.0.2.
Please let me know when this is available on the global NuGet Server.
Feel free to close this issue.

@HarelM
Copy link
Contributor Author

HarelM commented Jan 30, 2020

Sorry for nagging, Any chance to release this fix? I really want to put behind me all the NTS 2.0 upgrade...

@HarelM
Copy link
Contributor Author

HarelM commented Feb 4, 2020

Seems like there a pull request from July that seems similar to this fix, I haven't fully looked into it for comparison but the issue described and the solution seems close: #36.
@clegendre how did you work around this issue? using a private branch?
Can this be released? Pretty please...? :-)

@airbreather
Copy link
Member

Please let me know when this is available on the global NuGet Server.

@HarelM it's there now.

@HarelM
Copy link
Contributor Author

HarelM commented Feb 5, 2020

Great! Thanks! probably worth copying the test from #36 and closing the PR as well as I think it is the same code change.

@HarelM
Copy link
Contributor Author

HarelM commented May 31, 2020

@DGuidi Is it possible that the fix is not thread safe?
I have an issue which sometimes manifests that the entire attributes table (geojson properties) is being copied from one feature to the other.
It doesn't happen all the time, and this is why I believe it might have to do with thread safety and timing...
The objects are deserialized from an elasticsearch client, so I'm not sure if they use the same geojson serializer on all the requests...
When I change the code to try and catch the issue, it disappears :-(

@DGuidi
Copy link
Contributor

DGuidi commented Jun 2, 2020

looking at the code, seems strange to me to see thread-safety problems in my fix. the only "variable" in the code (the serialized context) wasn't changed at all... of course if more than a serializer is used, then serializer.Context.Context is different between the objects, so you should see the "old" behaviour (the deserialization error and the self-referencing loop)

@HarelM
Copy link
Contributor Author

HarelM commented Jun 2, 2020

Thanks for the response. Unfortunately, I can't fully explain the behavior but I was able to reproduce the issue several times on my machine.
It might be how elastic is using the serializer (they might use the same one and this can cause this issue...?) which I don't fully know but the chances are low that there's a bug in their code and not mine or this serializer...
Currently my workaround was not to make the database call non-parallel (which is part of a process running on the server once a day), but I'm not sure I can keep it that way when dealing with a lot of users...

@DGuidi
Copy link
Contributor

DGuidi commented Jun 2, 2020

if you're able to reproduce the error in your machine, you can change the code of geojson serializer and add a private guid field initialized in the constructor, and then log this guid and the thread id when the method is called. Just an idea, to see if the same object is shared between calls. If so, probably the serialization context can be the cause of your error, and I fear this require a larger refactoring.
As far as I remember, probably you can call some deserialize method overload that accepts a new serialization context explicitly passed as parameter...

@HarelM
Copy link
Contributor Author

HarelM commented Jun 7, 2020

The serializer is initialized only once when the app starts and I know that it is being used from multiple threads (according to logging).
I have no idea what else I can check or how to further debug it. It used to happen more often on my machine, but now it doesn't (probably due to just a little bit different timing if I need to guess).
I can see that it happens in production server about 50% of the timed so I can reproduce it somehow but it's hard (long and complicated process)...
Let me know what else I can try as I don't think I can add locking mechanism to the database layer as it would kill the performance of the site...

@DGuidi
Copy link
Contributor

DGuidi commented Jun 7, 2020

The serializer is initialized only once when the app starts and I know that it is being used from multiple threads (according to logging).

This is definetively a problem for me. As far as I know, if the SerializationContext is somewhat shared between different threads deserializing different objects, you can see randomic errors like the ones you see.
To "fix" the problem, maybe you can patch the code in your own fork and use a dictionary (instead of SerializationContext) using threadid as a part of the key (and of course some lock to serialize this part of the code). I think you can reach better performances because you serialize only this little part of the code, and not the entire deserialization process.

My2Cents.

@HarelM
Copy link
Contributor Author

HarelM commented Jun 7, 2020

@DGuidi thanks a lot for the super quick response and detailed answer.
I'm not sure I'll be able to do what you suggest, not because I don't think this is the right solution/workaround but because I'm not familiar with the code and how to patch it.
I've looked at the code that was used to fix this issue and I'm not entirely sure I understand it... :-(
If you could direct me to the right place in the code and might be able to try it.
@clegendre are you experiencing this too?

@DGuidi
Copy link
Contributor

DGuidi commented Jun 8, 2020

actually looking better at the code I'm not sure that my suggestion is actually viable :(

The serializer is initialized only once when the app starts and I know that it is being used from multiple threads (according to logging).

this is part of your code or is elastic doing this?

@HarelM
Copy link
Contributor Author

HarelM commented Jun 16, 2020

@DGuidi anything I can do to help out with this? Seems like this is happening more often than not... :-(
I'm pretty sure it didn't happen in pre 2.0 version (1.7 something) but I would like to avoid downgrading as there are a lot of breaking changes between those versions...

@DGuidi
Copy link
Contributor

DGuidi commented Jun 17, 2020

@HarelM previous version look the using the same code to me

The problem can be that in the past you haven't used nested attributes tables?

@HarelM
Copy link
Contributor Author

HarelM commented Jun 17, 2020

I have used nested attributes table from the beginning.
The current problem is not with the nested attribute table but with the fact the the entire "properties" object (the main AttributeTable) is shared (or duplicated, not sure) between two different Features.
Might be a timing difference that is not related to the upgrade to 2.0 though, if you say that the code didn't change...

@DGuidi
Copy link
Contributor

DGuidi commented Jun 17, 2020

I've checked a bit and the code looks the same from 2.0 and previous versions.
Anyway, the problem to me is that the same serializer is shared across different features in parallel.
I'm unable to understand if this is a problem generated from your code or in some elastic code.
Reading briefly your code, looks that GeoJsonNetSerializer is created from a factory using a factory method, but I don't see where this factory method is called, actually.
Probably OverwriteDefaultSerializers returns a list of serializers that are created once and then are used in parallel?
Probably you need to create your own JsonFactory?

@HarelM
Copy link
Contributor Author

HarelM commented Jun 17, 2020

I have no clue either... :-(
Maybe I can use a different serializer/deserializer to convert to temporary object and then convert to NTS objects after I get the data from the database and see if it helps, I don't know...

but I don't see where this factory method is called, actually.

Yea, it's probably called inside the elastic code...

Probably OverwriteDefaultSerializers returns a list of serializers that are created once and then are used in parallel?

probably...

Probably you need to create your own JsonFactory?

Not sure how this would help as it will probably do the same as the elastic factory... But I don't mind trying anything at this point...

@HarelM
Copy link
Contributor Author

HarelM commented Jun 17, 2020

Seems like this is open source and the code can be seen here:
https://github.com/elastic/elasticsearch-net/tree/4c2e1e776239f7a4f2c613dfd9b98f2059b404ed
Not sure how helpful this is though...

@HarelM
Copy link
Contributor Author

HarelM commented Jun 17, 2020

I'll try and debug this by seeing if I can dump the response I get from elastic as text if there's a problem:
https://stackoverflow.com/questions/26753277/how-can-i-pass-through-the-raw-json-response-from-a-nest-elasticsearch-query
This will hopefully allow us to see if the response json is not the root cause, and allow to control the deserialization, I hope...

@DGuidi
Copy link
Contributor

DGuidi commented Jun 17, 2020

You can also try to open a issue in their repo: to me, this is a bug and your/nts code is doing the right job. My2cents, of course.

@HarelM
Copy link
Contributor Author

HarelM commented Jun 18, 2020

Well, I'd be happy to be wrong and find out that the bug is in my code - that way I can easily fix it.
Currently I don't know how to progress.
The following is a dump of the json that elastic send back. It doesn't have any duplication. but I added a code to check the in-memory objects for duplication and dump a file which resulted in the following dump:
2020-06-18T07-03-09.6998079+03-00.txt
Code can be found here:
https://github.com/IsraelHikingMap/Site/blob/master/IsraelHiking.DataAccess/ElasticSearchGateway.cs#L392L398
The following warning is what I got in the log:

2020-06-18 07:03:09.6998 | 94 |WARN | Got duplicate id Jeepolog_1832 from database :-(, writing dump to: ...2020-06-18T07-03-09.6998079+03-00.txt 

The only idea I currently have is to try and migrate to STJ instead of Json.Net and see if something changes in the behavior... or something similar to override how elastic is using this library...

@HarelM
Copy link
Contributor Author

HarelM commented Jun 18, 2020

I've opened a ticket to elastic as well, I hope they can have a look at it and suggest a solution of some kind.
elastic/elasticsearch-net#4791

@DGuidi
Copy link
Contributor

DGuidi commented Jun 18, 2020

added a comment in the elastic issue, hope to be helpful

@HarelM
Copy link
Contributor Author

HarelM commented Jun 18, 2020

Thanks for that! :-)
Here's another go at this - I hope you are correct and that the bug is in the ES code.
I looked at the code the way it way back in 5.6.6 (which is the version I'm using).
I took the entire implementation of the serializer and modified it to create a new json serializer every serialization/deserialization and store only the settings object(s).
Finger crossed
The original code is here:
https://github.com/elastic/elasticsearch-net/blob/4c2e1e776239f7a4f2c613dfd9b98f2059b404ed/src/Nest/CommonAbstractions/SerializationBehavior/JsonNetSerializer.cs
The following is my modification (simplified since I don't need to support derived classes):

    /// <summary>
	/// A JSON serializer that uses Json.NET for serialization and able to parse geojson objects
	/// </summary>
	public class JsonNetSerializerWithGeoJson : IElasticsearchSerializer
    {
        private static readonly Encoding ExpectedEncoding = new UTF8Encoding(false);

        private readonly ConcurrentDictionary<string, IPropertyMapping> Properties = new ConcurrentDictionary<string, IPropertyMapping>();
        private readonly JsonSerializerSettings _noneIndentedSettings;
        private readonly JsonSerializerSettings _indentedSettings;

        public JsonNetSerializerWithGeoJson(IConnectionSettingsValues settings)
        {
            _noneIndentedSettings = CreateSettings(SerializationFormatting.None, settings);
            _indentedSettings = CreateSettings(SerializationFormatting.Indented, settings);
        }

        public IPropertyMapping CreatePropertyMapping(MemberInfo memberInfo)
        {
            var memberInfoString = $"{memberInfo.DeclaringType?.FullName}.{memberInfo.Name}";
            if (Properties.TryGetValue(memberInfoString, out var mapping))
                return mapping;

            mapping = PropertyMappingFromAttributes(memberInfo);
            Properties.TryAdd(memberInfoString, mapping);
            return mapping;
        }

        public T Deserialize<T>(Stream stream)
        {
            if (stream == null) return default;

            using var streamReader = new StreamReader(stream);
            using var jsonTextReader = new JsonTextReader(streamReader);
            return JsonSerializer.Create(_noneIndentedSettings).Deserialize<T>(jsonTextReader);
        }

        public Task<T> DeserializeAsync<T>(Stream stream, CancellationToken cancellationToken = default(CancellationToken))
        {
            //Json.NET does not support reading a stream asynchronously :(
            var result = Deserialize<T>(stream);
            return Task.FromResult(result);
        }

        public void Serialize(object data, Stream writableStream, SerializationFormatting formatting = SerializationFormatting.Indented)
        {
            var serializer = formatting == SerializationFormatting.Indented
                ? JsonSerializer.Create(_indentedSettings)
                : JsonSerializer.Create(_noneIndentedSettings);

            using var writer = new StreamWriter(writableStream, ExpectedEncoding, 1024, true);
            using var jsonWriter = new JsonTextWriter(writer);
            serializer.Serialize(jsonWriter, data);
            writer.Flush();
            jsonWriter.Flush();
        }

        private static IPropertyMapping PropertyMappingFromAttributes(MemberInfo memberInfo)
        {
            var jsonProperty = memberInfo.GetCustomAttribute<JsonPropertyAttribute>(true);
            var dataMember = memberInfo.GetCustomAttribute<DataMemberAttribute>(true);
            var ignoreProperty = memberInfo.GetCustomAttribute<JsonIgnoreAttribute>(true);
            if (jsonProperty == null && ignoreProperty == null && dataMember == null) return null;

            return new PropertyMapping { Name = jsonProperty?.PropertyName ?? dataMember?.Name, Ignore = ignoreProperty != null };
        }

        private JsonSerializerSettings CreateSettings(SerializationFormatting formatting, IConnectionSettingsValues connectionSettings)
        {
            var settings = new JsonSerializerSettings()
            {
                Formatting = formatting == SerializationFormatting.Indented ? Formatting.Indented : Formatting.None,
                ContractResolver = new ElasticContractResolver(connectionSettings, null),
                DefaultValueHandling = DefaultValueHandling.Include,
                NullValueHandling = NullValueHandling.Ignore
            };
            foreach(var converter in GeoJsonSerializer.Create(GeometryFactory.Default, 3).Converters)
            {
                settings.Converters.Add(converter);
            }
            return settings;
        }
    }

@HarelM
Copy link
Contributor Author

HarelM commented Jun 19, 2020

@DGuidi Thanks for all your help! seems like my last commit had fixed the issue.
Indeed seems to be a bug in elasticsearch code like you said.
I'll continue writing there.
Thanks again! This kept me awake for too long... :-P

@DGuidi
Copy link
Contributor

DGuidi commented Jun 19, 2020

Just to be curious, you written in elastic issue

I basically took the 5.6.6 serializer

so basically I assume that in the past you haven't experienced the bug because you used a previous version of elastic code, right?

@HarelM
Copy link
Contributor Author

HarelM commented Jun 19, 2020

No, I think the bug was there all along, I just didn't experienced it I believe. I'm using the 5.6.6 version in production for probably more than a year or two...
I changed some code to do stuff in parallel and with large amount of objects more frequently and I think that caused the issue to surface.
Again, I'm not sure... I hate timing and multithreading issues... :-(

@DGuidi
Copy link
Contributor

DGuidi commented Jun 19, 2020

"
when your program is slow, you have a problem.
if you try to speed up your slow program using threads, then you have two problems.
"

Again, based on a little dig in the code, to me it's a bug (or at least an unexpected behaviour) in elastic code.

@HarelM
Copy link
Contributor Author

HarelM commented Sep 22, 2020

Multithreading issue might be fixed by #59... (just adding a reference here...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants