Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChildrenRetrievalPolicy does not work correctly sometimes #40

Closed
andreashuber-lawo opened this issue Mar 28, 2017 · 27 comments
Closed

ChildrenRetrievalPolicy does not work correctly sometimes #40

andreashuber-lawo opened this issue Mar 28, 2017 · 27 comments
Assignees
Milestone

Comments

@andreashuber-lawo
Copy link
Contributor

If ChildrenRetrievalPolicy is used as described in #39, the retrieval process sometimes just stops.

@andreashuber-lawo
Copy link
Contributor Author

Apparently, this bug only happens when await Consumer.SendAsync() is explicitly called after setting INode.ChildrenRetrievalPolicy. When await Consumer.SendAsync() is not called then things seem to work just fine (requests are thus sent when Consumer.AutoSendInterval elapses). Therefore, to ensure that the GetDirectory request for a node is sent before INode.ChildrenRetrievalPolicy of the next node is set, client code could thus call await Task.Delay() with a timeout that is e.g. double of what is set for Consumer.AutoSendInterval. Of course, this is a horrible hack and should be removed as soon as this bug has been fixed.

@radio42
Copy link

radio42 commented Mar 31, 2017

I haven't fully checked your source code yet, but to me it sounds a bit as if you are running into an async 'deadlock' situation, as e.g. described here:
http://blog.stephencleary.com/2012/07/dont-block-on-async-code.html

It suggeststwo best practices:

  1. In a "library” async methods, use ConfigureAwait(false) wherever possible.
  2. Don’t block on Tasks; use async all the way down.

As said, I haven't tested myself (due to the lack of an appropriate device with the known behavior), but there are 2 quick slots which you might take a look at:

a) when calling Consumer.SendAsync() it calls internally client.SendMessageAsync and this implementation calls return this.SendMessageCoreAsync(message, payload);- so maybe at this point an 'await' or a .ConfigureAwait(false) might help?

b) to try, if there is any deadlock, you might try to call the Consumer.SendAsync() also using the .ConfigureAwait(false), i.e.:
await Consumer.SendAsync().ConfigureAwait(false)

Don't know if that helps...
(Sorry for mistakenly editing your comment!)

@andreashuber-lawo
Copy link
Contributor Author

I haven't fully checked your source code yet, but to me it sounds a bit as if you are running into an async 'deadlock' situation, as e.g. described here:
http://blog.stephencleary.com/2012/07/dont-block-on-async-code.html

No, I think it's a race condition. If it was a deadlock, using Task.Delay externally wouldn't help.

a) when calling Consumer.SendAsync() it calls internally client.SendMessageAsync and this implementation calls return this.SendMessageCoreAsync(message, payload);- so maybe at this point an 'await' or a .ConfigureAwait(false) might help?

No, what you see there is an optimization. If an async method calls only one other async method at the end, then it is permissible to just return the Task returned from the other method and remove the async modifier. You can await there, but that wouldn't change anything.

b) to try, if there is any deadlock, you might try to call the Consumer.SendAsync() also using the .ConfigureAwait(false), i.e.:
await Consumer.SendAsync().ConfigureAwait(false)

Task.ConfigureAwait(false) only prevents deadlocks if you ever call Task.Wait() anywhere in your code (that's what he refers to with "sync-over-async antipattern"). I don't do that and library client code shouldn't do that either. Other than in this very specific situation, Task.ConfigureAwait(false) is an optimization: http://disq.us/p/17wutt9

andreashuber-lawo added a commit that referenced this issue Apr 5, 2017
More testing is necessary.

References #40
@andreashuber-lawo
Copy link
Contributor Author

My tests indicate that this should now be fixed. I ran Bug40Test() with a moderately large tree and it passed. I plan to make a release next week. If you are in a hurry, you might want to compile from sources yourself.

@fredrikbergholtz-sr
Copy link

My test code still hangs. Git tells me that I am in sync with upstream/master but I see no change in this behaviour. I verified that the changes of the commit that should fix this are actually in my working folder. Still the consumer hangs, both when I try to enumerate a Lawo/Sapphire and a Lawo/Nova73.

Interestingly enough, it doesn't fail on providers I made myself based on the lawo/ember-plus/EmberPlus.Net/EmberPlusRouter example project. Those trees are a lot smaller than the ones provided by the hardware, though.

@andreashuber-lawo
Copy link
Contributor Author

My test code still hangs. Git tells me that I am in sync with upstream/master but I see no change in this behaviour. I verified that the changes of the commit that should fix this are actually in my working folder. Still the consumer hangs, both when I try to enumerate a Lawo/Sapphire and a Lawo/Nova73.

AFAIU, Sapphire might still have the problem we discussed in #28. For the Nova73, the tree is probably to big to be retrieved in full. Can you attach a log file as you did for #28?

@fredrikbergholtz-sr
Copy link

Here is a log file for the Nova73. This just stops after reading a few elements:
s101Log_Nova73.txt

I am currently getting a new error message when parsing the Sapphire tree that I didn't get when I tested before: "Unexpected End-of-contents identifier at position 432 for definite length at position 54."

Stack trace:

   vid Lawo.EmberPlusSharp.Ember.EmberReader.ProcessOuter(EmberId id, Int64 idPosition) i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Ember\EmberReader.cs:rad 857
   vid Lawo.EmberPlusSharp.Ember.EmberReader.ReadCore() i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Ember\EmberReader.cs:rad 751
   vid Lawo.EmberPlusSharp.Ember.EmberReader.Read() i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Ember\EmberReader.cs:rad 184
   vid Lawo.EmberPlusSharp.Ember.EmberConverter.ToXmlCore(EmberReader reader, XmlWriter writer, FieldPath`2 previousPath, Int32 currentType) i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Ember\EmberConverter.cs:rad 225
   vid Lawo.EmberPlusSharp.Ember.EmberConverter.ToXmlCore(EmberReader reader, XmlWriter writer, FieldPath`2 previousPath, Int32 currentType) i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Ember\EmberConverter.cs:rad 301
   vid Lawo.EmberPlusSharp.Ember.EmberConverter.ToXmlCore(EmberReader reader, XmlWriter writer, FieldPath`2 previousPath, Int32 currentType) i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Ember\EmberConverter.cs:rad 305
   vid Lawo.EmberPlusSharp.Ember.EmberConverter.ToXmlCore(EmberReader reader, XmlWriter writer, FieldPath`2 previousPath, Int32 currentType) i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Ember\EmberConverter.cs:rad 305
   vid Lawo.EmberPlusSharp.Ember.EmberConverter.ToXmlCore(EmberReader reader, XmlWriter writer, FieldPath`2 previousPath, Int32 currentType) i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Ember\EmberConverter.cs:rad 305
   vid Lawo.EmberPlusSharp.Ember.EmberConverter.ToXmlCore(EmberReader reader, XmlWriter writer, FieldPath`2 previousPath, Int32 currentType) i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Ember\EmberConverter.cs:rad 305
   vid Lawo.EmberPlusSharp.Ember.EmberConverter.ToXml(EmberReader reader, XmlWriter writer) i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Ember\EmberConverter.cs:rad 77
   vid Lawo.EmberPlusSharp.Ember.EmberConverter.ToXml(Byte[] buffer, XmlWriter writer) i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Ember\EmberConverter.cs:rad 55
   vid Lawo.EmberPlusSharp.S101.S101Logger.LogMessage(DateTime timeUtc, String direction, S101Message message, Byte[] payload) i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\S101\S101Logger.cs:rad 154
   vid Lawo.EmberPlusSharp.S101.S101Logger.LogMessage(String direction, S101Message message, Byte[] payload) i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\S101\S101Logger.cs:rad 105
   vid Lawo.EmberPlusSharp.S101.S101Client.<>c__DisplayClass37_0.<ProcessMessage>b__0() i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\S101\S101Client.cs:rad 496
   vid Lawo.Threading.Tasks.WorkQueue.<>c__DisplayClass0_0.<Enqueue>b__0() i C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo\Threading\Tasks\WorkQueue.cs:rad 34
   vid System.Threading.Tasks.Task`1.InnerInvoke()
   vid System.Threading.Tasks.Task.Execute()

Log file:
s101Log_Sapphire.txt

@andreashuber-lawo
Copy link
Contributor Author

I'm sorry, I should have been more clear. Can please do the exact same tests again, but this time with Glow Analyzer Proxy? The logs written by that tool provide a bit more detail that could be helpful.

This just stops after reading a few elements:

How does it stop? Is there an exception? What timeouts did you set?

Thanks!

@fredrikbergholtz-sr
Copy link

fredrikbergholtz-sr commented Apr 21, 2017 via email

@fredrikbergholtz-sr
Copy link

Ok, I tested again. I waited longer than the timeout of 10 seconds and it is still hung after reading the tenth element of the Nova73 tree. I do get the keep-alive messages, just no more data and no error messages.

For the Sapphire, I get the new "Unexpected End-of-contents identifier" exception on the same element that it used to get stuck on before.

Nova73.txt
Sapphire.txt

@andreashuber-lawo
Copy link
Contributor Author

Ok, and you're getting this behavior with the Nova73 when you use ChildrenRetrievalPolicy, right? If so, have you tried to use the default overload of Consumer.Create?

What Sapphire version are you using?

@fredrikbergholtz-sr
Copy link

With "Consumer consumer = await Consumer.CreateAsync(client)" to connect to the Nova73, I get the following exception:

"No enumeration, enumMap, value or type field is available for the parameter with the path /_1/_101/_2/_206/_3e8/_60120001." The Nova73 is currently not my priority, I just used it to have an additional unit to test with. Still, it would be nice if that works too. I might need this some day. Getting the Sapphire to work is critical, though. The Sapphire I am testing with is at OS version 5.0.956,

@andreashuber-lawo
Copy link
Contributor Author

Ok, I will ignore the Nova problem for the moment and see whether we can reproduce the Sapphire issue.

@andreashuber-lawo
Copy link
Contributor Author

I'm afraid this will take longer than anticipated, I'm currently waiting for DSA to send me the correct version of the firmware, so that I can try to reproduce the problem.

@andreashuber-lawo
Copy link
Contributor Author

I can't reproduce this with the crystal unit we have in house (it should perform much the same with regards to Ember+). To test, I've commented out all lines in Bug40Test except var manualRoot = await GetTreeAsync(ChildrenRetrievalPolicy.DirectOnly); which gives me consistent results (see 20170504_133031_UTC.xml.txt).

I'll try to get a hold of a Sapphire unit and run the same code there. I'll keep you posted.

@fredrikbergholtz-sr
Copy link

Strange, I get the very same error when running my code against a Crystal or Sapphire unit. Band 1 of the Equalizer makes it crash. When I try to run the modified Bug40Test against the Crystal or the Sapphire, I get a System.ObjectDisposedException:

Test method Lawo.EmberPlusSharp.Model.ConsumerTest.Bug40Test threw exception: 
System.ObjectDisposedException: The CancellationTokenSource has been disposed.
    at System.Threading.CancellationTokenSource.ThrowObjectDisposedException()
   at System.Threading.CancellationTokenSource.Cancel()
   at Lawo.EmberPlusSharp.Model.Consumer`1.CancelAutoSendDelay() in C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Model\Consumer`1.cs:line 283
   at Lawo.EmberPlusSharp.Model.Consumer`1.Dispose() in C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Model\Consumer`1.cs:line 205
   at Lawo.EmberPlusSharp.Model.Consumer`1.<SendReceiveLoop>d__35.MoveNext() in C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharp\Model\Consumer`1.cs:line 344
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.<>c.<ThrowAsync>b__6_0(Object state)
   at Lawo.Threading.Tasks.AsyncPump.SingleThreadSynchronizationContext.RunOnCurrentThread() in C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo\Threading\Tasks\AsyncPump.cs:line 99
   at Lawo.Threading.Tasks.AsyncPump.Run(Func`1 asyncMethod) in C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo\Threading\Tasks\AsyncPump.cs:line 52
   at Lawo.EmberPlusSharp.Model.ConsumerTest.Bug40Test() in C:\Storage\Development\Libraries\EmberPlus\ember-plus-sharp\Lawo.EmberPlusSharpTest\Model\ConsumerTest.cs:line 1533

Running the test against a provider that I made using the ember-plus library, one that doesn't have anything that looks like that Equalizer node and no enum parameters, makes the test pass.

My modified test looks like this:

        [TestMethod]
        [TestCategory("Manual")]
        public void Bug40Test()
        {
            AsyncPump.Run(
                async () =>
                {
                    // var automaticRoot = await GetTreeAsync(ChildrenRetrievalPolicy.All);
                    var manualRoot = await GetTreeAsync(ChildrenRetrievalPolicy.DirectOnly);

                    // Compare(automaticRoot, manualRoot);
                });
        }

And in the GetTreeAsync, I simply changed the host and port to point to the Crystal unit. I also changed the "Culture" at the top of this method to get the error messages in English.

Crystal version: 5.0.107
Sapphire version: 5.0.973

@fredrikbergholtz-sr
Copy link

Could it be that your Crystal has no equalizer configured on any of the sources? The 20170504_133031_UTC.xml.txt file you posted doesn't appear to contain any nodes named "Band 1", "Band 2", etc.

@andreashuber-lawo
Copy link
Contributor Author

When I try to run the modified Bug40Test against the Crystal or the Sapphire, I get a System.ObjectDisposedException:

Strange indeed. I don't really understand why you get this exception. Given the stack trace, line 283 should never be executed. You are using the latest release, right? Against what target are you compiling (desktop .net framework, Windows Store Apps, ..)?

@andreashuber-lawo
Copy link
Contributor Author

Could it be that your Crystal has no equalizer configured on any of the sources?

Most likely. Frankly, I don't have a clue about the crystal. I've just hooked up this unit we happened to have lying around and updated it to the (as of yet unreleased) version 5.0.165. How can I configure an equalizer?

@fredrikbergholtz-sr
Copy link

git tells me that my working directory is up to date with both my own fork and the upstream master. I will try to build from the zip-archive.

My current build targets are:
.NET Framework 4.5
ASP.NET Core 1.0
Windows 8

I am not sure about configuring the Crystal. For the Sapphire, you run the Zirkon configuration utility, load the project from the unit and look for the source in the tree definition view. I am not sure about what it does, but there is a check box for the sources named "Eq Op Disable", and this is NOT checked for any of the sources in the Sapphire I test with.

I tried loading the project from the Crystal into my Zirkon application, but there may be a version mis-match. It won't let me load it.

@fredrikbergholtz-sr
Copy link

Ok, I tested with the source zip-archive of the latest release and get the same error.

@andreashuber-lawo
Copy link
Contributor Author

My current build targets are:
.NET Framework 4.5
ASP.NET Core 1.0
Windows 8

When you run Bug40Test, do you do that with Lawo.EmberPlusSharp.sln loaded, through the Visual Studio Test Explorer (by right-clicking on the test)?

@fredrikbergholtz-sr
Copy link

Yes.

@andreashuber-lawo
Copy link
Contributor Author

I'm in the process of getting access to a Sapphire. Sorry, this might take some time :-/.

@andreashuber-lawo
Copy link
Contributor Author

It turns out that the EmberException that is thrown from your code (with the message "Unexpected End-of-contents ...") is due to the fact that the Sapphire reports an incorrect length in the Ember+ data. Of course, the exception message set by this library is wrong (fixed in commit associated with #45). I've forwarded my findings regarding this bug to DSA, please follow up with them for more information.

What is as of yet unclear, is why you get ObjectDisposedException when you run Bug40Test. Could you please run that test with the Debugger and see whether a EmberException is thrown first? Thanks!

@fredrikbergholtz-sr
Copy link

I apologise for the late response, but I do get weirdness when running Bug40Test in Visual Studio 2015 with ReSharper. Running the test with the standard Test Explorer, it passes. But debugging it from the same Test Explorer gives me a bunch of TaskCanceledException and the debugger never stops.

@andreashuber-lawo
Copy link
Contributor Author

andreashuber-lawo commented May 16, 2017

Thanks for testing! It's good to hear that it works with the standard test explorer, that's what I'm using. I don't understand why it wouldn't work with ReSharper, however.

I no longer have access to a Sapphire, but when I debug Bug40Test (with everything except the var manualRoot = line commented out) with another device, it works as it should (it takes ~30seconds to run). The bunch of TaskCanceledExceptions you see in the Events window is expected (these are all thrown and caught during the shutdown process).

I guess DSA did not yet provide you with a fixed firmware? Once they do, please let me know whether your code then works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants