-
-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
actual crash fix this time #800
Conversation
this is the actual crash fix i kept messing up, for real this time
When does |
This is definitely interesting! let me see if I can find some suspects. amt_produced should never get that large. |
I say it's always a good idea to check for overflows, but I'm pretty sure you've identified the reason you're seeing the issue, and most others aren't: #773 (comment). |
Tracking some research: The baffling thing is that amt_produce should never be that big. It is however many samples were processed since the last time things were processed. Could you uncomment line 183 in that file? I am curious what values you are seeing for the output queue... |
oh - you maybe right @tadscottsmith That would explain why so much gets built up... I am honestly surprised this fixes things... and also why that value, it seems like an int could hold a lot more. maybe it is something further down the stream. |
I suppose it is possible that the queue is getting built up that big due to filesystem writes being held; but it does seem odd. Usually if you start to out-strip your storage throughput, you just start dropping traffic or get spammed with O's. FWIW, I capture about a dozen of calls per second on my T-R instance to a NAS which, like @gisforgirard, I have it mounted as a local folder. I've never seen this crash occur - my TR instance has had months of uptime. However, my NAS is fully solid-state and has a 10Gb connection to the ESXi host where the T-R instance runs; so its possible that despite the large write volume (my T-R logs are witten to the same NAS); my latency is low enough that I don't build the buffer up big enough to trip this bug. |
If my math is right, and 160 samples = 20 ms, then 32768 samples is (32768 / 160 = 204.8 * 20 ms = 4096 ms) ~ 4.1 seconds behind. The max size of an Int_16 is 32767 and there's an awful lot of those in the area that would make me suspicious.
This line stands out after a quick glance.
|
That sounds about right. A couple years back there was an issue where buffers would back up and it led to call audio falling out of sequence. 60-90k in the queue time-shifted transmissions 8-10 seconds into the next call. A little curious about "// buffer limit is 32768, see gnuradio/gnuradio-runtime/lib/../include/gnuradio/buffer.h:186". Which version of gnuradio is this from? |
Oh, not flushing to the network is a problem. Didn't think that it would hold onto it in a buffer there, but in a file buffer somewhere else. It could save to the file system (even remote file systems) once the device becomes ready for it. Sounds like the real fix is making a file save function that holds onto a que of data that it's waiting to write to disk somewhere in the main loop so it doesn't backpressure into this buffer. |
I added a few more debug values to the message. I am also checking with a friend to see if he has any ideas... either way, this seems like a good fix, I am merging it in. |
yeah i have a 10GBASE-T link straight to my NAS but for whatever reason it has weird latency issues where packets won't drop but it will freeze the entire system at the kernel whenever it happens, i have tried different network cards, SFP+ adapters, ethernet cords, kernel settings, and it drives me crazy trying to track down what is going on but still haven't been able to figure it out, it's probably some kind of problem with synology's software but i'm already like $3000 into this stupid thing so maybe i'll go back to truenas for the next build instead... |
this is the actual crash fix i kept messing up and submitting pull requests for, for real this time
honestly this is probably not the best way to solve it, but there is a case where amt_produce ends up going over 32768 and it crashes the program faithfully every time, and this has appeared to solve it for me without any obvious negative side effects, there is probably a better way to go about doing this and its pretty much a hack, but it's a hack that works at least