cannot create context, error code 111 #1145

Open
catkira opened this issue Feb 14, 2024 · 17 comments
@catkira
Contributor

catkira commented Feb 14, 2024

Sometimes when I do unusual things, e.g. my remote libiio app crashes without closing the network context, I cannot create another context to my PlutoSDR, which is connected via network.

The error message I get when I do iio_info -u ip:192.168.137.2 is
ERROR: Unable to create IIO context ip:192.168.137.2: Connection refused (111)

I have to reboot plutosdr to recover from this.
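
For readers hitting the same message: on most Linux architectures error 111 is ECONNREFUSED, which means the host answered but nothing was listening on the port. A minimal sketch of a plain TCP probe follows, assuming iiod listens on its default port 30431 and that the address is a dotted-quad IPv4 string. The helper name probe_tcp is hypothetical and not part of libiio.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Default TCP port of iiod. */
#define IIOD_PORT 30431

/*
 * Hypothetical helper (not a libiio function): try a plain TCP connect.
 * Returns 0 if something accepted the connection, otherwise the errno
 * value. ECONNREFUSED means the host is reachable but no process is
 * listening on that port.
 */
static int probe_tcp(const char *ip, unsigned short port)
{
	struct sockaddr_in addr;
	int fd, err = 0;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);

	/* Only dotted-quad IPv4 strings are handled in this sketch. */
	if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
		return EINVAL;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return errno;

	/* Blocking connect: may take a while if the host is unreachable. */
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		err = errno;

	close(fd);
	return err;
}
```

Calling probe_tcp("192.168.137.2", IIOD_PORT) and getting ECONNREFUSED back would be consistent with iiod no longer running on the device, but it does not say why it stopped.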

@mhennerich
Contributor

Wondering - can you check if iiod is still running when you get 111?
Maybe this is more a plutosdr-fw issue...

@catkira
Contributor Author

catkira commented Feb 20, 2024

Hmm yes, good idea, maybe it's just an iiod crash. I will check when it happens next time.

@catkira
Contributor Author

catkira commented Feb 28, 2024

@mhennerich I checked it, iiod was not running anymore. Does iiod store logs somewhere, or can I start it with command line parameters to enable logging?

@catkira
Contributor Author

catkira commented Feb 28, 2024

When I restart iiod manually with -d and provoke the crash again, I see this output:

New client connected from 192.168.137.1
New client connected from 192.168.137.1
New client connected from 192.168.137.1
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
Client exited
DEBUG: Buffer 0 created.

Client exited
Client exited
Client exited
New client connected from 192.168.137.1

New client connected from 192.168.137.1
New client connected from 192.168.137.1
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
Client exited
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
Client exited
DEBUG: Buffer 0 created.
New client connected from 192.168.137.1
DEBUG: Buffer 0 created.
Client exited
DEBUG: Block 0 freed.
Segmentation fault
#

@rgetz
Contributor

rgetz commented Mar 12, 2024

Can you run it (and make it crash) in gdb, so you can get a backtrace?

@catkira
Contributor Author

catkira commented Mar 12, 2024

I think gdb is not included in the pluto buildroot, so I would have to cross-compile it first, right?

@catkira
Contributor Author

catkira commented Mar 18, 2024

There was already a gdb package available in buildroot. I added it to my board config and ran iiod again, but I could not provoke the crash anymore. The weird thing is that the crash also no longer happens when I run iiod without gdb. It might have been a faulty buildroot build state; buildroot is quite sensitive if one does not do clean rebuilds all the time.
I am closing this issue for now and will reopen it if the crash pops up again.

@catkira catkira closed this as completed Mar 18, 2024
@catkira
Contributor Author

catkira commented May 24, 2024

The crash happened again. I managed to reproduce it while running iiod in gdb. This is the output:
image
(you can ignore the "-- axi_dmac_terminate --", that's a debug printf from the kernel)

@rgetz The stack backtrace does not show much, because the stack is corrupted
image
@pcercuei do you have any idea?

I am using 7ae4836 (and I don't see any relevant commits after this that look like they could fix this crash)

@catkira catkira reopened this May 24, 2024
@catkira
Contributor Author

catkira commented May 24, 2024

I think the crash is caused by a call to iio_buffer_cancel()

@catkira
Contributor Author

catkira commented May 24, 2024

I used to call the following functions in this order:

iio_buffer_cancel(buffer);
iio_channels_mask_destroy(mask);
iio_buffer_destroy(buffer);

The crash does not happen anymore when I change the order to this:

iio_buffer_cancel(buffer);
iio_buffer_destroy(buffer);
iio_channels_mask_destroy(mask);

@pcercuei is this behaviour correct?

@catkira
Contributor Author

catkira commented May 24, 2024

The crash also happens if I just call iio_buffer_cancel(buffer); twice in a row. I know it does not make sense to do that, but it still should not crash.

@catkira
Contributor Author

catkira commented May 24, 2024

No, it does not help after all; iiod still sometimes crashes when I call iio_buffer_cancel()
image

@catkira
Contributor Author

catkira commented May 27, 2024

@pcercuei I mean this issue <3

@catkira
Contributor Author

catkira commented May 27, 2024

I rebuilt libiio with LOG_LEVEL=Debug. This is the output that I get before a crash:
image
"-- axi_dmac_terminate_all --" is a debug printf from the kernel driver that appears every time I call iio_buffer_cancel(). This message is expected.
From the debug output it looks like I make two calls to iio_buffer_cancel() within a short period. When the 2nd call happens, iiod is still freeing resources, and that seems to be when it crashes.
An easy way to reproduce this (or a similar issue) is to make two calls to iio_buffer_cancel() right after each other. iiod will crash on the 2nd call.
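
Until the daemon side is understood, one client-side way to avoid sending the second cancel is a run-once guard. The following is only a sketch under assumptions: struct cancel_once, cancel_once_init() and cancel_once_run() are hypothetical names, not libiio API, and count_call() is a stand-in for a thin wrapper around iio_buffer_cancel(). It does not fix the crash in iiod itself.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical wrapper, not part of libiio. */
struct cancel_once {
	atomic_flag done;        /* set by the first caller */
	void (*cancel)(void *);  /* e.g. a wrapper calling iio_buffer_cancel() */
	void *arg;               /* e.g. the buffer pointer */
};

static void cancel_once_init(struct cancel_once *c,
			     void (*cancel)(void *), void *arg)
{
	atomic_flag_clear(&c->done);
	c->cancel = cancel;
	c->arg = arg;
}

/* Returns 1 if this call performed the cancel, 0 if it was already done. */
static int cancel_once_run(struct cancel_once *c)
{
	/* test_and_set returns the previous value: true means "already ran". */
	if (atomic_flag_test_and_set(&c->done))
		return 0;

	c->cancel(c->arg);
	return 1;
}

/* Demo stand-in for the real cancel call: counts how often it ran. */
static void count_call(void *arg)
{
	++*(int *)arg;
}
```

Because the flag is atomic, the guard also holds when two threads of the application race to cancel the same buffer; only one of them reaches the callback.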

@catkira
Contributor Author

catkira commented Jun 3, 2024

I think the problem is that there are some iio_block_dequeue commands queued that get executed even after blocks are already freed.
The iio_block_dequeue() calls are from the buffer-dequeue-thd, but the iio_block_destroy() calls come from the iiod-responder-reader-thd thread.
Is it possible that the iio_block_destroy() should be blocked until the buffer-dequeue-thd is finished?
@mhennerich @pcercuei do you have a little hint for me? :)
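
To make the suspected lifetime problem concrete, here is a generic sketch of a different technique than blocking the destroy call: reference counting, where a queued operation keeps the block alive until it has finished. All names (struct demo_block, demo_block_get(), demo_block_put()) are hypothetical. This is not how libiio or iiod is implemented, and whether it matches the actual bug is an assumption.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a block; not libiio's struct iio_block. */
struct demo_block {
	atomic_int refs;   /* 1 for the owner + 1 per in-flight operation */
	int *freed_flag;   /* demo only: set to 1 when memory is released */
	char data[64];
};

static struct demo_block *demo_block_create(int *freed_flag)
{
	struct demo_block *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;

	atomic_init(&b->refs, 1); /* the owner's reference */
	b->freed_flag = freed_flag;
	return b;
}

/* Take a reference before queuing an operation (e.g. a dequeue). */
static void demo_block_get(struct demo_block *b)
{
	atomic_fetch_add(&b->refs, 1);
}

/*
 * Drop a reference: called by the owner on destroy and by the worker
 * when its operation completes. Whoever drops the last one frees.
 */
static void demo_block_put(struct demo_block *b)
{
	/* fetch_sub returns the old value; 1 means we held the last ref. */
	if (atomic_fetch_sub(&b->refs, 1) == 1) {
		if (b->freed_flag)
			*b->freed_flag = 1;
		free(b);
	}
}
```

With this scheme a destroy that arrives while a dequeue is still queued only drops the owner's reference; the memory is released when the dequeue path calls demo_block_put() afterwards.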

@catkira
Contributor Author

catkira commented Jun 3, 2024

I think I have some new information. It seems that a call to iio_block_destroy() after iio_buffer_cancel() sometimes causes iiod to crash, because iio_buffer_cancel() apparently already causes the blocks to be destroyed automatically. It's very weird. The libiio code is a bit unintuitive for me, so this is not something I can fix quickly. I think this issue should be fixed in the long run. For now I will live with my workaround.

@dNechita
Contributor

dNechita commented Jun 11, 2024

Hi @catkira,
We are looking into this. Thank you for providing these details, which help us debug this issue.
