Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

local: harden local_close() and use it for local_open() failure #575

Merged
merged 3 commits into from
Aug 3, 2020

Conversation

commodo
Copy link
Contributor

@commodo commodo commented Jul 24, 2020

There's a bug, when enabling a buffer (with local_buffer_enabled_set(),
i.e. writing to 'buffer/enable'), we can get -EINVAL, because the scan_mask
is invalidated by the kernel.
The issue seems to be that the local_open() exit-on-failure path is
un-balanced. Some things don't seem to be cleaned-up on exit-on-failure.

This is seen with iiod trying to open a high-speed buffer device, where the
exit-on-failure path does not call munmap() on the buffers, nor does it
free the memory allocated for the buffer (in libiio). On the next
retry/open the buffer device is locked and -EBUSY is returned.
Essentially, this leaks resources (a file-descriptor and some memory for
the buffers).
Closing and re-opening iiod fixes the issue.

Also, the local_close() function should not exit on the first fail in
finds. It should continue with disabling everything it can. The only error
that might make sense, is if the main FD is not initialized.

This change simplifies the local_close() a bit: we return -EBADF if the
main FD is -1. Other than that, we continue disabling/cleaning everything
else (with the proper NULL/fd checks in place).

This makes it re-usable for the local_open() exit-on-failure path.
This could be addressed [also] by balancing the initialization in the
reverse order of init. But it looks like something might be omitted later
(as they were up until now) in the exit-on-failure path.
So maybe a more future-proof method is to just re-use the local_open()
routine where possible.

Signed-off-by: Alexandru Ardelean [email protected]

@rgetz
Copy link
Contributor

rgetz commented Jul 24, 2020

I don't know if @pcercuei or @larsclausen have time to have a peak at this one?

@larsclausen
Copy link
Contributor

There's a bug...

What are the effects of the bug? Does it crash?

@commodo
Copy link
Contributor Author

commodo commented Jul 27, 2020

There's a bug...

What are the effects of the bug? Does it crash?

Hmm, maybe I should describe it a bit better in the commit desc.
Also, I'll admit, I may need to dig this into the kernel a bit.
I was a bit happy to just use this hardened local_open() vs going deeper.

So, issue is: local_open() creates FD, local_buffer_enabled_set() fails [with -EINVAL], then on the next local_open() call, it returns -EBUSY (busy flag in the IIO device is still left as set), the IIO device is stuck as busy;
The issue is with iiod. Since it's a process.
Restarting iiod fixes the issue.

Some resources seem to stay locked/leaked after local_buffer_enabled_set() fails and some cleanup isn't happening.

I'll dig quick into the kernel and come back.

@commodo
Copy link
Contributor Author

commodo commented Jul 27, 2020

There's a bug...

What are the effects of the bug? Does it crash?

Hmm, maybe I should describe it a bit better in the commit desc.
Also, I'll admit, I may need to dig this into the kernel a bit.
I was a bit happy to just use this hardened local_open() vs going deeper.

So, issue is: local_open() creates FD, local_buffer_enabled_set() fails [with -EINVAL], then on the next local_open() call, it returns -EBUSY (busy flag in the IIO device is still left as set), the IIO device is stuck as busy;
The issue is with iiod. Since it's a process.
Restarting iiod fixes the issue.

Some resources seem to stay locked/leaked after local_buffer_enabled_set() fails and some cleanup isn't happening.

I'll dig quick into the kernel and come back.

So, after some quick debug-prints into the kernel, what seems to be happening is that this code does not run:
https://github.com/analogdevicesinc/linux/blob/master/drivers/iio/industrialio-core.c#L1619
Even though, iiod is calling close() on that FD [via exit-on-failure path in local_open()]; this close() that iiod is doing doesn't return an error though:
https://github.com/analogdevicesinc/libiio/blob/master/local.c#L1001

When I add this patchset, the chrdev_release seems to be.

I assumed that it may be a race-condition, and I added some sleep(1000)/wait [after the close()], it doesn't seem to be helping.
Maybe I forgot some pthread basics.

The local_close() though, seems to be doing some munmap() calls, which look like a good idea to be added in the exit-on-failure path.
I don't know if those are the ones helping, but there's also the matter of free-ing some malloc-ed data for high-speed buffers.

@larsclausen
Copy link
Contributor

The local_close() though, seems to be doing some munmap() calls, which look like a good idea to be added in the exit-on-failure path.
I don't know if those are the ones helping, but there's also the matter of free-ing some malloc-ed data for high-speed buffers.

A mapping on a file will keep a reference to that file. So this is probably the issue then.

@rgetz
Copy link
Contributor

rgetz commented Jul 27, 2020

so, in local, the only mmap is in enable_high_speed(): link

which should be released as part of local_close() link in the normal case.

when something calls local_open(), which eventually calls enable_high_speed(), there are some exit/error paths taken afterwards here and here (those two should not matter, since those are only executed when !pdata->is_high_speed, but the latter here that do not munmap (if local_enable_buffer() fails).

@commodo
Copy link
Contributor Author

commodo commented Jul 28, 2020

Changelog v1 -> v2:

  • added more info about iiod issue in commit comment

@rgetz
Copy link
Contributor

rgetz commented Jul 28, 2020

there are a few close() calls which now ignore the return code. From the man page:
A careful programmer will check the return value of close(). a failure return should be used only for diagnostic purposes

@commodo
Copy link
Contributor Author

commodo commented Jul 29, 2020

there are a few close() calls which now ignore the return code. From the man page:
A careful programmer will check the return value of close(). a failure return should be used only for diagnostic purposes

yeah; i also thought about that for a bit when doing the initial change;
exiting there/early may prevent more cleanup from happening; so not exiting there, is a good idea;

i was wondering if printing the error code from close() makes sense; or is useful, or is just noise;
tbh, i don't know; i don't have any strong opinions about it;
any suggestion is welcome;
an error code doesn't seem to be coming through, even when close() doesn't actually free the FD [due to munmap not being called]

@rgetz
Copy link
Contributor

rgetz commented Jul 29, 2020

I think passing it up, and printing something might be worth it - then we can decide (based on user feedback) if it is a real problem or not. Won't be noisy, since I'm planning on starting the error message revamp in the next couple of weeks.

@commodo
Copy link
Contributor Author

commodo commented Jul 30, 2020

I think passing it up, and printing something might be worth it - then we can decide (based on user feedback) if it is a real problem or not. Won't be noisy, since I'm planning on starting the error message revamp in the next couple of weeks.

alright;
will return the error code of the first error; we can't print messages for other error codes;
there 2 close() calls, 1 ioctl() x number-of-buffers, 1 local_buffer_enabled_set() call, 1 channel_write_state() x number-of-channels

This also does a minor rework in re-defining the 'local_enable_buffer()'
function into a 'local_buffer_enabled_set(..., bool en)' function which can
enable/disable a local buffer by writing into the 'buffer/enable'.

Keeping an internal state complicated things, as this state should be
handled by the kernel.
Even so, the original local_enable_buffer() would only get called once,
while the 'local_open()' function would also bypass this state potentially
causing inconsistency.

Signed-off-by: Alexandru Ardelean <[email protected]>
There's a bug, when enabling a buffer (with local_buffer_enabled_set(),
i.e. writing to 'buffer/enable'), we can get -EINVAL, because the scan_mask
is invalidated by the kernel.
The issue seems to be that the local_open() exit-on-failure path is
un-balanced. Some things don't seem to be cleaned-up on exit-on-failure.

This is seen with iiod trying to open a high-speed buffer device, where the
exit-on-failure path does not call munmap() on the buffers, nor does it
free the memory allocated for the buffer (in libiio). On the next
retry/open the buffer device is locked and -EBUSY is returned.
Essentially, this leaks resources (a file-descriptor and some memory for
the buffers).
Closing and re-opening iiod fixes the issue.

Also, the local_close() function should not exit on the first fail in
finds. It should continue with disabling everything it can. The only error
that might make sense, is if the main FD is not initialized.

This change simplifies the local_close() a bit: we return -EBADF if the
main FD is -1. Other than that, we continue disabling/cleaning everything
else (with the proper NULL/fd checks in place).

This makes it re-usable for the local_open() exit-on-failure path.
This could be addressed [also] by balancing the initialization in the
reverse order of init. But it looks like something might be omitted later
(as they were up until now) in the exit-on-failure path.
So maybe a more future-proof method is to just re-use the local_open()
routine where possible.

Signed-off-by: Alexandru Ardelean <[email protected]>
With this change:
* all error returns in local_close() get printed
* the first error code is returned back to the caller

Signed-off-by: Alexandru Ardelean <[email protected]>
@commodo
Copy link
Contributor Author

commodo commented Jul 30, 2020

I added a separate commit for the error handling in local_close().

I am not sure, the code has much added value, vs it's quantity.
I think that a void close() would have simplified things [vs int local_close()]; the int return kind of makes you want to return something [other than 0], but I am not sure it's really useful for close().
A lot of the times, close() is ignored in many places i've seen it used [other than libiio].

Also, since many places could return an error code in local_close(), the whole logic of keeping only the first error code [when other places could fail as well], is ¯_(ツ)_/¯
Also, returning the last error, is also not great.
I'm mostly feeling that we have other bigger bugs, issues and design flaws that we should address than this.

@rgetz
Copy link
Contributor

rgetz commented Jul 31, 2020

This looks good to me - thanks

as long as @larsclausen doesn't have any comments...

-Robin

@larsclausen
Copy link
Contributor

👍

@dNechita dNechita merged commit ef12218 into master Aug 3, 2020
@dNechita dNechita deleted the fix-local-close branch August 3, 2020 08:44
@rgetz
Copy link
Contributor

rgetz commented Aug 3, 2020

this caused a warning in coverity...

1049 CID 361216 (#1 of 1): Logically dead code (DEADCODE)
dead_error_line: Execution cannot reach this statement

I think because, you are missing a ret1 = on:
https://github.com/analogdevicesinc/libiio/blob/master/local.c#L1041

@commodo
Copy link
Contributor Author

commodo commented Aug 3, 2020

this caused a warning in coverity...

1049 CID 361216 (#1 of 1): Logically dead code (DEADCODE)
dead_error_line: Execution cannot reach this statement

I think because, you are missing a ret1 = on:
https://github.com/analogdevicesinc/libiio/blob/master/local.c#L1041

yeah; it's a good catch on the side of coverity;
but it's also one reason why i did not really want to do patch 4b0a7a2
too much added complexity with not much added value; and difficult to review apparently

@rgetz
Copy link
Contributor

rgetz commented Aug 4, 2020

but it's also one reason why i did not really want to do patch 4b0a7a2
too much added complexity with not much added value; and difficult to review apparently

The reason I asked for it - is there were error conditions on close() failing, that your previous patches removed. That is all. It's basically keeping those failing paths as they were - not adding more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants