not ok 71 - stat watcher invoked once for size chnage #997

Closed
chu11 opened this issue Mar 1, 2017 · 7 comments

@chu11
Member

chu11 commented Mar 1, 2017

not ok 71 - stat watcher invoked once for size chnage
#   Failed test 'stat watcher invoked once for size chnage'
#   at ../../../../src/common/libflux/test/reactor.c line 644.

Just hit this in #996 in Travis. It appears to have happened at least three other times (#884, #822, #799), apparently always with the clang compiler. We should try to figure out what is causing this to fail intermittently.

As an aside, the test name above contains a typo ("chnage" should be "change") that needs fixing too.

@garlick
Member

garlick commented Mar 13, 2017

I hit that one last week while testing PR #1000.

@chu11 chu11 modified the milestone: release 0.8.0 Apr 3, 2017
@grondo
Contributor

grondo commented Jul 25, 2017

Hit again in the test run for #1124 -- I'll take a brief look at it and see if we can get it to fail outside of Travis.

@chu11
Member Author

chu11 commented Jul 25, 2017

Perhaps it's worth noting that the failure in #1124 was with gcc, not clang.

@trws trws mentioned this issue Sep 11, 2018
chu11 added a commit to chu11/flux-core that referenced this issue Jun 3, 2019
There is a race in the stat watcher test, where the stat watcher does
not initially see the temporary file because it has not yet been
flushed.  Call fsync() to flush.

Fixes flux-framework#997
@chu11 chu11 self-assigned this Jun 4, 2019
@chu11
Member Author

chu11 commented Jun 4, 2019

Decided to look into this old issue and I think I figured it out. The libev stat watcher is a simple polling loop that runs lstat() on a file. The test in question appends data to a file, then unlinks it. With a little bit of racy behavior, an append + unlink can occur in between lstat() checks. Thus, the unlink is recognized but the change in the size of the file is missed.
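
The race described above can be sketched in C. This is only an illustration under stated assumptions, not the libev or flux-core code: it models a watcher that learns about changes solely by comparing consecutive lstat() results. The names `observed`, `compare_samples`, and `run_scenario`, and the temporary path under /tmp, are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* What a watcher that only compares consecutive lstat() results reports. */
struct observed {
    int size_changes;   /* samples where st_size differed, file present */
    int disappearances; /* samples where the file went present -> absent */
};

/* Compare two consecutive samples (hypothetical helper).
 * prev_ok / cur_ok say whether lstat() succeeded for each sample. */
static void compare_samples (int prev_ok, const struct stat *prev,
                             int cur_ok, const struct stat *cur,
                             struct observed *obs)
{
    if (prev_ok && !cur_ok)
        obs->disappearances++;
    else if (prev_ok && cur_ok && prev->st_size != cur->st_size)
        obs->size_changes++;
}

/* Create a temp file, sample it, append, unlink, and sample again.
 * If sample_between is nonzero, an extra sample is taken between the
 * append and the unlink, which is the ordering the test expects.
 * Returns 0 on success, -1 on error. */
int run_scenario (int sample_between, struct observed *obs)
{
    char path[] = "/tmp/statrace.XXXXXX"; /* assumed scratch location */
    struct stat prev, cur;
    int fd, prev_ok, cur_ok;

    assert (obs != NULL);
    memset (obs, 0, sizeof (*obs));
    memset (&prev, 0, sizeof (prev));
    memset (&cur, 0, sizeof (cur));

    if ((fd = mkstemp (path)) < 0)
        return -1;
    prev_ok = (lstat (path, &prev) == 0); /* first sample: size 0 */

    if (write (fd, "hello", 5) != 5) {    /* the append */
        close (fd);
        unlink (path);
        return -1;
    }
    close (fd);
    if (sample_between) {
        cur_ok = (lstat (path, &cur) == 0);
        compare_samples (prev_ok, &prev, cur_ok, &cur, obs);
        prev = cur;
        prev_ok = cur_ok;
    }
    if (unlink (path) < 0)
        return -1;
    cur_ok = (lstat (path, &cur) == 0);   /* last sample: file is gone */
    compare_samples (prev_ok, &prev, cur_ok, &cur, obs);
    return 0;
}
```

Calling `run_scenario (1, &obs)` records one size change and one disappearance. Calling `run_scenario (0, &obs)` records only the disappearance, because the append and the unlink both fall between the same two samples.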

@grondo
Contributor

grondo commented Jun 4, 2019

> The libev stat watcher is a simple polling loop that runs lstat() on a file.

Good find! However, I thought libev used inotify by default if it was present, so I'm surprised it has reverted to polling. I wonder why.

@grondo grondo closed this as completed in 71a8537 Jun 4, 2019
@chu11
Member Author

chu11 commented Jun 4, 2019

> I thought libev used inotify by default if it was present, so I'm surprised it has reverted to polling

Ahhh, you are right. I had missed that subtlety when looking through libev. So the polling part of my comment was incorrect.

The reason I thought it was polling is b/c (AFAICT) libev calls lstat() every time it has been notified. It doesn't check the flags from inotify_event results. So I suspect the race exists either way, as the issue is the append & unlink can happen in between consecutive lstat() calls.
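
To illustrate the difference between the inotify event flags and a later lstat() call, here is a minimal Linux-only sketch in C. It does not reproduce libev's internals. It watches a scratch directory directly with inotify, appends to and unlinks a file, and then compares the queued event masks with what a single lstat() sees afterwards. The function name `append_unlink_observed` and the /tmp path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

/* Append to and unlink a file inside a watched directory, then report
 * (a) the union of the inotify event masks that were queued, and
 * (b) whether one lstat() taken afterwards still finds the file.
 * Returns 0 on success, -1 on error. */
int append_unlink_observed (uint32_t *event_mask, int *file_present)
{
    char dir[] = "/tmp/inotifyrace.XXXXXX"; /* assumed scratch location */
    char path[64];
    char buf[4096]
        __attribute__ ((aligned (__alignof__ (struct inotify_event))));
    struct stat sb;
    int ifd = -1, fd = -1, rc = -1;
    ssize_t n;

    assert (event_mask != NULL && file_present != NULL);
    *event_mask = 0;
    *file_present = 0;
    if (!mkdtemp (dir))
        return -1;
    snprintf (path, sizeof (path), "%s/file", dir);

    if ((fd = open (path, O_CREAT | O_WRONLY, 0600)) < 0)
        goto done;
    if ((ifd = inotify_init1 (IN_NONBLOCK)) < 0)
        goto done;
    if (inotify_add_watch (ifd, dir, IN_MODIFY | IN_DELETE) < 0)
        goto done;

    /* Append and unlink back to back, with no stat call in between. */
    if (write (fd, "hello", 5) != 5)
        goto done;
    close (fd);
    fd = -1;
    if (unlink (path) < 0)
        goto done;

    /* Drain the queued events and accumulate their masks. */
    while ((n = read (ifd, buf, sizeof (buf))) > 0) {
        for (char *p = buf; p < buf + n; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            *event_mask |= ev->mask;
            p += sizeof (*ev) + ev->len;
        }
    }
    /* A single lstat() at this point only sees the final state. */
    *file_present = (lstat (path, &sb) == 0);
    rc = 0;
done:
    if (fd >= 0)
        close (fd);
    if (ifd >= 0)
        close (ifd);
    unlink (path); /* harmless if the file is already gone */
    rmdir (dir);
    return rc;
}
```

On success the accumulated mask should contain both IN_MODIFY and IN_DELETE, while the lstat() result only says that the file is gone. A watcher that treats the notification purely as a cue to call lstat() would therefore report the unlink and miss the size change.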

@grondo
Contributor

grondo commented Jun 4, 2019

Yep, I think I just came to the same conclusion. There is still a race because libev only uses inotify to avoid polling, not for actual processing of events. Nice find!
