Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

posix_fallocate can be interrupted by a signal #451

Closed
shastah opened this issue Nov 24, 2021 · 1 comment · Fixed by #453
Closed

posix_fallocate can be interrupted by a signal #451

shastah opened this issue Nov 24, 2021 · 1 comment · Fixed by #453

Comments

@shastah
Copy link
Contributor

shastah commented Nov 24, 2021

Every now and then, use of any Pacemaker cmdline tool fails and results in Pacemaker logging errors:

  (pcmk__new_client)        debug: New IPC client 3efdbecf-c2d9-44bc-b4a6-9bcd48021ba1 for PID 27492 with uid 0 and gid 0
  (handle_new_connection)   debug: IPC credentials authenticated (/dev/shm/qb-7271-27492-12-hfPbKY/qb)
  (qb_ipcs_shm_connect)     debug: connecting to client [27492]
  (qb_rb_open_2)    debug: shm size:524301; real_size:528384; rb->word_size:132096
  (qb_rb_open_2)    debug: shm size:524301; real_size:528384; rb->word_size:132096
  (qb_sys_mmap_file_open)   error: couldn't allocate file /dev/shm/qb-7271-27492-12-hfPbKY/qb-event-cib_rw-data: Interrupted system call (4)
  (qb_rb_open_2)    error: couldn't create file for mmap
  (qb_ipcs_shm_rb_open)     error: qb_rb_open:/dev/shm/qb-7271-27492-12-hfPbKY/qb-event-cib_rw: Interrupted system call (4)
  (qb_rb_close_helper)      debug: Free'ing ringbuffer: /dev/shm/qb-7271-27492-12-hfPbKY/qb-response-cib_rw-header
  (qb_rb_close_helper)      debug: Free'ing ringbuffer: /dev/shm/qb-7271-27492-12-hfPbKY/qb-request-cib_rw-header
  (qb_ipcs_shm_connect)     error: shm connection FAILED: Interrupted system call (4)
  (handle_new_connection)   error: Error in connection setup (/dev/shm/qb-7271-27492-12-hfPbKY/qb): Interrupted system call (4)

The way we worked around it is with a silly "retry" logic in qb_sys_mmap_file_open(). Not sure if this is the best solution, but it proved to be "good enough":

--- a/lib/unix.c	2020-07-29 08:37:35.000000000 +0200
+++ b/lib/unix.c	2021-01-28 21:01:43.984432610 +0100
@@ -79,6 +79,9 @@
 	ssize_t written;
 	char *buffer = NULL;
 	char *is_absolute = strchr(file, '/');
+#ifdef HAVE_POSIX_FALLOCATE
+	int32_t fallocate_retry = 5;
+#endif
 
 	if (is_absolute) {
 		(void)strlcpy(path, file, PATH_MAX);
@@ -116,12 +119,22 @@
 	}
 
 #ifdef HAVE_POSIX_FALLOCATE
-	if ((res = posix_fallocate(fd, 0, bytes)) != 0) {
-		errno = res;
-		res = -1 * res;
-		qb_util_perror(LOG_ERR, "couldn't allocate file %s", path);
-		goto unlink_exit;
-	}
+	/* posix_fallocate(3) can be interrupted by a signal,
+	   so retry few times before giving up */
+	do {
+		fallocate_retry--;
+		res = posix_fallocate(fd, 0, bytes);
+		if (res == EINTR) {
+			qb_util_log(LOG_DEBUG, "got EINTR trying to allocate file %s, retrying...", path);
+			continue;
+		} else if (res != 0) {
+			errno = res;
+			res = -1 * res;
+			qb_util_perror(LOG_ERR, "couldn't allocate file %s", path);
+			goto unlink_exit;
+		}
+		break;
+	} while (fallocate_retry > 0);
 #else
 	if (file_flags & O_CREAT) {
 		long page_size = sysconf(_SC_PAGESIZE);
@chrissie-c
Copy link
Contributor

I had a careful think about this and at first I was against the idea, but as this function is called several times for each connection and there is a similar loop around write below it, I think it's quite reasonable.

Is it possible to send me a PR with the patch please?

shastah added a commit to shastah/libqb that referenced this issue Jan 6, 2022
Every now and then Pacemaker reports errors:

  (pcmk__new_client)        debug: New IPC client 3efdbecf-c2d9-44bc-b4a6-9bcd48021ba1 for PID 27492 with uid 0 and gid 0
  (handle_new_connection)   debug: IPC credentials authenticated (/dev/shm/qb-7271-27492-12-hfPbKY/qb)
  (qb_ipcs_shm_connect)     debug: connecting to client [27492]
  (qb_rb_open_2)    debug: shm size:524301; real_size:528384; rb->word_size:132096
  (qb_rb_open_2)    debug: shm size:524301; real_size:528384; rb->word_size:132096
  (qb_sys_mmap_file_open)   error: couldn't allocate file /dev/shm/qb-7271-27492-12-hfPbKY/qb-event-cib_rw-data: Interrupted system call (4)
  (qb_rb_open_2)    error: couldn't create file for mmap
  (qb_ipcs_shm_rb_open)     error: qb_rb_open:/dev/shm/qb-7271-27492-12-hfPbKY/qb-event-cib_rw: Interrupted system call (4)
  (qb_rb_close_helper)      debug: Free'ing ringbuffer: /dev/shm/qb-7271-27492-12-hfPbKY/qb-response-cib_rw-header
  (qb_rb_close_helper)      debug: Free'ing ringbuffer: /dev/shm/qb-7271-27492-12-hfPbKY/qb-request-cib_rw-header
  (qb_ipcs_shm_connect)     error: shm connection FAILED: Interrupted system call (4)
  (handle_new_connection)   error: Error in connection setup (/dev/shm/qb-7271-27492-12-hfPbKY/qb): Interrupted system call (4)

While it probably might be addressed in Pacemaker code, a simple retry
loop in case posix_fallocate(3) returns EINTR seems to be a decent
workaround.

Fixes: ClusterLabs#451

Signed-off-by: Jakub Jankowski <[email protected]>
chrissie-c pushed a commit that referenced this issue Jan 14, 2022
Every now and then Pacemaker reports errors:

  (pcmk__new_client)        debug: New IPC client 3efdbecf-c2d9-44bc-b4a6-9bcd48021ba1 for PID 27492 with uid 0 and gid 0
  (handle_new_connection)   debug: IPC credentials authenticated (/dev/shm/qb-7271-27492-12-hfPbKY/qb)
  (qb_ipcs_shm_connect)     debug: connecting to client [27492]
  (qb_rb_open_2)    debug: shm size:524301; real_size:528384; rb->word_size:132096
  (qb_rb_open_2)    debug: shm size:524301; real_size:528384; rb->word_size:132096
  (qb_sys_mmap_file_open)   error: couldn't allocate file /dev/shm/qb-7271-27492-12-hfPbKY/qb-event-cib_rw-data: Interrupted system call (4)
  (qb_rb_open_2)    error: couldn't create file for mmap
  (qb_ipcs_shm_rb_open)     error: qb_rb_open:/dev/shm/qb-7271-27492-12-hfPbKY/qb-event-cib_rw: Interrupted system call (4)
  (qb_rb_close_helper)      debug: Free'ing ringbuffer: /dev/shm/qb-7271-27492-12-hfPbKY/qb-response-cib_rw-header
  (qb_rb_close_helper)      debug: Free'ing ringbuffer: /dev/shm/qb-7271-27492-12-hfPbKY/qb-request-cib_rw-header
  (qb_ipcs_shm_connect)     error: shm connection FAILED: Interrupted system call (4)
  (handle_new_connection)   error: Error in connection setup (/dev/shm/qb-7271-27492-12-hfPbKY/qb): Interrupted system call (4)

While it probably might be addressed in Pacemaker code, a simple retry
loop in case posix_fallocate(3) returns EINTR seems to be a decent
workaround.

Fixes: #451

Signed-off-by: Jakub Jankowski <[email protected]>
Reviewed-by: Christine Caulfield <[email protected]>
bmwiedemann pushed a commit to bmwiedemann/openSUSE that referenced this issue Jan 15, 2022
https://build.opensuse.org/request/show/946348
by user yan_gao + dimstar_suse
- Retry if posix_fallocate is interrupted with EINTR (#453) (gh#ClusterLabs/libqb#451, bsc#1193737, bsc#1193912)
  * bsc#1193737-0001-Retry-if-posix_fallocate-is-interrupted-with-EINTR-4.patch

- IPC: server: avoid temporary channel priority loss, up to deadlock-worth (gh#ClusterLabs/libqb#352, rh#1718773, bsc#1188212) (forwarded request 946347 from yan_gao)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants