
RotatingFileHandler fails when logs are stored on a non-POSIX file system Docker-mounted on a Linux OS #120647

Closed
lollo0296 opened this issue Jun 17, 2024 · 4 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

lollo0296 commented Jun 17, 2024

Bug report

Bug description:

Issue description

The RotatingFileHandler fails to execute the method self.shouldRollover() because of an OSError: [Errno 9] Bad file descriptor.

Background info: The issue appears in an application running in Docker on Debian. I use Windows Server 2016 as my main file server, so I usually want all my logs stored over there.

I really do not want to provide misleading information, but to me it looks like the issue started appearing with Python 3.12.
I've been using the Python logging module for years now in a lot of different projects, all running in Docker and always storing the log files on my Windows file server through this “double mounting” technique (more on this below).

I started seeing the issue when I first deployed an app with Python 3.12. This was a new project, which never ran under previous Python versions. However, I have a couple of other projects which share the same code and the same logging technique but still run on earlier Python versions (nothing ancient, >=3.9). The issue never came up there.

My guess is that Python treats the directory as POSIX-compliant because it detects a Linux OS, although this is not the case, as the directory lives on an NTFS file system on Windows. Could this be the case?

Strange fact 😵: My app runs 24/7. It runs the whole week with no problems and writes a lot of log entries like a charm. But when I come back after the weekend, I find it “stuck” on this error. As if the shouldRollover() method somehow gets called at a specific interval or on a specific day of the week? I observed this behavior 3 weeks in a row. Could it be?

To reproduce:

Host OS: Debian 12 Bookworm
Host Kernel: Linux debjm02 6.1.0-21-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
Docker version: 26.1.1, build 4cf5afa
Docker image: python:3.12-slim-bookworm

Mounting "technique":

  • A Windows-based NTFS share is mounted on Debian via cifs and /etc/fstab under /mnt/servers/
  • The subdirectory /mnt/servers/myapp/logs/ is then mounted into the container at /logs/ (via Docker volumes)

Creation of Logger with rotating log file

import logging
import logging.handlers

pylogger = logging.getLogger(self.__name__)
rotating_file_handler = logging.handlers.RotatingFileHandler(
    filename='/logs/myapp.log',
    maxBytes=5_000_000,
    backupCount=5,
    encoding='utf-8')
formatter = logging.Formatter('%(asctime)s [%(levelname)s] %(message)s')
rotating_file_handler.setFormatter(formatter)
pylogger.addHandler(rotating_file_handler)

Error and Stacktrace

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/logging/handlers.py", line 73, in emit
    if self.shouldRollover(record):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/logging/handlers.py", line 197, in shouldRollover
    self.stream.seek(0, 2)  #due to non-posix-compliant Windows feature
    ^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 9] Bad file descriptor
Call stack:
  File "/app/__main__.py", line 213, in <module>
    main()
  File "/app/__main__.py", line 127, in main
    file_count: int = run_task(task)
  File "/app/__main__.py", line 44, in run_task
    log.info(f"{task.name}: Stopping processing for current poll, as poll_max_files have been reached")
  File "/usr/local/lib/python3.12/site-packages/meethpycommons/logger.py", line 220, in info
    target.info(obj)
Message: 'Auftragsbestätigungen: Stopping processing for current poll, as poll_max_files have been reached'
Arguments: ()
2024-05-21 09:18:31 [WRN] Auftragsbestätigungen: 30 file/s processed [success/error: 0/30]

CPython versions tested on:

3.12

Operating systems tested on:

Linux

@lollo0296 added the type-bug label on Jun 17, 2024
@serhiy-storchaka
Member

This is strange. There were no changes in this part of the code since 3.11 (and the last change was making it so that only regular files are rotated, not symlinks, devices or named pipes). The EBADF error should only be raised if the file descriptor was closed, unless there is nonstandard behavior in your “double mounting” combination. That could happen if the RotatingFileHandler were used concurrently from different threads, but every code path that uses self.stream should be guarded by a per-handler lock, so I don't know how this is practically possible.
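For reference, errno 9 can be reproduced in isolation. A minimal sketch (my own repro, not the reporter's code): any operation on a closed descriptor raises OSError with errno 9 (EBADF), the same error the traceback shows for self.stream.seek(0, 2).

```python
import errno
import os
import tempfile

# Close a descriptor, then attempt the same seek-to-end that
# shouldRollover() performs; the kernel rejects it with EBADF.
fd, path = tempfile.mkstemp()
os.close(fd)

caught = None
try:
    os.lseek(fd, 0, os.SEEK_END)  # the seek-to-end shouldRollover() does
except OSError as exc:
    caught = exc.errno

os.unlink(path)
print(caught == errno.EBADF)  # True
```

The same errno would appear if the descriptor were invalidated behind Python's back, e.g. by the file system underneath going away.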

cc @vsajip

@lollo0296
Author

This is strange. There were no changes in this part of the code since 3.11

I took the time to look at our other projects. I found out that we never actually deployed 3.11 to production. We have one project running on 3.9 and the rest are on 3.10.
So, as far as we know, the bug might have been introduced with the changes in 3.11.

Strange fact 😵: My app runs 24/7. It runs the whole week with no problems and writes a lot of log entries like a charm. But when I come back after the weekend, I find it “stuck” on this error. As if the shouldRollover() method somehow gets called at a specific interval or on a specific day of the week? I observed this behavior 3 weeks in a row. Could it be?

@serhiy-storchaka do you have a possible explanation for this?

@serhiy-storchaka
Member

Do you use RotatingFileHandler or TimedRotatingFileHandler? The former should not rely on time. Could it be an external issue, caused by scheduled maintenance of the mounted file system? How many rollovers occurred in a week?
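To illustrate the first point: RotatingFileHandler rotates purely on size, never on wall-clock time. A minimal demonstration (file names and sizes are illustrative, not from the report):

```python
import logging
import logging.handlers
import os
import tempfile

# With maxBytes=200, a handful of ~40-byte records forces rotation;
# no timer or day-of-week logic is involved at any point.
tmpdir = tempfile.mkdtemp()
logfile = os.path.join(tmpdir, "demo.log")

handler = logging.handlers.RotatingFileHandler(
    logfile, maxBytes=200, backupCount=2)
logger = logging.getLogger("rollover-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

for i in range(20):
    logger.info("message %03d padded to force a rollover", i)

handler.close()
print(sorted(os.listdir(tmpdir)))  # demo.log plus .1 and .2 backups
```

So a failure that reliably appears on a particular day of the week points at an external event, not at the handler's own scheduling.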

BTW, does your program use forking? If the handler is created in the parent process, which is later forked, the file descriptors may be closed in the child process. Although nothing was changed there in 3.10 or 3.11; the significant changes were in 3.4.

Try unmounting the file system that holds the logs and then mounting it back. Does that cause the same issue?

@lollo0296
Author

After further investigation, I have to give up on my side. I was not able to narrow down the issue and identify its cause.
I am going to close the issue for the moment and will reopen it if necessary.

Some additional info that I gathered:

  • This has nothing to do with the passage of time itself. Once a week, early on Sunday morning, our server VMs are restarted by a patch management tool. The restart of the file server VM is what triggers the issue. The Linux VM, and hence the dockerized Python script, is not rebooted and keeps running across the restart of the file server.
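For anyone hitting the same file-server-restart scenario, one possible mitigation is a handler that reopens its stream when the descriptor has gone stale. This is a sketch of my own (the subclass name and the retry-once policy are assumptions, not anything from the stdlib):

```python
import logging
import logging.handlers


class ReopeningRotatingFileHandler(logging.handlers.RotatingFileHandler):
    """Workaround sketch: if the descriptor behind self.stream has gone
    stale (e.g. the file server behind a CIFS mount was rebooted),
    reopen the log file once and retry the rollover check."""

    def shouldRollover(self, record):
        try:
            return super().shouldRollover(record)
        except OSError:
            try:
                if self.stream:
                    self.stream.close()  # may itself fail on a stale fd
            except OSError:
                pass
            self.stream = self._open()   # reopen, then retry once
            return super().shouldRollover(record)
```

This only papers over the symptom at the rollover check; a restart of the file server can of course still lose buffered log data, and a second consecutive OSError will propagate to logging's normal handleError() path.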

@serhiy-storchaka closed this as not planned on Sep 30, 2024