Modified files trigger more than one event on Python 3 #346

Open

bastula opened this issue Mar 8, 2016 · 25 comments
@bastula commented Mar 8, 2016

On Python 3 (3.5), modified file events are showing up more than once.
This does not occur on Python 2.7.11 in the exact same environment.

Please see the following Stack Overflow question for an example (not my post, but the same symptoms):

https://stackoverflow.com/questions/32861420/watchdog-getting-events-thrice-in-python-3
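
For reference, a minimal watcher in the style of the watchdog README is enough to reproduce the symptom (this is a sketch, not the reporter's code; the "watch" path is a placeholder): editing and saving a file under the watched directory logs the modified event more than once.

import logging
import time

from watchdog.events import LoggingEventHandler
from watchdog.observers import Observer

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(message)s")

# Log every filesystem event under "watch" (placeholder path).
observer = Observer()
observer.schedule(LoggingEventHandler(), "watch", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()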

@nueverest

Same here.

@carlcarl

What's your OS environment (e.g. Ubuntu, Arch Linux)?

@bastula (Author) commented Apr 25, 2016

Mac OS X 10.11.4, Python 3.5.1

@nueverest commented Apr 25, 2016

Windows 8.1, Python 3.5

Windows Vista, Python 3.4

@valencik

Mac OS X 10.10.5, Python 3.5.1

@g8gg commented Oct 9, 2017

CentOS Linux release 7.3.1611 (Core)
Python 3.6.0 :: Anaconda 4.3.1 (64-bit)
Linux 3.10.0-514.21.1.el7.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux

@andrenogueiramartins

Win 10x64, Python 3.6.5

@alexstrzelecki-gkn

Win 7x64, Python 3.6.4

@BoboTiG (Collaborator) commented Oct 29, 2018

If someone wants to help, we will be very grateful :)

@alonme commented Dec 14, 2019

Does this happen to anyone on Python 3.7?
Please share more info about what exactly you are doing when this happens.
What are you monitoring, and what modification are you making?

@dfitzpatrick

I experience this in on_any_event, in a conditional block that checks whether the instance is a FileModifiedEvent or a FileCreatedEvent.

I open the file I want to edit and type a word. When I hit save on the file, it generates 4 events.
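
For illustration, a handler along those lines (a sketch under my own naming, not dfitzpatrick's actual code) could look like this:

from watchdog.events import (
    FileCreatedEvent,
    FileModifiedEvent,
    FileSystemEventHandler,
)


class AnyEventHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        # The duplicate notifications show up inside this conditional block.
        if isinstance(event, (FileModifiedEvent, FileCreatedEvent)):
            print(f"{event.event_type}: {event.src_path}")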

@choogiesaur commented Apr 15, 2020

I'm having this bug as well, on Ubuntu.

I found that creating a file in the watched directory generates the following:

  • FileCreatedEvent, followed by
  • DirModifiedEvent for the parent directory
  • FileModifiedEvent for the newly created file (??)

And copying the same file into the directory while the file already exists generates anywhere from three to four FileModifiedEvents:

watcher_1      | 2020-04-15 22:34:42,797 — __main__ — INFO — Event: <FileModifiedEvent: src_path='/Eicon/input/US-RGB-8-epicard.dcm'>
watcher_1      | 2020-04-15 22:34:42,799 — __main__ — INFO — Event: <FileModifiedEvent: src_path='/Eicon/input/US-RGB-8-epicard.dcm'>
watcher_1      | 2020-04-15 22:34:42,799 — __main__ — INFO — Event: <FileModifiedEvent: src_path='/Eicon/input/US-RGB-8-epicard.dcm'>

@mmattklaus commented Aug 25, 2020

I experienced the exact same problem, where the FileModifiedEvent handler executed twice.
I solved it by measuring the time between the two triggers. My observation was that both events fired less than a second apart (that is, time.time() - last_execution_time < 1).

So my workaround was this:

import time

last_trigger = time.time()

def on_config_modified(event):
    global last_trigger
    if event.src_path.find('~') == -1 and (time.time() - last_trigger) > 1:
        last_trigger = time.time()  # remember when the config was last reloaded
        config = reload_config()

...

From the code above,

  • I first registered the start time of the script (in a global variable).
  • Then, whenever the on_config_modified function is triggered (according to my watchdog implementation), I do two things:
  1. I check that the modified src_path doesn't contain a tilde (~) (...event.src_path.find('~') == -1). This was the case for the YAML config file I was watching: my config.yaml first gets written to a temporary file (with the tilde) before it's finally modified.

  2. The second condition ((time.time() - last_trigger) > 1) is where the solution happens. I compare the difference in time since the function was last executed. If that difference is greater than 1 second, I reload my configuration file (by calling the reload_config function).

I hope someone finds this useful.
Best regards.

@pmundt commented Dec 23, 2020

I had the same issue, and worked around it in two different ways:

  • Using a timer-based debouncing of general modifications (similar as above)
  • Stashing the path at creation time and checking to see if the modification is part of a create/modify pair notification, which inotify seems to raise for newly created files.

I'm not sure what other people are using it for, but I found the creation event fired too early for my needs: it triggered at the point where the filesystem allocated a dentry, but well before the file contents had been written out, which was a bit counterintuitive.

    def on_created(self, event):
        self.plugins_pending.append(event.src_path)

    def on_modified(self, event):
        module_name = os.path.splitext(os.path.basename(event.src_path))[0]

        # Look for a completion to an earlier creation event
        for pending in self.plugins_pending:
            if pending == event.src_path:
                self.load_plugin(module_name, event.src_path)
                self.plugins_pending.remove(pending)
                return

The rest of on_modified implements the debouncing logic for the general modification case.

Hope this ends up saving someone some time!
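
For completeness, the debouncing branch of on_modified that pmundt alludes to might look roughly like the continuation below (a sketch, not the actual code: self.last_seen, the import of time, and the one-second window are all assumptions):

        # ...continuing on_modified after the pending-creation check above.
        # self.last_seen is assumed to be a dict initialised in __init__
        # (e.g. self.last_seen = {}), and `import time` is assumed at the top.
        now = time.monotonic()
        if now - self.last_seen.get(event.src_path, 0.0) < 1.0:
            return  # duplicate modification inside the window; ignore it
        self.last_seen[event.src_path] = now
        self.load_plugin(module_name, event.src_path)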

@Shadoward commented Mar 12, 2021

Hi all,

Sorry, but the two workarounds do not work for me.

What I did find is that FileCreatedEvent (event_type=created) fires several times, depending on the number of subfolders you have.
For example, if I copy 5 nested subfolders with a file inside them into the watch folder, watchdog will create 6 FileCreatedEvent (event_type=created) events and 2 FileModifiedEvent (event_type=modified) events.

The FileModifiedEvent (event_type=modified) always fires twice.

class Handler(watchdog.events.PatternMatchingEventHandler):
    def __init__(self, queue, args):
        # Store the extra argument(s) on the instance
        super().__init__()
        self.queue = queue
        self.args = args
        self.list_created = []
        self.list_modified = []

    def process(self, src_path):
        self.queue.put(src_path)
        self.list_modified = []
        self.list_created = []

    def on_created(self, event):
        self.list_created.append(event.src_path)

    def on_modified(self, event):
        if self.list_created and self.list_created[0] == event.src_path:
            self.process(event.src_path)
            return

To answer pmundt about creation event triggering too fast, I use the following code:

    # from https://stackoverflow.com/questions/6825994/check-if-a-file-is-open-in-python
    # Wait until the transfer of the file has finished before processing it
    while True:  # repeat until the try statement succeeds
        try:
            with open(path, 'rb') as f:  # the with-statement closes the file automatically
                break  # exit the loop
        except IOError:
            time.sleep(1)
            # restart the loop

@cdesilv1 commented Apr 27, 2021

I am using Python 3.6 on Ubuntu 16.04 and having similar issues. On overwriting a file, two file creation events are registered, about 1/1000 of a second apart.

I have attempted to use both a class-level queue and global variables inside the class to append and read events, as well as counters and dict objects keyed on datetime + src_path. Printing events as they come in shows both events registered, but when I print out the queue, most of the time it shows only a single item.

So far it has shown two items in the queue only once across all attempts; on the next run, without modifying the code, it goes back to a single item. I am also running this inside a Docker container, Docker version 20.10.6, build 370c289. The Docker image is running Ubuntu 16.04 as well.

Given the inconsistency, this feels like a low-level problem.
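
For what it's worth, the dict-based bookkeeping described above could be sketched like this (names and the 0.1-second window are my own assumptions, not cdesilv1's code):

import time

from watchdog.events import FileSystemEventHandler


class DedupHandler(FileSystemEventHandler):
    """Drop events for a path that was already handled a moment ago."""

    def __init__(self):
        super().__init__()
        self._last_seen = {}  # src_path -> monotonic time of last handled event

    def on_created(self, event):
        now = time.monotonic()
        # The duplicates reported above arrive ~1/1000 s apart, so a 0.1 s
        # window (an arbitrary choice) is enough to swallow them.
        if now - self._last_seen.get(event.src_path, 0.0) < 0.1:
            return
        self._last_seen[event.src_path] = now
        print(f"created: {event.src_path}")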

@Shadoward

@cdesilv1, have you tried a similar test with subfolders?
Create a number of nested subfolders with a file in the innermost one, then copy the folder into the watch folder. You will see that on_created is triggered once per nested subfolder, plus one for the file.

If someone can point me to the right location in the main code, I can try to correct the problem.
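
A quick way to build such a test case (a sketch with made-up paths, not Shadoward's script) is to create the nested tree outside the watched directory and then copy it in as one operation:

import shutil
from pathlib import Path

# Build staging/a/b/c/d/e/file.txt outside the watched directory...
src = Path("staging/a/b/c/d/e")
src.mkdir(parents=True, exist_ok=True)
(src / "file.txt").write_text("hello")

# ...then copy the whole tree into the watched folder ("watch" is a
# placeholder) and count how many on_created callbacks fire.
shutil.copytree("staging/a", "watch/a")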

@earonesty

I think the root cause of a lot of these bugs is that you'll get a FileModifiedEvent when the access time is updated.

@edmondchuc

Thanks @mmattklaus, I adapted your solution and it works for me.

import time

from watchdog.events import FileSystemEventHandler

last_trigger_time = time.time()


class MyFileSystemEventHandler(FileSystemEventHandler):
    def on_modified(self, event):
        global last_trigger_time
        current_time = time.time()
        if event.src_path.find('~') == -1 and (current_time - last_trigger_time) > 1:
            last_trigger_time = current_time
            print('modified')
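
To actually receive events, the handler above still has to be scheduled on an observer; a minimal wiring (the "watch" path is a placeholder) would be:

from watchdog.observers import Observer

observer = Observer()
observer.schedule(MyFileSystemEventHandler(), "watch", recursive=True)
observer.start()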

@jeabraham

The first trigger is perhaps when the file is created; I found I couldn't read the file yet, as it wasn't complete. So perhaps the second trigger is when the file is closed by the writing process and becomes available.

@RichieRogers commented Oct 21, 2022

Hi,
I'm getting this issue with the latest version of Python (3.10.8) and the watchdog module (2.1.9) on Windows 10.
It is definitely the Last Access Time that is changing (possibly being checked by AV, or maybe thumbnail caching by Windows).
However, the Last Modified Date is NOT changing - so why is a Modified event being generated (that is the only event I'm handling)?
I will have to add some extra steps to check that the Last Access Time wasn't within the last x minutes, but that is a bit of a fudge and will potentially mean some files are excluded because someone genuinely accessed them.
Any ideas on the cause of and/or solution to this issue?

Thanks,
Richie
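
One way to implement such a check without touching access times (a sketch based on the discussion here, not RichieRogers's code) is to remember each path's st_mtime and only treat an event as a real modification when that value changes:

import os

from watchdog.events import FileSystemEventHandler


class MtimeFilteringHandler(FileSystemEventHandler):
    """Ignore modified events whose st_mtime has not actually changed."""

    def __init__(self):
        super().__init__()
        self._mtimes = {}  # src_path -> last seen st_mtime

    def on_modified(self, event):
        if event.is_directory:
            return
        try:
            mtime = os.stat(event.src_path).st_mtime
        except OSError:
            return  # the file vanished between the event and the stat call
        if self._mtimes.get(event.src_path) == mtime:
            return  # access-time-only change; not a real modification
        self._mtimes[event.src_path] = mtime
        print(f"really modified: {event.src_path}")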

@M3ssman commented Jan 5, 2023

Hello,

is there any news regarding this issue?

It even seems that if a file gets copied via plain shutil.copy, modifications to the copy are mirrored onto the source file, which suggests that the event time is related to the access time of the copied file instead of the source.

(Ubuntu 20.04, Python 3.8.10, watchdog 2.1.7)

@blakeNaccarato commented Feb 15, 2023

Here's another adaptation of the above cooldown workarounds for relatively slow directory monitoring (I've hard-coded a minimum cooldown of two seconds), with the boilerplate tucked into a context manager. In this implementation, your on_modified function doesn't need to explicitly handle cooldowns; rather, that is tucked away in the context manager.

So the example usage looks like this...

Example usage

from pathlib import Path

from watchdog.events import FileSystemEvent

from implementation import DirWatcher  # Import from implementation file below

WATCH_DIR = Path("watch")
INTERVAL = 5


def main():
    with DirWatcher(WATCH_DIR, on_modified, INTERVAL) as watcher:
        watcher.run()


def on_modified(event: FileSystemEvent):
    """Do something with the event, without worrying about the cooldown."""


if __name__ == "__main__":
    main()

...which only allows for on_modified event handling in this simple implementation. The context manager will raise an exception if you specify an interval shorter than two seconds. You could spruce up the implementation below to optionally watch more events; it mitigates the duplicate modification events in a rather crude fashion with the cooldown, which works for my use case.

Note that the implementation uses the Self type hint, a Python 3.11 feature, but you could omit that bit and it should work fine on earlier Python versions.

implementation.py

"""Context manager for basic directory watching.

Includes a workaround for <https://github.com/gorakhargosh/watchdog/issues/346>.
"""

from datetime import datetime, timedelta
from pathlib import Path
from time import sleep
from typing import Callable, Self

from watchdog.events import FileSystemEvent, FileSystemEventHandler
from watchdog.observers import Observer


class DirWatcher:
    """Run a function when a directory changes."""

    min_cooldown = 2

    def __init__(
        self,
        watch_dir: Path,
        on_modified: Callable[[FileSystemEvent], None],
        interval: int = 5,
        cooldown: int = 2,
    ):
        if interval < self.min_cooldown:
            raise ValueError(
                f"Interval of {interval} seconds is less than the minimum cooldown of"
                f" {self.min_cooldown} seconds."
            )
        if cooldown < self.min_cooldown:
            raise ValueError(
                f"Cooldown of {cooldown} seconds is less than the minimum cooldown of"
                f" {self.min_cooldown} seconds."
            )
        self.watch_dir = watch_dir
        self.on_modified = on_modified
        self.interval = interval
        self.cooldown = cooldown

    def __enter__(self) -> Self:
        self.observer = Observer()
        self.observer.schedule(
            ModifiedFileHandler(self.on_modified, self.cooldown), self.watch_dir
        )
        self.observer.start()
        return self

    def __exit__(self, exc_type: Exception | None, *_) -> bool:
        if exc_type and exc_type is KeyboardInterrupt:
            self.observer.stop()
            handled_exception = True
        elif exc_type:
            handled_exception = False
        else:
            handled_exception = True
        self.observer.join()
        return handled_exception

    def run(self):
        """Check for changes on an interval."""
        while True:
            sleep(self.interval)


class ModifiedFileHandler(FileSystemEventHandler):
    """Handle modified files."""

    def __init__(self, func: Callable[[FileSystemEvent], None], cooldown: int):
        self.func = func
        self.cooldown = timedelta(seconds=cooldown)
        self.triggered_time = datetime.min

    def on_modified(self, event: FileSystemEvent):
        if (datetime.now() - self.triggered_time) > self.cooldown:
            self.func(event)
            self.triggered_time = datetime.now()

@Sciumo commented Oct 17, 2023

Access time update should be a separate event.
But this bug does not appear to be access time, as the OS returns the same modification and access times for each event.

@AndreaLanfranchi commented May 2, 2024

I've been scratching my head over this issue, and after a bit of research I don't believe this is a watchdog issue.
I should specify that my observations were done on Windows, so on other OSes your mileage may vary.

First things first.
This snippet here specifies which filesystem events are collected:

WATCHDOG_FILE_NOTIFY_FLAGS = reduce(
    lambda x, y: x | y,
    [
        FILE_NOTIFY_CHANGE_FILE_NAME,
        FILE_NOTIFY_CHANGE_DIR_NAME,
        FILE_NOTIFY_CHANGE_ATTRIBUTES,
        FILE_NOTIFY_CHANGE_SIZE,
        FILE_NOTIFY_CHANGE_LAST_WRITE,
        FILE_NOTIFY_CHANGE_SECURITY,
        FILE_NOTIFY_CHANGE_LAST_ACCESS,
        FILE_NOTIFY_CHANGE_CREATION,
    ],
)

which means that ANY of those modifications to a filesystem entry triggers an event.
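
As an illustration only, a narrower mask built the same way might keep just name and last-write changes; this is hypothetical, since watchdog constructs the mask above internally and using a reduced set would mean patching its Windows emitter:

from functools import reduce

# Hypothetical narrower mask: only rename/move and last-write changes.
# The FILE_NOTIFY_* constants are the same ones used in the snippet above.
NARROW_FLAGS = reduce(
    lambda x, y: x | y,
    [
        FILE_NOTIFY_CHANGE_FILE_NAME,
        FILE_NOTIFY_CHANGE_DIR_NAME,
        FILE_NOTIFY_CHANGE_LAST_WRITE,
    ],
)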

Now: FS operations are in most cases complex.
For example, a file copy operation will:

  • trigger the (new) file created event
  • trigger several (new) file modified events, one for each operation needed to store the last write time, file attributes, security properties, etc.

In addition, for some file types, e.g. images, you might see modification events triggered when last_access is updated, because applications (such as the file properties dialog or automatic thumbnail generation) read the header metadata (or the entire file), which opens the file for read and eventually closes it.

I am not sure the "cooldown" strategy described above can properly solve the issue, as there is no guarantee that the triggered events for the same file arrive contiguously: imagine a bulk lowering of the "A" (archive) attribute during a backup.

Bottom line: IMO, watchdog simply echoes all FS events correctly.
