(draft) Defer File closing to a background thread pool and cache dire… #66
Conversation
sources/file_table.cpp
Outdated
if(close_it != m_files_to_close.end()) {
    m_files.emplace(id,std::move(close_it->second));
    m_files_to_close.erase(close_it);
    it = m_files.find(id);
`emplace` returns the iterator to the created element. Another `find` is an unnecessary time cost.
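For illustration only, a sketch of the suggestion based on the snippet above (not the actual patch): `unordered_map::emplace` returns a `std::pair<iterator, bool>`, so the iterator can be reused directly.

```cpp
// Illustrative sketch: reuse the iterator returned by emplace instead of a second find().
auto emplace_result = m_files.emplace(id, std::move(close_it->second));
m_files_to_close.erase(close_it);
it = emplace_result.first;   // iterator to the element that was just inserted
```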
    }
} while(being_closed);
}
What is the purpose of this block of code?
This is to make sure that the file is not currently being deleted by one of the background threads -- I don't know the `FileBase` implementation well enough to know whether it would be safe to have two `FileBase`s referring to the same open file, with one actively being deleted by a background thread.
In pseudocode, it roughly does:
while(the file I'm trying to open is being closed) {
    wait for at least one background thread to finish (which would indicate a change to the list of files being closed)
}
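A minimal C++ sketch of that loop, assuming a hypothetical `id_is_being_closed()` check and the `free_pool.wait()` call discussed later in this review (both names are illustrative, not the exact PR code):

```cpp
// Illustrative sketch of the wait loop described above, not the exact PR code.
bool being_closed = false;
do
{
    being_closed = id_is_being_closed(id);   // hypothetical check, performed under the table lock
    if (being_closed)
        free_pool.wait();   // blocks until some background close finishes (or a timeout elapses)
} while (being_closed);
```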
sources/file_table.cpp
Outdated
    // The file was reopened
    return;
}
ptr = it->second.release();
The two lines below may throw exceptions, causing leakage of `ptr`. Instead of a raw pointer, it should be a `std::unique_ptr`.
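A hedged sketch of that suggestion (the helper name is illustrative, not from the PR): holding the object in a `std::unique_ptr` keeps it owned across statements that might throw, whereas a raw pointer obtained via `release()` would leak.

```cpp
// Illustrative sketch: keep ownership in a unique_ptr so a later exception
// cannot leak the FileBase.
std::unique_ptr<FileBase> file = std::move(it->second);
// ... the statements that may throw run here; `file` still owns the object ...
schedule_background_close(std::move(file));   // hypothetical helper that takes ownership
```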
@@ -0,0 +1,123 @@
#ifndef THREAD_POOL_HPP
Do we really need several threads to do the background closing? Seems overkill to me.
5000 threads is definitely overkill, but I haven't tuned that number yet. In theory, running background threads might not be necessary at all, but I haven't dug into what deleting a `FileBase` exactly does -- it seems to be tied up in IO calls, but I'm not sure which ones (careful use of `strace` might be informative). With my folder I'm seeing a ~50x speedup -- I had timed `ls` before, but `find -type f` is more representative -- with this change it takes 2 minutes 11 seconds, and with vanilla securefs it took 1 hour 51 minutes. For reference, `find -type f | wc -l` says that this directory has 250289 files.
Test on Travis CI fails. The error message is unclear (SIGABRT), but something is wrong.
…unique_ptr, use iterator from unordered_map::emplace
I think the Travis OSX failure was because the thread pool spawned 5000 threads. I reduced it to 50, which should be more reasonable.
I believe I now understand the basics of what you are trying to do, and it should work. Let's iron out the details before merging it.
One big thing: I think the cache of path-to-ID resolution should be deferred to a future PR, if ever. As you have discovered, it barely improves performance. Any cache mechanism runs the risk of data going out of sync, so a cache should only be added when the benefit is clear and large.
I am also amazed that you are able to understand my codebase so easily as to modify it 👍🏻.
    condvars[i]->notify_all();
}

bool done() {
I have a hard time following what `done()` and `wait()` do, or what the use of `condvars` is. Besides, `done()` seems susceptible to a race condition, as it accesses `work_queues` without acquiring the corresponding lock.
`condvars` lets the worker threads sleep without wasting any CPU time until there is work available for them to do -- `condvars[i]->wait(l)` pauses the thread until `condvars[i]->notify_all()` is called on some other thread -- while the thread is asleep, the lock `l` is unlocked, but once it wakes up again it acquires `l` once more.

`done()` has a minor data race, but it's a read-only operation. Between jobs being added with `add_job` and finishing, `done()` will never return `true`. If there are many jobs being added and finishing simultaneously, `done` might erroneously return `true`. `wait` waits for some job to complete, or for 100ms to pass, whichever happens first.

I should add these as comments :)
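For readers unfamiliar with the pattern, here is a generic worker loop built on a condition variable (a self-contained sketch, not the PR's `thread_pool.hpp`):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

std::mutex lock;
std::condition_variable condvar;
std::queue<std::function<void()>> work_queue;
bool shutting_down = false;

void worker_loop()
{
    std::unique_lock<std::mutex> l(lock);
    while (!shutting_down)
    {
        // Sleep without burning CPU; `l` is released while waiting and
        // reacquired when another thread calls notify_all()/notify_one().
        condvar.wait(l, [] { return shutting_down || !work_queue.empty(); });
        while (!work_queue.empty())
        {
            auto job = std::move(work_queue.front());
            work_queue.pop();
            l.unlock();   // run the job without holding the lock
            job();
            l.lock();
        }
    }
}
```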
A race condition invokes undefined behavior. Nothing, not even reading, is safe during a race.

But more importantly, I don't think we should have `done()` and `wait()`. I guess that the intention is to signal to the main thread calling `open_as` when to wake up and recheck whether the ID to be opened is no longer being closed. But:

- Opening an ID that is just being closed is not a common scenario.
- The signal is not accurate. It only signals that one ID has been closed, not which ID.
- Each background thread has to call `job_done.notify_all()` after each closing. Not only does each call cost CPU cycles, it reduces parallelism since several threads may call it at the same time and have to wait for the others.

In a nutshell, this design incurs a great cost for a not-so-common scenario, and the main thread still has to recheck many times. Too much cost, too little gain.

A far simpler way is to let the main thread sleep for a short duration if the ID being opened is being closed by another thread, as in the sketch below.
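A minimal sketch of that simpler alternative (the helper name and the sleep duration are illustrative, not from the PR):

```cpp
#include <chrono>
#include <thread>

// Illustrative sketch: no condition variable or done()/wait() machinery,
// just sleep briefly and recheck while the ID is still being closed.
while (id_is_being_closed(id))   // hypothetical check under the appropriate lock
{
    std::this_thread::sleep_for(std::chrono::milliseconds(10));   // duration is a placeholder
}
```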
Good catch on the undefined behavior. I've been doing a bit too much platform-locked stuff, so I forgot the C++ generic memory model. Even platform-locked to x86, this code may sometimes fail due to memory reordering effects in `empty()` -- so the body of the `if` should be in a lock guard regardless.

> Opening an ID that is just being closed is not a common scenario.

Many directories in `find -type f` will fit this profile, given how long deleting a `FileBase` currently takes.

> The signal is not accurate. It only signals that one ID has been closed, not which ID.

This is correct, but the signal will always trigger when the correct ID is closed, so it's just the pull-based version of this communication protocol. To wait on a particular ID, we'd have to have the thread pool know a lot more about the jobs being given to it. That might be a good idea, but given how the data race in `done()` snuck in, I think adding code complexity to the thread pool is risky.

> Each background thread has to call `job_done.notify_all()` after each closing. Not only does each call cost CPU cycles, it reduces parallelism since several threads may call it at the same time and have to wait for the others.

Using something like condition variables will waste a lot less CPU time than periodic polling, and when an extended wait isn't necessary the condition variable will return quickly.
It is best if you could do a real-world benchmark on which approach is better (and how many threads are appropriate). If you insist on the current design, I won't object.
I agree about the cache -- I'm gonna run a small benchmarking matrix using
See my comment about `done()` and `wait()`.
if(!active_count.load()) {
    bool anything_queued = false;
    for(const auto& q: work_queues) {
        if(!q.empty()) {
Even if we lock individual queues before accessing them, the emptiness may change the moment we unlock them. Well, actually it cannot, because in our case the thread calling `done()` is the same as the thread calling `add_job`, but such long-distance coupling is not a good design.
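A generic sketch of that point (the per-queue locks are assumed, not taken from the PR): each emptiness check is safe under its own lock, yet the combined answer may already be stale once each guard is released.

```cpp
// Illustrative only: locking per queue makes each read safe, but the
// aggregate result can be outdated as soon as the guard goes out of scope.
bool anything_queued = false;
for (size_t i = 0; i < work_queues.size(); ++i)
{
    std::lock_guard<std::mutex> guard(*queue_locks[i]);   // hypothetical per-queue mutex
    if (!work_queues[i].empty())
    {
        anything_queued = true;
        break;
    }
}
```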
    std::unique_lock<std::mutex> l(job_done_lock);
    job_done.wait_for(l,std::chrono::milliseconds(100));
}
return done();
The return value is discarded by the caller. Does it serve any purpose?
if(!free_pool.done()) {
    bool being_closed = false;
    do {
        being_closed = false;
This is a no-op.
…ctory IDs
My current approach to #64