Thread pools for faster builds? #225
Comments
Actually, looking at the flow, I'm not sure this is possible - it looks like jpt is called by Jekyll for each tag, so without multithreading Ruby, it wouldn't be possible to do a document in parallel - just each image. So, perhaps some speedups, but not as much as being able to throw the whole document worth of resizing into a thread pool.
Thanks for the ideas! I was considering some sort of multithreading a while back, but ultimately never put the time into making it work. You're right that Jekyll runs the show; multithreading tags would require some pretty big changes to Jekyll itself. That said, each picture tag generates multiple images, so we might be able to do that in parallel. Image generation is by far the most expensive operation JPT does; everything else is basically free by comparison. I don't have any experience at all writing multithreaded code, so if you're offering assistance in making this happen I'd certainly appreciate it. Note that we're also looking at moving from ImageMagick to libvips, which should bring a significant performance increase as well.
I've messed around a little - and I am far from skilled in Ruby. I'm a low level C guy by trade. However, I have figured out how to reasonably get threaded builds going, at least for each type of image. I expect one could probably expand this up higher, with threads for each image type, but this at least proves the concept a bit. In srcsets/basic.rb:
This will generate "all the webp files" in parallel, "all the jpgs" in parallel, etc. However, it does not generate all the files for the image in general (webp and jpg are sequential). I'm not entirely sure what calls this for each image type. One might just create a global thread pool and toss the spaghetti at the wall, but my initial attempts at this (just eliminating the thread join) rather rapidly blew up task memory, and I don't think that's a welcome enhancement for most people. I'm also far from certain the threads would actually complete prior to the render thread ending. Anyway, I don't know if this is something you're interested in pursuing further, but the proof of concept definitely indicates it should be doable. And if you can point me upstream to what calls each srcset generator, I could add some threading there, too. Rendering images certainly dominates my site build time.
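As a rough illustration of the per-srcset approach described above (not the actual patch to srcsets/basic.rb), here is a minimal sketch. The names `generate_image`, `generate_srcset`, and `generate_all` are hypothetical and are not part of jekyll_picture_tag; the resize step is simulated with a short sleep.

```ruby
# Hypothetical stand-in for the expensive resize step; a real build would
# call the image library here instead of sleeping.
def generate_image(source, width, format)
  sleep 0.01
  "#{File.basename(source, '.*')}-#{width}.#{format}"
end

# Generate every width of one srcset in parallel, then wait for all of them.
def generate_srcset(source, widths, format)
  threads = widths.map do |width|
    Thread.new { generate_image(source, width, format) }
  end
  # Thread#value joins the thread and returns the block's result, so the
  # method only returns once every image in this srcset is done.
  threads.map(&:value)
end

# Formats are still processed one after another, as in the proof of concept.
def generate_all(source, widths, formats)
  formats.flat_map { |format| generate_srcset(source, widths, format) }
end
```

Because each srcset is joined before the next one starts, the number of live threads is capped by the number of widths in one srcset. Dropping the join removes that cap, which is consistent with the memory growth mentioned above.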
Cool :) Regarding the code, I believe what you've written would actually generate all of the widths for a particular srcset in parallel. Each srcset will have files of all the same format, the only difference will be their sizes. The image generation logic somewhat follows the output markup; the whole party is kicked off by instantiating the correct output format (class) and calling I like where you're going with this. I think we could move the thread pool up in scope, something like The one hitch is that |
Correct. It generates all the webp in parallel, then all the jpg in parallel, etc. Anyway, I really don't know Ruby well enough to do much more than what I've done, which at least helps my use cases for my renders (I generally don't rerender a ton, but some of my posts are photo heavy). If it gets done, it would be awesome, but doing global thread pools and such is well beyond my experience level with Ruby.
Thanks for what you've figured out so far. We'll take another crack at it.
Fixes rbuchberger#225 This implements a global Concurrent::ThreadPoolExecutor from concurrent-ruby to avoid memory blowup, and de-duplicates files before generation so we no longer need to rely on filesystem consistency to avoid double-generating images. This is an alternative to rbuchberger#282 with a little bit more complexity, but with the added benefit that all image generation can happen in a single ThreadPool.
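To make the two ideas in that description concrete (a bounded global pool plus de-duplication before generation), here is a minimal sketch. It is not the code from the pull request: it swaps in a stdlib-only `Queue`-based worker pool in place of `Concurrent::ThreadPoolExecutor`, and `ImagePool`, `request`, `finish`, and `generate_image` are hypothetical names chosen for illustration.

```ruby
require 'set'

# Hypothetical stand-in for the expensive resize step.
def generate_image(path)
  sleep 0.01
  path
end

# Minimal fixed-size worker pool built on Queue (stdlib only). At most
# `size` images are generated at once, which bounds memory use.
class ImagePool
  def initialize(size)
    @jobs = Queue.new
    @done = Queue.new
    @seen = Set.new
    @lock = Mutex.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop returns nil once the queue is closed and drained.
        while (path = @jobs.pop)
          @done << generate_image(path)
        end
      end
    end
  end

  # Enqueue a target file unless it was already requested.
  def request(path)
    fresh = @lock.synchronize { @seen.add?(path) }
    @jobs << path if fresh
  end

  # Stop accepting work, wait for the workers, and return what was generated.
  def finish
    @jobs.close
    @workers.each(&:join)
    Array.new(@done.size) { @done.pop }
  end
end
```

In this sketch every tag would call `request` while rendering, and a single `finish` call at the end of the build would block until all images exist, which addresses the earlier worry about threads outliving the render. Where that final call belongs in a Jekyll build is left open here.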
I really enjoy how jekyll_picture_tag works, but it's quite slow rendering a full site with a lot of images. I've noticed it's purely single threaded in operation, when the picture resizing could easily be done in parallel.
Has there been any consideration of using a thread pool or some other technique to allow for image conversions in parallel? It should speed rendering significantly on a multi-core machine.
I'm not all up to speed with Ruby development, but I could take a stab at it if nobody else has cycles to poke at this.