Reduce memory usage #57

expectocode · 2016-12-03T17:41:59Z

Running on a machine with 1GB RAM is hellishly slow when processing a decent number of messages. It would be a great help if you made it not load so much into memory :) Love the project, it's very useful for some of my own :D

tvdstaaij · 2016-12-18T21:57:17Z

I noticed this myself too, and I also know where it's coming from. When implementing #6, a design decision had to be made between three options, ranging from memory-saving but impractical to convenient but memory-intensive. I chose the middle ground, which buffers the message data of one dialog at a time in memory. So as it is now, the number of dialogs shouldn't matter much, but the number and size of the messages in the largest dialog determine the bulk of the memory load, and this gets quite a bit higher than I anticipated.
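To illustrate the trade-off described above, here is a minimal Python sketch, not the project's actual code. It assumes each dialog is stored as a JSONL file with one message object per line; the function names and the `from`/`text` fields are hypothetical. The buffered variant keeps a whole dialog in memory, so its peak usage grows with the largest dialog, while the streaming variant holds only one message at a time.

```python
import io
import json
from typing import Iterable, Iterator, List, TextIO


def load_dialog_buffered(jsonl: TextIO) -> List[dict]:
    """Middle-ground approach: hold every message of one dialog in memory."""
    return [json.loads(line) for line in jsonl if line.strip()]


def iter_dialog_streaming(jsonl: TextIO) -> Iterator[dict]:
    """Lower-memory alternative: yield one parsed message at a time."""
    for line in jsonl:
        if line.strip():
            yield json.loads(line)


def format_messages(messages: Iterable[dict]) -> Iterator[str]:
    """Hypothetical plaintext formatter; accepts a list or a generator."""
    for msg in messages:
        yield f"{msg.get('from', '?')}: {msg.get('text', '')}"


# Tiny in-memory stand-in for one dialog's JSONL file (assumed layout).
sample = io.StringIO(
    '{"from": "alice", "text": "hi"}\n'
    '{"from": "bob", "text": "hello"}\n'
)

buffered_lines = list(format_messages(load_dialog_buffered(sample)))
sample.seek(0)  # rewind so the same data can be read again
streamed_lines = list(format_messages(iter_dialog_streaming(sample)))
```

Both variants produce the same output; the streaming one only works for formatters that never need to look back at earlier messages, which may be why features such as reply author lookup are harder to support that way.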

I'm not yet sure about the best way to tackle this problem, and I don't have much time for GitHub projects at the moment, but I would certainly like to solve this at some point. I am actually one of the users who would directly benefit from this, seeing how I run the script on a server with 2GB RAM of which at least a third is used by other software.

tvdstaaij · 2016-12-18T22:54:31Z

I should also mention that, as far as I know, this is specifically a problem with formatters and not with the dumping process itself. If all formatters are disabled (so that only JSONL files will be produced) the memory usage should be acceptable.

However, as I wrote this and looked at some of my code I realized that I made a mistake in some conditionals that could cause high memory usage even if all formatters are disabled. The above commit on master fixes this.
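The kind of guard that commit describes can be sketched in Python as follows. This is an illustration under assumptions, not the actual fix: `run_formatters`, the directory layout, and the callable formatter interface are all hypothetical. The point is simply that the JSONL files are never opened when no formatter is enabled.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def run_formatters(dump_dir: str, formatters: List[Callable[[dict], None]]) -> int:
    """Feed every dumped message to each formatter; return messages read."""
    if not formatters:
        # Nothing would consume the data, so skip reading the files entirely.
        return 0
    messages_read = 0
    for path in sorted(Path(dump_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                message = json.loads(line)
                messages_read += 1
                for fmt in formatters:
                    fmt(message)
    return messages_read


# Usage with a throwaway dump directory (hypothetical file name and fields).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "dialog.jsonl").write_text(
        '{"text": "hi"}\n{"text": "bye"}\n', encoding="utf-8"
    )
    seen = []
    read_without_formatters = run_formatters(tmp, [])
    read_with_formatter = run_formatters(tmp, [seen.append])
```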

ghost · 2016-12-18T23:00:08Z

Thanks, that's how I use it and hopefully this will help :) I really appreciate your responsiveness

…old to new [#57,#74] This is a necessary followup to 706776a and should also eliminate the excessive memory usage problem during formatting. Changes the progress file format, so a fresh backup is necessary after applying this commit. Minor regression: breaks reply author support (e.g. "in reply to Kenny") in plaintext formatter until an alternative method for achieving this is implemented.

tvdstaaij · 2017-03-04T14:23:22Z

I decided to rework the formatting system to use a less memory-intensive method as part of #74, which is developed on the dump-old-to-new branch and will probably be integrated in the next major release (because it requires a clean dump after upgrading). Currently the formatter memory problem is solved on this branch. Memory usage may increase a bit again if I re-implement reply formatting functionality, but this should not be significant. Closing this in favor of #74.

anfederico · 2018-06-17T03:13:55Z

Say you've backed up a total of 1,000,000 messages and formatted all of them (plaintext, one day per file). If you run the backup again and scrape 1,000 more messages, are you reformatting everything again? Or just the 1,000?

I've backed up around 2 years of data and back up messages daily. I've noticed the daily backups are quick (3-5 minutes for a few hundred to a thousand messages), but the formatting takes hours. That's why I suspect it's reformatting everything.

tvdstaaij added a commit that referenced this issue Dec 18, 2016

Do not unnecessarily read JSON files when no formatters are enabled

bd29732

Related to #57.

tvdstaaij mentioned this issue Feb 27, 2017

JSON Dump line at the end of file #74

Open

tvdstaaij closed this as completed Mar 4, 2017

tvdstaaij mentioned this issue Jul 22, 2017

Crush poor system #86

Open
