Reduce memory usage #57
Comments
I noticed this myself too, and I also know where it's coming from. When implementing #6, a design decision had to be made between three options, ranging from memory-saving but impractical to convenient but memory-intensive. I chose the middle ground, which buffers the message data of one dialog at a time in memory. So as it is now, the number of dialogs shouldn't matter much, but the number and size of the messages in the largest dialog determine the bulk of the memory load, and this gets quite a bit higher than I anticipated. I'm not yet sure about the best way to tackle this problem, and I don't have much time for GitHub projects at the moment, but I would certainly like to solve this at some point. I am actually one of the users who would directly benefit from this, seeing how I run the script on a server with 2GB RAM, of which at least a third is used by other software.
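For illustration only, here is a minimal Python sketch of the memory-saving end of that trade-off: reading a dialog's JSONL dump one line at a time and writing formatted output immediately, so that only a single message is held in memory. This is not the project's code. The field names `date`, `from`, and `text`, and the functions `iter_messages` and `format_plaintext`, are assumptions made up for the example.

```python
import io
import json
from datetime import datetime, timezone


def iter_messages(jsonl_file):
    """Yield one parsed message at a time from a JSONL stream."""
    for line in jsonl_file:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def format_plaintext(jsonl_file, out):
    """Write one text line per message without buffering the whole dialog.

    The keys "date", "from" and "text" are hypothetical; a real dump
    may use different names.
    """
    count = 0
    for msg in iter_messages(jsonl_file):
        stamp = datetime.fromtimestamp(msg["date"], tz=timezone.utc)
        out.write(f"[{stamp:%Y-%m-%d %H:%M}] {msg['from']}: {msg['text']}\n")
        count += 1
    return count


if __name__ == "__main__":
    sample = io.StringIO('{"date": 0, "from": "Kenny", "text": "hi"}\n')
    result = io.StringIO()
    format_plaintext(sample, result)
    print(result.getvalue(), end="")
```

The cost of streaming like this is that anything needing a look at other messages (for example resolving the author of a replied-to message) has to be handled some other way, which is presumably part of why the buffered approach was convenient.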
I should also mention that, as far as I know, this is specifically a problem with formatters and not with the dumping process itself. If all formatters are disabled (so that only JSONL files are produced), the memory usage should be acceptable. However, as I wrote this and looked at some of my code, I realized that I made a mistake in some conditionals that could cause high memory usage even if all formatters are disabled. The above commit on master fixes this.
Thanks, that's how I use it and hopefully this will help :) I really appreciate your responsiveness.
…old to new [#57,#74] This is a necessary followup to 706776a and should also eliminate the excessive memory usage problem during formatting. Changes the progress file format, so a fresh backup is necessary after applying this commit. Minor regression: breaks reply author support (e.g. "in reply to Kenny") in plaintext formatter until an alternative method for achieving this is implemented.
I decided to rework the formatting system to use a less memory-intensive method as part of #74, which is developed on the …
Say you've backed up a total of 1,000,000 messages and formatted all of them (plaintext, one day per file). If you run the backup again and scrape 1,000 more messages, are you reformatting everything again, or just the 1,000? I've backed up around 2 years of data and back up messages daily. I've noticed the daily backups are quick (3-5 minutes for a few hundred to a thousand messages), but the formatting takes hours. That's why I suspect it's reformatting everything.
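To make the question concrete, this is a rough Python sketch of what formatting "just the 1,000" could look like: remember how far into each JSONL file the previous run got, and read only what was appended since. It is not how the project works. It assumes the dump file is append-only with the oldest messages first, and `read_new_messages` and the `state` dictionary are hypothetical names for the example.

```python
import json
import os
import tempfile


def read_new_messages(jsonl_path, state):
    """Return messages appended since the last call.

    `state` maps a file path to the byte offset already processed and is
    updated in place. Assumes the file is only ever appended to.
    """
    offset = state.get(jsonl_path, 0)
    messages = []
    with open(jsonl_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partially written last line; pick it up next run
            offset += len(raw)
            if raw.strip():
                messages.append(json.loads(raw))
    state[jsonl_path] = offset
    return messages


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dialog.jsonl")
        with open(path, "w") as f:
            f.write('{"id": 1}\n{"id": 2}\n')
        state = {}
        print(len(read_new_messages(path, state)))  # first run reads both
        print(len(read_new_messages(path, state)))  # nothing new afterwards
```

In practice the `state` dictionary would have to be saved between runs, and a one-day-per-file formatter would still need to rewrite or append to the file for the day that was only partly formatted last time.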
Running on a machine with 1GB RAM is hellishly slow when processing a decent number of messages; it would be a great help if you made it not load so much into memory :) Love the project, very useful for some of my own :D