partial backup #3

Closed
pishgaman-org opened this issue Dec 14, 2015 · 9 comments

Comments

@pishgaman-org

Hi, and thanks for your script.
I want to back up my chats every day, but a full backup takes a long time because I'm a member of many groups.
Could you add an option to your script to back up only the changes?

@tvdstaaij
Owner

I did think about this, although I never thought of it as a high priority feature. But it is also clear that a naive "throw everything away and start over" approach won't work forever; my groups are growing rather large for this backup strategy as well.

I have some ideas for implementing an incremental backup, but right now I don't have a lot of spare time to work on it. I will keep this issue open and post something if I make some kind of progress.

@tvdstaaij tvdstaaij self-assigned this Dec 14, 2015
@giomasce

I agree this would be very helpful. Thanks for the program!

@tvdstaaij
Owner

After thinking about it for a while I decided on the following approach for incremental backups:

  • Incremental backups are enabled when the configuration flag track_progress is set to true.
  • The first backup is a full backup, after which an object with progress information is saved to <outputdir>/progress.json.
  • Subsequent backups load the progress file, dump only new messages, and update the progress file.
  • To keep things simple, old backup files will not be renamed when a dialog name has changed between backups. The progress file is fairly straightforward to edit, however, so it would be possible to do this by hand if necessary.
  • It is strongly recommended to keep most settings intact while maintaining an incremental backup. A notable exception is backlog_limit: you could limit the initial backup and then set it back to unlimited for the partial backups (limiting partial backups could cause gaps!).
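
For illustration only, the flow described in the list above could be sketched roughly as follows. This is not the script's actual code: the layout of progress.json (a map from dialog id to the last dumped message id), the function names, and the assumption that message ids increase over time are all hypothetical.

```python
import json
from pathlib import Path


def load_progress(output_dir):
    """Return the saved progress, or an empty dict if no backup has run yet."""
    path = Path(output_dir) / "progress.json"
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_progress(output_dir, progress):
    """Write the progress object to <output_dir>/progress.json."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    path = Path(output_dir) / "progress.json"
    path.write_text(json.dumps(progress, indent=2), encoding="utf-8")


def incremental_backup(output_dir, dialogs, track_progress=True):
    """Return only the messages that have not been dumped before.

    dialogs maps a dialog id (string) to a list of message dicts, each with
    an integer "id" that is assumed to grow over time (hypothetical format).
    """
    progress = load_progress(output_dir) if track_progress else {}
    dumped = {}
    for dialog_id, messages in dialogs.items():
        last_id = progress.get(dialog_id, 0)  # 0 means "nothing dumped yet"
        new = [m for m in messages if m["id"] > last_id]
        dumped[dialog_id] = new
        if new:
            progress[dialog_id] = max(m["id"] for m in new)
    if track_progress:
        save_progress(output_dir, progress)
    return dumped
```

With no progress file present, the first call returns every message (a full backup); later calls against the same output directory return only messages newer than the recorded id.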

This feature is now available for beta testing. Either download the v2.0.0 beta1 release or do a fresh clone and git checkout v2.0.0-beta.1. Feedback and bug reports are appreciated and can be posted in this thread.

As an unrelated change in v2.0.0, the JSON5 configuration has been replaced with a YAML configuration, so please create a new configuration file based on config.yaml (and remember to set track_progress since it is disabled by default).

@pishgaman-org
Author

Thanks man!
I'll try it.

@lgommans

lgommans commented Feb 9, 2016

@tvdstaaij Dang, I just dumped my history, took 40 minutes, and now I read about the beta version with this feature! Oh well, I'm really glad this feature exists now, thanks for doing this :)

@tvdstaaij
Owner

@lgommans Glad you find it useful. If you decide to switch to the beta, I'd appreciate feedback on how it's working for you. Cheers!

I'm planning on releasing 2.0.0 in February or March, but before that I'd like to get some more confidence that everything is working as expected.

@lgommans

@tvdstaaij I tried the dev version and enabled all downloading options this time. It took over six hours and downloaded over 2 GB of files (and Telegram stores all this for free, wow, I never realized!).

Running it again the next day worked fine. It did go through every chat individually, though. I don't know much about the Telegram API, but I feel that could be done faster. Upon starting, Telegram Desktop displays a list of chats (dialogs) with the last message pretty much instantly, without loading individual last messages. The same approach could probably be applied when checking for updates, since most dialogs (about 80% for me) aren't regularly used and aren't likely to have a new message.

Other than the delay in fetching the dialogs without new messages, it seems to work great. I didn't test cases like renamed chats etc., just a chat with some new messages. (I used the json dumper.)

@tvdstaaij
Owner

@lgommans Thanks for testing. I'll explain a bit about your observation that all dialogs are checked.

First, I'd like to note that this script doesn't directly use the Telegram API; it only uses the command line interface offered by telegram-cli.

One of my design decisions was to keep the script completely independent of internal telegram-cli state, because I believe this to be the least error-prone approach (and I'm not even sure there is a way to access internal telegram-cli state). This is achieved by using only the history command to get messages, which in turn means there is no way to know whether a dialog has new messages without actually fetching at least one chunk of messages from it.
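
To make that consequence concrete, here is a minimal sketch of the idea, not the script's real implementation. `fetch_chunk(offset, limit)` is a hypothetical callable standing in for one history request; it is assumed to return messages newest first, each with an integer "id" that grows over time.

```python
def fetch_new_messages(fetch_chunk, last_seen_id, chunk_size=100):
    """Collect messages newer than last_seen_id, newest first.

    fetch_chunk is a hypothetical stand-in for one history request. At least
    one chunk is always requested, even when the dialog has nothing new.
    """
    new_messages = []
    offset = 0
    while True:
        chunk = fetch_chunk(offset, chunk_size)
        if not chunk:
            break  # reached the start of the dialog
        for message in chunk:
            if message["id"] <= last_seen_id:
                return new_messages  # hit a message that was already dumped
            new_messages.append(message)
        offset += len(chunk)
    return new_messages
```

Even for an idle dialog the loop performs one request before it can conclude that nothing changed, which is why every dialog is visited on each run.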

Now the fetching could be done much faster in theory, but it's rate limited by default to one chunk per second (chunk_delay option), because some limitation or bug in the Telegram API, telegram-cli or both causes the whole backup process to hang when it's done any faster (vysheng/tg#717). You can confirm this for yourself: set chunk_delay to something like 0.1 or 0.01 and you'll see that after a couple of thousand messages the commands start timing out and messages are lost. Or maybe you're lucky and the limitation doesn't apply to you, heh.
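
As a rough illustration of that rate limit (a sketch under assumptions, not the script's code), a fetch loop could pause for `chunk_delay` seconds after each chunk. `fetch_chunk(offset, limit)` is again a hypothetical stand-in for one history request, and the `sleep` parameter exists only so the pause can be swapped out when experimenting.

```python
import time


def fetch_all_chunks(fetch_chunk, chunk_delay=1.0, chunk_size=100, sleep=time.sleep):
    """Fetch a dialog chunk by chunk, pausing chunk_delay seconds in between.

    fetch_chunk is a hypothetical callable; only the chunk_delay name comes
    from the option mentioned in this thread.
    """
    messages = []
    offset = 0
    while True:
        chunk = fetch_chunk(offset, chunk_size)
        if not chunk:
            break  # no more messages to fetch
        messages.extend(chunk)
        offset += len(chunk)
        sleep(chunk_delay)  # throttle before the next request
    return messages
```

Lowering `chunk_delay` shortens the pauses, which is the experiment suggested above; whether the requests then start timing out depends on the limitation described in vysheng/tg#717.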

To summarize, Telegram Desktop and co can do it quickly because they use the API directly and keep internal state, which is much more advanced than what this script does.

@tvdstaaij tvdstaaij mentioned this issue Feb 22, 2016
@tvdstaaij
Owner

Released v2.0.0 which includes this feature.


4 participants