JSON Dump line at the end of file #74

Open
bikhial opened this issue Feb 25, 2017 · 3 comments

@bikhial

bikhial commented Feb 25, 2017

Hi,
Currently the dumper writes each new JSON line at the beginning of the file.
Is it possible to have new messages appended at the end of the file instead?
I changed some lines in dumper_prepender.rb but couldn't get it to work.

Thanks for your help

@tvdstaaij
Owner

The thing is, the Telegram history download process goes like this for every dialog (n is the chunk size):

  • Get n messages from offset 0 (these are the n newest messages)
  • Get n messages from offset n (these are the n newest messages after that)
  • Get n messages from offset 2n (etc...)

This is repeated until the oldest message in the dialog is reached (on the first dump) or until the most recent message from the last dump is reached (when doing an incremental dump).
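The loop described above can be sketched roughly as follows. This is only an illustration, not the dumper's actual code: `fetch_chunk` is a hypothetical stand-in for the Telegram history call, the messages are plain hashes, and the sketch assumes message ids increase over time.

```ruby
# Hypothetical stand-in for the Telegram history call: returns up to `limit`
# messages, starting `offset` positions back from the newest message.
def fetch_chunk(history, offset, limit)
  # `history` is stored old-to-new here, so reverse it to count from the newest.
  history.reverse[offset, limit] || []
end

# Collect messages newest-first. Stops at the oldest message, or at the newest
# message saved by a previous dump (`last_dumped_id`) for an incremental dump.
# Assumption: ids increase over time.
def dump_dialog(history, chunk_size, last_dumped_id = nil)
  dumped = []
  offset = 0
  loop do
    chunk = fetch_chunk(history, offset, chunk_size)
    break if chunk.empty?
    chunk.each do |msg|
      return dumped if last_dumped_id && msg[:id] <= last_dumped_id
      dumped << msg
    end
    offset += chunk_size
  end
  dumped
end

history = (1..7).map { |id| { id: id } }
first_dump = dump_dialog(history, 3).map { |m| m[:id] }     # => [7, 6, 5, 4, 3, 2, 1]
incremental = dump_dialog(history, 3, 5).map { |m| m[:id] } # => [7, 6]
```

Either way the messages come out newest-first, which is why the JSON lines end up in that order.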

This means that new-to-old is the "natural order" of the dump. As far as I know, it is not possible to set an offset from the oldest message, only an offset from the newest message. It would be possible to work around this, of course, for example by writing chunks to temporary files and concatenating them in reverse order at the end of the dialog. It might be a good idea to start doing this in the next major version (to any users reading this: opinions welcome). I'd say that from a processing point of view it makes more sense to have the data ordered from old to new. It could also open up a way to solve the formatter memory issue (#57), since all formatters currently loop in reverse over a RAM-buffered version of the message objects.
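A minimal sketch of the temporary file workaround might look like the code below. The method name `write_old_to_new` and the chunk file naming are made up for illustration, and each chunk is assumed to be an array of already-serialized JSON lines in newest-first order.

```ruby
require 'tmpdir'

# Write each new-to-old chunk to its own temp file, then concatenate the files
# in reverse so the final JSONL file reads old-to-new.
# Hypothetical helper, not the dumper's real interface.
def write_old_to_new(chunks, output_path)
  Dir.mktmpdir do |dir|
    chunk_paths = chunks.each_with_index.map do |chunk, index|
      path = File.join(dir, "chunk_#{index}.jsonl")
      # Lines inside a chunk are newest-first too, so reverse them as well.
      File.write(path, chunk.reverse.map { |line| "#{line}\n" }.join)
      path
    end
    File.open(output_path, 'w') do |out|
      # The last chunk fetched holds the oldest messages, so it goes first.
      chunk_paths.reverse_each { |path| IO.copy_stream(path, out) }
    end
  end
end

chunks = [
  ['{"id":6}', '{"id":5}'],
  ['{"id":4}', '{"id":3}'],
  ['{"id":2}', '{"id":1}']
]
output_path = File.join(Dir.tmpdir, 'dialog_example.jsonl')
write_old_to_new(chunks, output_path)
```

Only one chunk has to be held in memory at a time, which is what would make this interesting for the memory issue as well.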

For now I would advise processing the JSON lines in reverse order (reverse line reader code is available for many languages).
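As one possible example in Ruby, a reverse line reader could be sketched like this. `each_line_reversed` is a hypothetical helper, not part of the dumper; it reads fixed-size blocks backwards from the end of the file so the whole dump does not have to fit in memory.

```ruby
require 'json'
require 'tempfile'

# Yield the lines of a file from last to first by reading fixed-size blocks
# backwards from the end. Empty lines are skipped.
def each_line_reversed(path, block_size = 4096)
  return enum_for(:each_line_reversed, path, block_size) unless block_given?

  File.open(path, 'rb') do |file|
    position = file.size
    tail = ''.b # partial line carried over from the block read before this one
    while position > 0
      read_size = [block_size, position].min
      position -= read_size
      file.seek(position)
      pieces = (file.read(read_size) + tail).split("\n", -1)
      # The first piece may be incomplete: its start lies in an earlier block.
      tail = pieces.shift
      pieces.reverse_each do |line|
        yield line.force_encoding(Encoding::UTF_8) unless line.empty?
      end
    end
    yield tail.force_encoding(Encoding::UTF_8) unless tail.empty?
  end
end

# A small newest-first sample file, read back oldest-first.
sample = Tempfile.new(['dump', '.jsonl'])
sample.write(%({"id":3}\n{"id":2}\n{"id":1}\n))
sample.close
ids = each_line_reversed(sample.path, 8).map { |line| JSON.parse(line)['id'] }
# ids => [1, 2, 3]
```

The tiny block size of 8 in the sample is only there to exercise lines that straddle block boundaries; the default is more sensible for real files.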

tvdstaaij added a commit that referenced this issue Mar 2, 2017
The dumper base class and JSON dumper are partly rewritten and the old
dumper directory tree removed.

Formatters have yet to be adapted to the new order, using them with this
commit is pointless.
@tvdstaaij
Owner

The above commit is an implementation of the temporary file strategy I mentioned earlier. You can try it out if you want; note that it is on the branch dump-old-to-new and not on master. Usage notes:

  • This commit breaks formatters until I have a chance to rework them, but it should be fine if you just use the JSONL files.
  • If an output directory already exists from the master branch, delete it first before attempting to run a backup with this branch.

tvdstaaij added a commit that referenced this issue Mar 4, 2017
…old to new [#57,#74]

This is a necessary followup to 706776a and should also eliminate the excessive memory usage problem during formatting. Changes the progress file format, so a fresh backup is necessary after applying this commit.

Minor regression: breaks reply author support (e.g. "in reply to Kenny") in plaintext formatter until an alternative method for achieving this is implemented.
@tvdstaaij
Owner

Formatters should also work after the above commit, which again requires a fresh backup.
