Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML Dumper. #9

Closed
wants to merge 1 commit into from
Closed

HTML Dumper. #9

wants to merge 1 commit into from

Conversation

aidin36
Copy link

@aidin36 aidin36 commented Feb 19, 2016

I'm not a Ruby developer, so the code may not be very Rubish. (:

@tvdstaaij
Copy link
Owner

First of all, thank you for your contribution. The output styling is pretty neat, haven't tried it with photos yet though. The Ruby style looks fine to me, and in any case, part of the Ruby philosophy as I understand it is that a programmer should be free to use any style they please as long as it makes sense :)

I did find a few issues issues with your code:

  • When the dumper encounters a photo without a file field it crashes with an exception. This is the case when a photo cannot be retrieved from Telegram servers (happens sometimes) or when photo downloads are disabled. You should check whether msg['media']['file'] actually exists before attempting to add the photo.
  • The HTML entity escaping (.gsub(">", "&gt;").gsub("<", "&lt;")) is rather hackish and doesn't cover all mandatory HTML escapes. I think you should use a library call for this, a quick google shows that Ruby has an escapeHTML built in.
  • The output does not validate as HTML5. The issues I spotted are minor though; the encoding should specify a charset (text/html; charset=utf-8) and images should have an alt attribute.

Beside the above there are two more problems I have to get out of the way before I can merge this.They have to do with issues #3 and #6.

  • There have been some developments on the dev branch (intended for a v2.0.0 release) that broke compatibility with v1.0.0 dumpers. Support for incremental backups was added, which made dumpers much more complex than before. It was then decided to ditch custom dumpers completely and replace them with custom formatters. The main difference is that a dumper is purely streaming, while a formatter gets the messages in an array after they have already been dumped to an intermediate format (this way you get the advantages of incremental backup without the hassle). I didn't really expect someone to write a dumper in the meantime, but fortunately it should be pretty easy to refactor a v1.0.0 dumper to a v2.0.0 formatter. I could do it for you after merging if you change the PR target to the dev branch, or you could do it yourself if you prefer.
  • @lgommans has also been working on an HTML formatter (see Separate downloading data from exporting to formats #6). Assuming he still intends to create his own formatter (there are many possible variants of a HTML chat dump generator after all), I am left with a naming problem, since they couldn't just both be named 'html'. One possible solution would be to namespace contributed formatters, something like contrib/aidin36/html. I haven't thought of anything else yet, I would appreciate suggestions/opinions from both @aidin36 and @lgommans on this matter.

@lgommans
Copy link

Little time, but I can confirm I'm working on the HTML dumper.

Edit: oh wait, this is not a feature request but actually code. I'll read in more detail later (also tvdstaaij's comment).

@lgommans
Copy link

@tvdstaaij

I am left with a naming problem, since they couldn't just both be named 'html'.

They are incompatible, though. @aidin36's dumper is, well, a dumper. What I am writing is a formatter. They cannot coexist as it is, so once the dumper gets rewritten into a formatter the best of both can be taken.

You say there are many ways an HTML formatter/dumper can be written, but I'm not really sure. There are choices to be made like browser support, using javascript or not, language support (spoken languages I mean)... but those are mostly mergeable things and/or simply settings. Really big, incompatible differences can also be made clear in naming: basic-html vs. rich-html or interactive-html. As it stands, our scripts are way too similar for that, but that also means they can easily be merged.

Right now the advantage of @aidin36's dumper is RTL support, which can easily be added in my formatter (I planned to when seeing this dumper, but I think someone who actually uses RTL might be a much better judge of whether it works as it should).

@tvdstaaij
Copy link
Owner

The HTML formatter @lgommans wrote has just been merged into the dev branch. @aidin36, if you'd like to contribute RTL support or anything else that the formatter lacks, please modify this formatter and issue a PR for the modifications.

@tvdstaaij tvdstaaij closed this Feb 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants