Add WARC support #16

Open
charlesroelli opened this issue Nov 24, 2018 · 5 comments

@charlesroelli
Owner

See https://lwn.net/Articles/766374/

@xvrdm

xvrdm commented Apr 12, 2020

Hi and thanks for the awesome library!

I was wondering if you were aware of this initiative:
https://github.com/gildas-lormeau/SingleFile

It has a CLI, so I guess it could be used as a backend for org-board.

@charlesroelli
Owner Author

Hi there, thank you for the link! I've not heard of SingleFile but it seems like a good fit for this package. I will look into adding support for it.

@c1-g

c1-g commented Jun 26, 2022

Just throwing this out there: I managed to get org-board to work with another program called Monolith which, like SingleFile, saves a web page as a single HTML file. You can probably adapt this for the SingleFile CLI too.

Basically, I override org-board's org-board-wget-call so that it calls my own my/org-board-monolith-call instead.

(defun my/org-board-monolith-call (path directory args site)
  "Like `org-board-wget-call' but call monolith instead."
  ;; Pass t so missing parent directories are created and an existing directory is not an error.
  (make-directory (file-name-as-directory directory) t)
  (let* ((filename (url-filename (url-generic-parse-url (car site))))
         (domain (file-name-nondirectory (url-domain (url-generic-parse-url (car site)))))
         (name (if (string-empty-p filename)
                   domain
                 (if (string-match "/$" filename)
                     (file-name-base (directory-file-name filename))
                   filename)))
         (output-directory-option
          (expand-file-name
           (concat (file-name-sans-extension (file-name-nondirectory name)) ".html")
           (file-name-as-directory directory)))
         (output-buffer-name "org-board-monolith-call")
         (process-arg-list (append (list "org-board-monolith-process"
                                         output-buffer-name
                                         path)
                                   org-board-wget-switches
                                   (list "-o")
                                   (list output-directory-option)
                                   args
                                   site))
         (monolith-process (apply 'start-process process-arg-list)))
    (if org-board-wget-show-buffer
        (with-output-to-temp-buffer output-buffer-name
          (set-process-sentinel
           monolith-process
           'org-board-wget-process-sentinel-function))
      (set-process-sentinel
       monolith-process
       'org-board-wget-process-sentinel-function))
    monolith-process))

(advice-add 'org-board-wget-call :override #'my/org-board-monolith-call)

Then I put these in my init.el:

(setq org-board-wget-program (executable-find "monolith"))
(setq org-board-wget-switches '("-IevjF"))

The switches are passed to monolith.
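For anyone adapting this, the bundled "-IevjF" string can also be written as separate flags, which makes it easier to see what each one does. This is only a sketch of an equivalent setting: the flag meanings in the comments are taken from monolith's own help text as I understand it, so check `monolith --help` for your installed version before relying on them.

```elisp
;; Same switches as "-IevjF", one flag per list element.
;; Flag descriptions are an assumption based on monolith's help output.
(setq org-board-wget-switches
      '("-I"    ; isolate the document (cut it off from the Internet)
        "-e"    ; ignore network errors
        "-v"    ; exclude videos
        "-j"    ; exclude JavaScript
        "-F"))  ; exclude web fonts
```

To go back to wget later, `(advice-remove 'org-board-wget-call #'my/org-board-monolith-call)` removes the override; org-board-wget-program and org-board-wget-switches then need to be set back to their wget values as well.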

@paudley

paudley commented Jul 18, 2022

@c1-g That works beautifully! Thanks.

@fuzzbomb

fuzzbomb commented Jan 9, 2023

GNU wget has supported creating WARC archives since 2012. See the announcement at https://lists.gnu.org/archive/html/info-gnu/2012-08/msg00002.html

Given that org-board uses wget, can we get WARC support cheaply by using org-board's WGET_OPTIONS property?
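A minimal sketch of what that could look like, untested against org-board itself. `--warc-file=NAME` is a real GNU wget option (wget adds the `.warc.gz` suffix itself), and WGET_OPTIONS is the per-entry property mentioned above. Both function names are hypothetical helpers, not part of org-board. The sketch assumes WGET_OPTIONS is read as a space-separated multivalued property, and that wget resolves a relative WARC name against its working directory rather than the attachment directory, which is why an absolute path is built here.

```elisp
(require 'org)

;; Hypothetical helper, not part of org-board.
(defun my/org-board-warc-switch (directory name)
  "Return a wget switch that writes a WARC called NAME inside DIRECTORY.
wget appends the \".warc.gz\" extension to NAME by itself."
  (concat "--warc-file="
          (expand-file-name name (file-name-as-directory directory))))

;; Hypothetical command, not part of org-board.
(defun my/org-board-add-warc-option (directory name)
  "Add a --warc-file switch to WGET_OPTIONS of the Org entry at point."
  (interactive "DWARC directory: \nsWARC base name: ")
  (org-entry-add-to-multivalued-property
   (point) "WGET_OPTIONS"
   (my/org-board-warc-switch directory name)))
```

Running `M-x my/org-board-add-warc-option` on an entry and then archiving it as usual should, under those assumptions, hand the extra switch to wget alongside the default ones. Whether the WARC is best kept inside the timestamped attachment directory or next to it is a separate design choice that this sketch leaves to the caller.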

I've just started using org-board (and org-attachments generally). Combining WARC with WGET_OPTIONS is something I'm keen to try soon.

I'm skeptical about various other archive packages like SingleFile (which has already been forked...). I suppose it depends what you are looking for in a file format:

  • If you just want a single file which can easily be copied or moved (shared as an email attachment, say) then take your pick: SingleFile and WARC both manage that.

  • If you're looking for web browser support, they're all poor choices IMO.

    • I'm unaware of any single-file archive format which is supported by common web browsers. Several browsers have devised their own format (e.g. MAFF) but none have caught on or been adopted by other browsers.
    • Some formats have 3rd-party browser extensions, which could be good for personal use. A downside here is that it doesn't really help when you want to share the archive with somebody else; they'll have to go and find a browser extension too.
  • If your interest is longevity though, then I'd bet on WARC. It's an ISO standard with a detailed spec, and it has the backing of major national libraries and universities. It's been developed and maintained with proper archivists and librarians, who tend to think on a longer time scale than most software developers I've known.


5 participants