
Store screenshot of page in WARC, too #109

Open
machawk1 opened this issue Nov 19, 2018 · 2 comments

Comments

@machawk1
Owner

In https://kris-sigur.blogspot.com/2018/11/on-screenshots-in-warcs.html @kris-sigur describes the storage of a screenshot in a WARC file. This would be useful for others (e.g., @CamtheWicked on Twitter, for whom I could not find a GitHub handle) and might be easy(-er) to accomplish by leveraging the native Chrome APIs as available.

I have not worked with the devtools(?) API programmatically from an extension, but this seems like it would be a suitable use case for preservation using a browser extension.

/cc @N0taN3rd because I think he may have worked with this part of the Chrome/WebExtension API.

@N0taN3rd
Collaborator

I believe there are two options:

  1. using the tabCapture extension API (never played with this)
  2. using the debugger permission and the CDP command Page.captureScreenshot

@machawk1
Owner Author

machawk1 commented Nov 19, 2018

@N0taN3rd Thanks for the input!

tabCapture seems to be limited to the current viewport, excluding anything that is not currently visible. That would be useful, but I think a user would expect a "screenshot" to cover the whole page, regardless of what is currently visible.

The second option might be more feasible but a little more complex. I think it will require calling chrome.debugger.getTargets(), identifying the current tab among the results (I am not yet sure what else qualifies as a target), and then calling chrome.debugger.sendCommand() with that target and Page.captureScreenshot as the method, without any commandParams, per https://developer.chrome.com/extensions/debugger#method-sendCommand (the defaults appear to be suitable).
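
A minimal sketch of the second option, under a few assumptions: the extension declares the "debugger" permission, the tab id is already known (a debuggee can be addressed as `{ tabId }`, which may make the getTargets() lookup unnecessary), and "1.3" is an acceptable protocol version string. The function name `captureScreenshot` and the `api` parameter are hypothetical; `api` stands in for the global `chrome` object so the flow can be exercised outside a browser. Note that the debugger has to be attached before sendCommand() will work, and whether the default parameters capture the whole page rather than the viewport would still need to be verified.

```javascript
// Sketch only. `api` is expected to look like the extension's global `chrome`
// object (api.debugger.*, api.runtime.lastError). Names are illustrative.
function captureScreenshot(api, tabId, done) {
  const target = { tabId: tabId }; // debuggee addressed by tab id

  // The debugger must be attached before any CDP command can be sent.
  api.debugger.attach(target, '1.3', function () {
    if (api.runtime.lastError) {
      done(new Error(api.runtime.lastError.message));
      return;
    }

    // No command parameters are set, as suggested in the comment above.
    api.debugger.sendCommand(target, 'Page.captureScreenshot', {}, function (result) {
      // Remember any failure before detaching, since detach may reset lastError.
      const failure = api.runtime.lastError;

      // Always detach so the tab is not left in a "being debugged" state.
      api.debugger.detach(target, function () {
        if (failure || !result) {
          done(new Error(failure ? failure.message : 'No screenshot returned'));
          return;
        }
        done(null, result.data); // base64-encoded image data
      });
    });
  });
}
```

Inside an extension this might be called as `captureScreenshot(chrome, tab.id, (err, b64) => { ... })`.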

EDIT: ...and of course, converting the base64-encoded image data to something more suitable for WARC record storage. It might be easiest to keep it as base64 in the WARC, but I am unsure whether there will be issues with interpretation given that it is not a resource of web origin.
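
As one possible answer to the storage question, the base64 string could be decoded to raw bytes and wrapped in a WARC record with an `image/png` content type. The sketch below is illustrative only: the function names are hypothetical, the choice of a `resource` record and the `screenshot:` prefix on WARC-Target-URI are assumptions rather than an established convention, and the record id and date are passed in by the caller. It also assumes the image is PNG, which is the default format for Page.captureScreenshot.

```javascript
// Decode a base64 string into raw bytes using the browser's atob().
function base64ToBytes(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Build one WARC/1.0 record holding the decoded screenshot.
// The record type and target URI scheme are assumptions (see lead-in).
function buildScreenshotRecord(b64Png, pageUri, recordId, isoDate) {
  const payload = base64ToBytes(b64Png);
  const headerText = [
    'WARC/1.0',
    'WARC-Type: resource',
    'WARC-Record-ID: <urn:uuid:' + recordId + '>',
    'WARC-Date: ' + isoDate,
    'WARC-Target-URI: screenshot:' + pageUri,
    'Content-Type: image/png',
    'Content-Length: ' + payload.length // length of the binary block in bytes
  ].join('\r\n') + '\r\n\r\n';

  const encoder = new TextEncoder();
  const head = encoder.encode(headerText);
  const tail = encoder.encode('\r\n\r\n'); // a record ends with two CRLFs

  // Concatenate header, binary payload, and trailer into one byte array.
  const record = new Uint8Array(head.length + payload.length + tail.length);
  record.set(head, 0);
  record.set(payload, head.length);
  record.set(tail, head.length + payload.length);
  return record;
}
```

Storing the decoded bytes keeps Content-Length and Content-Type consistent with the actual image, so a replay tool would not need to know that the data started out as base64.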
