Improvements should be made to the thumbnails generated for NLA mementos #173

Closed
himarshaj opened this issue Mar 2, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@himarshaj
Member

Test URI-Ms:

  1. https://webarchive.nla.gov.au/awa/20160210010019/http://pandora.nla.gov.au/pan/156727/20160210-1200/www.barefaced.com.au/index.html

  2. https://webarchive.nla.gov.au/awa/20160229080832/http://www.barefaced.com.au/

Thumbnails generated through MementoEmbed for the above NLA URI-Ms:

Thumbnail 1
image

Thumbnail 2
image

It appears that the screenshot used as the thumbnail is captured from the URI-M at Trove too soon, before the page has finished loading.

@shawnmjones shawnmjones added the bug Something isn't working label Mar 2, 2021
@shawnmjones
Member

MementoEmbed's Python code calls a Puppeteer script as an external process to take the screenshot. To solve the problem you've documented here, I first thought that we could raise the THUMBNAIL_TIMEOUT value in this config file used during development:

THUMBNAIL_TIMEOUT = "300"

or this config file used in the Docker container:

# Number of seconds to wait for the thumbnail script to finish
# before sending an error message back to the user
THUMBNAIL_TIMEOUT = "300"

but the value is already set to 300 seconds, which is 5 minutes. This is how long the Python code will wait for Puppeteer before giving up.

Our Trove examples produce (poor) thumbnails much faster than 5 minutes, so that timeout is clearly not in play. Instead, I wonder if the screenshot code itself needs to be modified.

I don't remember what units page.waitFor takes. According to the Puppeteer source code for the Page class, the waitFor method accepts a selector, predicate, or timeout to wait for. Maybe that would be helpful?

@himarshaj
Member Author

Adjusting the value in "await page.waitFor(2000)" did not make a difference.

However, using page.waitForNavigation to control when the screenshot is taken improved the thumbnails, though still not enough.

await page.waitForNavigation({ waitUntil: 'networkidle0', });

With "networkidle0", Puppeteer considers navigation finished once there have been no network connections for at least 500 ms. There is no option to change that idle time, but we could emulate the Puppeteer code with an adjustable value. I tested an idle time of 2000 ms instead of 500 ms.

Test URI-M: https://webarchive.nla.gov.au/awa/20190305233656/http://www.barefaced.com.au/

image

The generated thumbnail looks much better. We can tune this value to find the best trade-off between thumbnail quality and the time it takes to generate the thumbnail.
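
For reference, the adjustable idle wait described above could be sketched roughly as follows. This is not the code that was tested for this issue, only a minimal illustration of the idea. The helper name waitForNetworkIdle and its idleMs/timeoutMs options are hypothetical; it relies only on the 'request', 'requestfinished', and 'requestfailed' events that a Puppeteer page emits, and it does not see requests that started before it was called.

```javascript
// Hypothetical helper (not part of Puppeteer or MementoEmbed): resolve once
// no requests have been in flight for `idleMs` milliseconds, or reject after
// `timeoutMs`. `page` is any object that emits Puppeteer-style 'request',
// 'requestfinished', and 'requestfailed' events and supports on()/off().
function waitForNetworkIdle(page, { idleMs = 2000, timeoutMs = 30000 } = {}) {
  return new Promise((resolve, reject) => {
    let inflight = 0;     // requests started but not yet finished or failed
    let idleTimer = null; // fires when the network has been quiet long enough

    // Stop both timers and detach every listener we attached.
    const cleanup = () => {
      clearTimeout(idleTimer);
      clearTimeout(deadline);
      page.off('request', onStart);
      page.off('requestfinished', onEnd);
      page.off('requestfailed', onEnd);
    };

    // (Re)start the quiet-period countdown.
    const armIdleTimer = () => {
      clearTimeout(idleTimer);
      idleTimer = setTimeout(() => {
        cleanup();
        resolve();
      }, idleMs);
    };

    // A new request cancels any countdown in progress.
    const onStart = () => {
      inflight += 1;
      clearTimeout(idleTimer);
    };

    // When the last open request ends, begin counting quiet time again.
    const onEnd = () => {
      inflight = Math.max(0, inflight - 1);
      if (inflight === 0) armIdleTimer();
    };

    // Overall upper bound so a chatty page cannot block forever.
    const deadline = setTimeout(() => {
      cleanup();
      reject(new Error(`network not idle within ${timeoutMs} ms`));
    }, timeoutMs);

    page.on('request', onStart);
    page.on('requestfinished', onEnd);
    page.on('requestfailed', onEnd);
    armIdleTimer(); // nothing observed in flight yet, so start counting now
  });
}
```

In the screenshot script, something like "await waitForNetworkIdle(page, { idleMs: 2000 })" could sit between page.goto and page.screenshot. A larger idleMs gives slow archive replay more time to finish loading, at the cost of a slower thumbnail.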
