Issue: Clones of sites do not show issues until hover #24

Open
devinat1 opened this issue Oct 5, 2024 · 1 comment

Comments

devinat1 commented Oct 5, 2024

I am seeing the following problem with sites the crawler has cloned for offline use: https://www.loom.com/share/755b0efd840c48fc8f6f0be0114c6e8e
The images for the articles only become visible when I hover over them.

janreges (Owner) commented Oct 6, 2024

Hi @devinat1,

on the BBC website there is some logic tied to JavaScript support that displays placeholders in front of the images. Unfortunately, I don't have time to analyse all the JavaScript on the BBC website in detail to understand why this is happening.

In the BBC case, the images are displayed once the placeholder tag with the hide-when-no-script class, which overlays them, is removed.

To make such replacements possible, I have implemented a new --replace-content option. Possible values are old -> new or /old-regex/ -> new. The only way to use this option is to run the crawler from the current main branch of the Git repository. If you are using macOS, the instructions are here: https://crawler.siteone.io/installation-and-requirements/manual-installation/#macos-x64-intel

Below is a real example. With it, the cloned BBC website displays the images correctly.

./crawler \
  --url=https://www.bbc.com/ \
  --max-visited-urls=500 \
  --offline-export-dir=tmp/bbc.com \
  --replace-content='/<img[^>]+class="[^"]*hide-when-no-script[^"]*"[^>]*>/i -> '
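
To see what the regex in the command above removes, here is a minimal sketch that applies the same pattern to a made-up HTML snippet using sed. The sample markup and the variable names are assumptions for illustration only (the real BBC HTML may differ), and sed stands in for the crawler's own replacement logic, which may behave differently in edge cases. The /i flag is left out because case-insensitive matching in sed is not portable; the sample uses lowercase throughout.

```shell
# Hypothetical sample: one placeholder image and one real article image.
sample='<div><img class="placeholder hide-when-no-script" src="grey.png"><img class="photo" src="article.jpg"></div>'

# Same pattern as in the --replace-content example (without the /i flag).
pattern='<img[^>]+class="[^"]*hide-when-no-script[^"]*"[^>]*>'

# Replace every match with an empty string, mirroring "/regex/ -> " with nothing after the arrow.
cleaned=$(printf '%s\n' "$sample" | sed -E "s/${pattern}//g")

printf '%s\n' "$cleaned"
```

Only the img tag whose class attribute contains hide-when-no-script is dropped; the second img tag is left as it is, because [^>] cannot match across the end of a tag.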
