The following page:

https://netflixtechblog.com/full-cycle-developers-at-netflix-a08c31f83249

has `img` tags without a `src` attribute. I think the `src` is set via JavaScript on scroll, or via the `noscript` tags that come right after the `img` tags.

Here's a piece of the page's HTML:
<img alt="" class="iq ir t u v is ak c" width="687" height="60" role="presentation"><noscript><img alt="" class="t u v is ak" src="https://miro.medium.com/max/1374/1*JnixtUHJjNYXNT15P42eJQ.png" width="687" height="60" srcSet="https://miro.medium.com/max/552/1*JnixtUHJjNYXNT15P42eJQ.png 276w, https://miro.medium.com/max/1104/1*JnixtUHJjNYXNT15P42eJQ.png 552w, https://miro.medium.com/max/1280/1*JnixtUHJjNYXNT15P42eJQ.png 640w, https://miro.medium.com/max/1374/1*JnixtUHJjNYXNT15P42eJQ.png 687w" sizes="687px" role="presentation"/></noscript></div></div></div><figcaption class="jd je cm ck cl jf jg en b eo ep fv" data-selectable-paragraph="">SDLC components</figcaption></figure>
As a result, when using ReadabilityExtended, Readability returns empty images in place of the large images and only the tiny thumbnails.
I was able to solve the issue by searching for all `img` tags with a missing `src`, checking whether each such element has a `noscript` sibling containing an `img`, and, if so, extracting the `src` from that `noscript` and setting it on the original `img`.

I placed the following code at the very beginning of the `protected open fun removeNoscripts(document: Document) {}` function in `Preprocessor.kt`:
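The workaround described above can be sketched as follows. This is an illustration, not the original snippet. It assumes the `Document` passed to `removeNoscripts` is jsoup's `org.jsoup.nodes.Document`, and that jsoup parses `<noscript>` content in the body as regular elements, so the fallback `img` can be selected directly. The helper name `restoreImagesFromNoscript` and the choice to also copy `srcset` and `sizes` are hypothetical.

```kotlin
import org.jsoup.nodes.Document

// Hypothetical list of attributes copied from the <noscript> fallback image,
// based on the Medium markup quoted above (src, srcSet, sizes).
// jsoup lowercases attribute names by default, so "srcSet" is read as "srcset".
private val IMAGE_ATTRIBUTES = listOf("src", "srcset", "sizes")

/**
 * Hypothetical helper: for every <img> without a usable src, look at the
 * element right after it; if that is a <noscript> containing an <img>,
 * copy the image attributes from that fallback <img> onto the original one.
 */
fun restoreImagesFromNoscript(document: Document) {
    for (img in document.select("img")) {
        // Skip images that already have a non-blank src.
        if (img.attr("src").isNotBlank()) continue

        // The page places the <noscript> fallback right after the lazy-loaded <img>.
        val noscript = img.nextElementSibling() ?: continue
        if (noscript.tagName() != "noscript") continue

        // Assumes jsoup exposes the <noscript> children as elements.
        val fallback = noscript.selectFirst("img") ?: continue
        if (fallback.attr("src").isBlank()) continue

        for (name in IMAGE_ATTRIBUTES) {
            if (fallback.hasAttr(name)) img.attr(name, fallback.attr(name))
        }
    }
}
```

Calling `restoreImagesFromNoscript(document)` as the first statement of `removeNoscripts` would copy the `src` values onto the visible `img` tags before the `noscript` elements are removed.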