Long attribute value truncation fires off requests to the truncated URLs #11465

Closed
Lofesa opened this issue Sep 19, 2020 · 9 comments · Fixed by #11503

@Lofesa

Lofesa commented Sep 19, 2020

Summary
When I fetch my site with PSI, I'm getting URLs in the server log like:

66.102.8.51 - - [19/Sep/2020:15:50:49 +0200] "GET /ensenanza/wp-content/uploads/2020/09/xbanner-h%E2%80%A6 HTTP/1.1" 404 28312 "https://intersindicalrm.org/ensenanza/" "Mozilla/5.0 (Linux; Android 7.0; Moto G (4)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4143.7 Mobile Safari/537.36 Chrome-Lighthouse"

And in PSI:

[Screenshot]

When the page is fetched with Search Console, there is no error.

@patrickhulce
Collaborator

Thanks for filing, @Lofesa! I just ran into this the other day too. I'm not sure why Lighthouse is triggering a request to the truncated URL we create; we'll have to look into it.

@Lofesa
Author

Lofesa commented Sep 21, 2020

Hi @patrickhulce
If it helps: today, for the first time, I saw a truncated URL from Applebot, but only one.
17.58.99.36 - - [21/Sep/2020:10:05:20 +0200] "GET /%E2%80%A6/sterm-convocara-asamblea-ext%E2%80%A6/ HTTP/1.1" 404 35939 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)"

@brendankenny
Member

When I instrument network requests in afterPass, for this site (at least) the requests come from both the ImageElements and TraceElements gatherers, due to how we create the outerHTML snippet:

```javascript
const clone = element.cloneNode();
ignoreAttrs.concat(autoFillIgnoreAttrs).forEach(attribute => {
  clone.removeAttribute(attribute);
});
let charCount = 0;
for (const attributeName of clone.getAttributeNames()) {
  if (charCount > snippetCharacterLimit) {
    clone.removeAttribute(attributeName);
  } else {
    let attributeValue = clone.getAttribute(attributeName);
    if (attributeValue.length > ATTRIBUTE_CHAR_LIMIT) {
      attributeValue = attributeValue.slice(0, ATTRIBUTE_CHAR_LIMIT - 1) + '…';
      // On an <img> clone, writing the truncated value back to `src`
      // makes the browser start fetching that (invalid) URL.
      clone.setAttribute(attributeName, attributeValue);
    }
    charCount += attributeName.length + attributeValue.length;
  }
}
```

The clone of an `img` element is still an `img` element, so when the `src` attribute is set to the truncated `src.slice(0, ATTRIBUTE_CHAR_LIMIT - 1) + '…'`, the page immediately tries to load an image from the new (bad) URL.
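A minimal sketch of just the truncation step, pulled out as a pure function so it runs outside a browser. The name `truncateAttributeValue` and the limit of 75 are assumptions for illustration, not necessarily Lighthouse's actual helper or constant. It also shows where the `%E2%80%A6` in the server logs comes from: when the truncated value is parsed as a URL, the appended `…` (U+2026) is percent-encoded as its three UTF-8 bytes.

```javascript
// Hypothetical limit, chosen for illustration only.
const ATTRIBUTE_CHAR_LIMIT = 75;

// Mirrors the quoted logic: values longer than the limit keep their first
// (limit - 1) characters plus one ellipsis, so the result is exactly `limit` long.
function truncateAttributeValue(value, limit = ATTRIBUTE_CHAR_LIMIT) {
  if (value.length <= limit) return value;
  return value.slice(0, limit - 1) + '\u2026';
}

const src = 'https://example.com/wp-content/uploads/2020/09/' + 'x'.repeat(60) + '.jpg';
const truncated = truncateAttributeValue(src);

console.log(truncated.length); // 75
// WHATWG URL parsing (as browsers do) percent-encodes the non-ASCII ellipsis,
// which is the path that ends up in the server's access log.
console.log(new URL(truncated).pathname.endsWith('%E2%80%A6')); // true
```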

This request doesn't affect anything: it's just a cloned node that's not in the DOM and isn't used by Lighthouse for anything but the snippet, but it's still kind of weird. I think we'd need to take a pretty different approach to creating the snippet (or never truncate `src`, which probably isn't workable) to fix it, though.
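One possible "different approach", sketched under assumptions: build the snippet string directly from the tag name and attribute name/value pairs, so no element is ever mutated and no truncated `src` is ever assigned to a real `<img>`. The function `buildSnippet`, its input shape (an array of `{name, value}` objects, like what `element.attributes` provides), and the limit of 75 are hypothetical; it also skips the `ignoreAttrs` and `snippetCharacterLimit` handling from the quoted code.

```javascript
// Hypothetical limit, for illustration only.
const ATTRIBUTE_CHAR_LIMIT = 75;

// Escape HTML-significant characters so the value reads safely inside a
// double-quoted attribute.
function escapeAttr(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Builds an opening-tag snippet as a plain string. Because no DOM node is
// touched, truncating a long value cannot trigger a network request.
function buildSnippet(tagName, attributes, limit = ATTRIBUTE_CHAR_LIMIT) {
  const parts = attributes.map(({name, value}) => {
    const shown = value.length > limit ? value.slice(0, limit - 1) + '\u2026' : value;
    return ` ${name}="${escapeAttr(shown)}"`;
  });
  return `<${tagName.toLowerCase()}${parts.join('')}>`;
}

// In a browser, the input could come from a live element, e.g.:
//   buildSnippet(el.tagName, Array.from(el.attributes, a => ({name: a.name, value: a.value})))
console.log(buildSnippet('IMG', [
  {name: 'src', value: '/img/banner.jpg'},
  {name: 'alt', value: 'A "quoted" banner'},
]));
// <img src="/img/banner.jpg" alt="A &quot;quoted&quot; banner">
```

The trade-off is that the snippet no longer comes from the browser's own serializer, so any escaping differences from `outerHTML` would have to be handled by hand.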

@connorjclark
Collaborator

connorjclark commented Sep 21, 2020

cc @Beytoven: I think this has come up before. Does it seem familiar to you?

@paulirish
Member

paulirish commented Sep 21, 2020

Image elements have some atypical characteristics when it comes to loading, e.g.:

```javascript
const img = new Image();
img instanceof HTMLImageElement; // true: the Image() constructor creates a full-fledged HTML element
img.src = 'stuff'; // immediately kicks off a network request, even though this image isn't part of the DOM
```

(I think part of this is legacy web stuff.)


Anyhow, that aside, I think we can get around this loading behavior by using a <template> element:

```diff
 const clone = element.cloneNode();
+const temp = document.createElement('template');
+temp.content.append(clone);

 clone.setAttribute(...
```

Within that `.content` property, everything is inert, so we can modify the clone however we like with no side effects.

@patrickhulce changed the title from "PSI fecht bad url" to "Long attribute value truncation fires off requests to the truncated URLs" on Sep 22, 2020
@bangel79

bangel79 commented Oct 1, 2020

We have the same issue. We run PSI every 15 minutes on some of our URLs, and this results in several thousand malformed requests to our sub-resources with a truncated URI (%E2%80%A6). Our operations team is complaining loudly because the noise keeps them from spotting real issues on our e-commerce site.
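Until the fix reaches PSI, a minimal sketch of flagging these requests in a combined-format access log, assuming the percent-encoded ellipsis `%E2%80%A6` in the request path is the marker to key on (the Applebot line earlier in this thread shows it is not limited to the Chrome-Lighthouse user agent). The function name and regex are illustrative and not part of any existing tool.

```javascript
// Matches the request line of a combined-format log entry, e.g. "GET /path HTTP/1.1",
// and captures the requested path.
const REQUEST_LINE = /"[A-Z]+ (\S+) HTTP\/[\d.]+"/;
// Percent-encoded U+2026 HORIZONTAL ELLIPSIS, appended by the snippet truncation.
const ENCODED_ELLIPSIS = /%E2%80%A6/i;

// Returns true when the requested path contains the encoded ellipsis,
// i.e. the request looks like a truncated snippet URL rather than a real link.
function isTruncatedSnippetRequest(logLine) {
  const match = REQUEST_LINE.exec(logLine);
  return match !== null && ENCODED_ELLIPSIS.test(match[1]);
}

// Log line from the original report (user agent shortened).
const line = '66.102.8.51 - - [19/Sep/2020:15:50:49 +0200] "GET /ensenanza/wp-content/uploads/2020/09/xbanner-h%E2%80%A6 HTTP/1.1" 404 28312 "https://intersindicalrm.org/ensenanza/" "Mozilla/5.0 ... Chrome-Lighthouse"';
console.log(isTruncatedSnippetRequest(line)); // true
```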

@csabapalfi
Contributor

I submitted #11503 that fixes the issue.

@qadir5000

I'm still seeing this in the logs when I run tests through https://developers.google.com/speed/pagespeed/insights/

[Screenshot]

This is what I get in the logs: URLs cut at exactly 74 characters (including the domain).

@connorjclark
Collaborator

PageSpeed Insights hasn't been updated yet, and there's no date for when it will be.
