Get unescaped html #4029

Closed
zivni opened this issue Aug 15, 2024 · 4 comments

@zivni

zivni commented Aug 15, 2024

I need to get the HTML with the hyperlinks left unescaped.

Consider this example:

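import * as cheerio from 'cheerio';

// Third argument `false` parses the input as a fragment (no <html>/<body> wrapper).
const $ = cheerio.load(`<a href="https://www.google.com/maps/search/?api=1&query=NEW+YORK">NEW YORK</a>`, null, false);
console.log($.html());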

What I get in the console is <a href="https://www.google.com/maps/search/?api=1&amp;query=NEW+YORK">NEW YORK</a>, with the & escaped as &amp;. I need the output to keep the bare &, as in the source.

I think the html function needs an option to skip escaping the resulting HTML.

@nwalters512

In case this helps, if you read the href attribute, you'll see that it's returned to you unescaped:

console.log($('a').attr('href'));
// https://www.google.com/maps/search/?api=1&query=NEW+YORK

Cheerio's behavior is consistent with how browsers work. You can prove this to yourself by executing this in a browser:

const el = document.createElement('div');
el.innerHTML = '<a href="https://www.google.com/maps/search/?api=1&query=NEW+YORK">NEW YORK</a>';
console.log(el.innerHTML);
// <a href="https://www.google.com/maps/search/?api=1&amp;query=NEW+YORK">NEW YORK</a>
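The round trip behind this can be sketched without Cheerio or a DOM. The helpers `escapeAttr` and `unescapeAttr` below are hypothetical names, not Cheerio or browser APIs, and they handle only `&` and `"`, which is a simplification of what real serializers and parsers do. The point is that the escaped form is just how the value is written down; reading it back yields the original URL.

```javascript
// Hypothetical helpers, not part of Cheerio: a simplified model of how an
// HTML serializer writes an attribute value and how a parser reads it back.
function escapeAttr(value) {
  // Escape & first so the & introduced by &quot; is not escaped a second time.
  return value.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

function unescapeAttr(text) {
  // Decode &quot; before &amp; so "&amp;quot;" becomes "&quot;", not '"'.
  return text.replace(/&quot;/g, '"').replace(/&amp;/g, '&');
}

const href = 'https://www.google.com/maps/search/?api=1&query=NEW+YORK';
const serialized = escapeAttr(href);     // what ends up between the quotes
const parsedBack = unescapeAttr(serialized); // what a parser hands back

console.log(serialized);
// https://www.google.com/maps/search/?api=1&amp;query=NEW+YORK
console.log(parsedBack === href);
// true
```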

@zivni
Author

zivni commented Aug 19, 2024

Thank you for the explanation.

My use case is that I have HTML files that I need to manipulate and save back.

So I think it would make sense to have another method that exports the HTML without escaping.

@nwalters512

I suspect that's outside the scope of this library. You might consider using a third-party library like he. Note that this will also decode any entities that exist in the starting HTML, which might not be what you want. That is, if you start with <div>&lt;</div>, you'll end up with <div><</div>. If it's possible for you to relax your requirements such that you can tolerate encoded entities in the output (which any browser, HTML parser, etc. would be perfectly happy with), that would probably work out best for you in the long run.
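If relaxing the requirement really is not an option, one narrower alternative to decoding the whole document is to post-process only the `href` values of the serialized string. The following is a minimal sketch under stated assumptions: `unescapeAmpInHrefs` is a hypothetical helper (not part of Cheerio or he), it assumes attributes are double-quoted as in the output shown earlier in this thread, and it is regex-based, so it will also touch attributes such as `data-href` and any text that merely looks like an `href` attribute.

```javascript
// Hypothetical helper (not a Cheerio or `he` API): turns &amp; back into &
// only inside double-quoted href values of already-serialized HTML.
function unescapeAmpInHrefs(html) {
  return html.replace(/\bhref="([^"]*)"/g, (match, value) =>
    `href="${value.replace(/&amp;/g, '&')}"`
  );
}

// Stand-in for the string returned by $.html(); the <div> is there to show
// that entities in text content are left alone.
const pageHtml =
  '<div>&lt;</div><a href="https://www.google.com/maps/search/?api=1&amp;query=NEW+YORK">NEW YORK</a>';

const saved = unescapeAmpInHrefs(pageHtml);
console.log(saved);
// <div>&lt;</div><a href="https://www.google.com/maps/search/?api=1&query=NEW+YORK">NEW YORK</a>
```

Unlike a blanket decode, this leaves `<div>&lt;</div>` intact. The trade-off is that a bare `&` in an attribute is no longer the form a serializer would produce, and a URL whose decoded value happens to look like an entity may be read differently by the next parser, so the escaped output remains the safer default.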

@zivni
Author

zivni commented Aug 20, 2024

Thanks
