Fails to read info from specific websites #8
Comments
Thanks for reporting. I had a quick look: a stray My library only picks up the meta info that is actually located in the
Using an opt-in is no problem, assuming there are no negative side effects. I'm not familiar with html5ever, so I'm not sure how it relates.
Okay, I can pick this up next week. Feel free to open a PR if you are in a hurry.
I have little idea how this actually works, but maybe it would make sense to fall back to searching the body if no data is found in the head? (Original reporter here.) Unless the problem is multiple matching tags, which I guess might be what's going on with the charset on some of these pages. For example, https://wyborcza.biz/biznes/7,147582,28355528,klasa-srednia-zaciska-pasa.html?squid_js=false has before we arrive at the actual so we end up with something like this on our page; (And that's one of the biggest news outlets in Poland, not some random small site. Don't ask me why anyone would still use ISO charsets.)
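To make the suggested fallback concrete, here is a minimal sketch in Rust. It is not how webpage-rs is implemented: the `Location` enum, the `MetaTag` struct, and both functions are hypothetical names invented for illustration, and the sketch assumes some HTML parser has already produced a flat list of meta tags in document order. It prefers the head, consults the body only when the head yields nothing, and lets the first occurrence of a name win when there are multiple matching tags.

```rust
use std::collections::HashMap;

/// Where in the document a meta tag was found (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Location {
    Head,
    Body,
}

/// A meta tag already extracted by some HTML parser (hypothetical type).
#[derive(Debug, Clone)]
struct MetaTag {
    location: Location,
    name: String,
    content: String,
}

/// Collect meta values from one part of the document.
/// When a name occurs more than once, the first occurrence wins.
fn collect_from(tags: &[MetaTag], location: Location) -> HashMap<String, String> {
    let mut found = HashMap::new();
    for tag in tags.iter().filter(|t| t.location == location) {
        found
            .entry(tag.name.clone())
            .or_insert_with(|| tag.content.clone());
    }
    found
}

/// Prefer the head; consult the body only if the head yielded nothing.
fn collect_meta(tags: &[MetaTag]) -> HashMap<String, String> {
    let from_head = collect_from(tags, Location::Head);
    if from_head.is_empty() {
        collect_from(tags, Location::Body)
    } else {
        from_head
    }
}
```

Whether "first occurrence wins" is the right policy for duplicate tags is a design choice; a real implementation might instead prefer the last one or validate the values.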
Not sure if I should open another issue, but Lemmy doesn't seem to get images from root-relative URLs like this one: Here you can find such
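For readers unfamiliar with the term, a root-relative URL starts with a single `/` and has to be resolved against the scheme and host of the page it appears on. The following is a simplified, standard-library-only sketch of that resolution; `resolve_image_url` is a hypothetical helper, not part of webpage-rs or Lemmy, and it deliberately ignores path-relative references such as `img/a.png` and `../a.png`. In real code, `Url::join` from the `url` crate is probably a better fit, since it implements full URL resolution.

```rust
/// Resolve an image reference found on `page_url` (hypothetical helper).
/// Handles protocol-relative (`//host/path`), root-relative (`/path`) and
/// already-absolute references; returns `None` for anything else.
fn resolve_image_url(page_url: &str, href: &str) -> Option<String> {
    // Split "https://example.org/blog/post" into scheme and the remainder.
    let (scheme, rest) = page_url.split_once("://")?;

    // The host ends at the first path, query or fragment delimiter.
    let host_end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    let host = &rest[..host_end];
    if scheme.is_empty() || host.is_empty() {
        return None;
    }

    if let Some(without_slashes) = href.strip_prefix("//") {
        // Protocol-relative: reuse only the page's scheme.
        Some(format!("{}://{}", scheme, without_slashes))
    } else if href.starts_with('/') {
        // Root-relative: reuse the page's scheme and host.
        Some(format!("{}://{}{}", scheme, host, href))
    } else if href.contains("://") {
        // Already absolute: keep as-is.
        Some(href.to_string())
    } else {
        // Path-relative references are out of scope for this sketch.
        None
    }
}
```

With a placeholder page at `https://example.org/blog/post.html`, the reference `/img/cover.png` resolves to `https://example.org/img/cover.png`.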
Thanks for bringing this to my attention, @jorgesumle. Leaving this issue open to address the original issue. That should be fixed on my part.
According to reports from our users, webpage-rs is unable to parse info from the https://oko.press/ website.
I was able to reproduce this by putting the site URL into the from_url test case.
Downstream issue: LemmyNet/lemmy#1796 (the other sites mentioned in the issue work fine in my tests).