Page title fetching is still failing in some cases #531

julienCXX · 2016-04-05T18:00:44Z

Hi,
I recently installed Shaarli (v0.6.5) on a private server (running PHP 7.0.5), in order to save a large amount of links from opened browser tabs. As I imported links manually, I noticed that the link’s title did not always appear properly (nothing, only partially or breaking the UI). That happened quite often (about 90 broken titles on more than 520 links). As these issues seemed to have been fixed in #410 and #512, I tried to reiterate the process with a local instance (same PHP version), using the most recent development version at that time (11609d9). The result was a bit better (no UI breakage), but about 50 links are still broken, in different ways.
These are the following:

No title appears:

A title appears, but is truncated/not the right one:

A title appears, but non-ASCII characters are replaced with a question mark (encoding issue?):

Furthermore, title fetching fails on links to PDF files.

The text was updated successfully, but these errors were encountered:

alexisju · 2016-04-05T18:49:20Z

I tried the 5 first links. With Firefox API, title, url and description are fetched correctly. However, with bookmarklet (or Shaarli-next firefox addon), description are not loaded.

ArthurHoaro · 2016-04-05T19:49:21Z

Thanks for the dataset! It'll be easier to fix this for good.

So far, here's what I got:

30x HTTP redirection can contain a relative URL.
While some host requires request_fulluri attribute in the HTTP request, some others reject it.
It fails if the <title> HTML tag contain an attribute (eg. <title stuff="morestuff">.
Some host tells me that my request is HTTP 406 Not Acceptable, which is not very kind, because my browser is too old (Firefox 23 UA).
The apple one was tricky. They use simple quotes to define their <meta charset=, while we only expect double quotes or nothing, which leads to an invalid charset.

truncated/not the right one

http://expandedramblings.com : point 2.
amazon: the title is more likely changed in JS after page loading.
meetup: same as amazon.
https://wiki.cyanogenmod.org/w/Install_CM_for_jfltexx : Retrieved title is not escaped! That's very bad.
http://host1.no/ returns a 404 page (probably because I'm a bot), but with a 200 HTTP code. Don't use this host.

Encoding issues: mb_convert_encoding parameters are messed up.

Not solved yet and/or probably won't fix:

https://riunet.upv.es/ : it works if I remove the 4Mo download limit, but the page is 35ko. However, it takes me 5s to get a response from the host, and I can't even ping it.
https://fr.atlassian.com/git/tutorials/comparing-workflows/forking-workflow : It's weird, we have this in the same response:

[ 
  0 => 'HTTP/1.1 301 Moved Permanently',
  1 => 'HTTP/1.1 301 Moved Permanently',
  2 => 'HTTP/1.1 200 OK',
]

Probably due to going through reverse proxies, but that's more likely a bad configuration on their side.

https://static.societegenerale.fr/ : access is forbidden.
A few of your links seems to have an anti-bot policy (custom 401/403 error codes). We won't fight it.
http://mro.name/foaf.rdf doesn't have any title.

@alexisju: it doesn't work the same way. With bookmarklets, the page is already opened, and Shaarli grabs the title with JS (or Firefox API). In Shaarli, it has to make an external request from your server to the targeted URL.

julienCXX · 2016-04-06T00:22:42Z

OK, the explanation seems legit, but does not convince me completely.

amazon: the title is more likely changed in JS after page loading.

I do not agree with that. The correct title appears in a title tag, in the page’s source code. Have you looked at the URL encoding/escaping?
On another (desert) hand, this one does it.

meetup: same as amazon.

On this one, the retrieved title has an extra space between the “-” and the preceding word. It looks like the carriage return is interpreted as a space character. Other title differences could be related to an adaptation to the browser’s accepted language settings.

http://host1.no/ returns a 404 page (probably because I'm a bot), but with a 200 HTTP code. Don't use this host.

~~I had not got a 404 page. Strange.~~ Yes, trough my local instance.

https://static.societegenerale.fr/ : access is forbidden.

Yes, I know. I expected to get “403 Forbidden” as page title.

http://mro.name/foaf.rdf doesn't have any title.

This is not a valid HTML page, but my browser (Firefox) still displays a title (Marcus Rohrmoser). It looks like it was extracted from the foaf:name tag.

@alexisju: it doesn't work the same way. With bookmarklets, the page is already opened, and Shaarli grabs the title with JS (or Firefox API). In Shaarli, it has to make an external request from your server to the targeted URL.

I almost only used the Shaarli Web interface for adding links.

nodiscc · 2016-04-06T17:01:58Z

amazon: the title is more likely changed in JS after page loading.

I ran wget on that page and the <title> is correct (Amazon.fr : test), so not generated by JS.

I expected to get “403 Forbidden”

The message seems to depend on the User Agent (curl returns <title>Erreur / Error</title>, firefox <title>403 Forbidden</title>

This is not a valid HTML page, but my browser (Firefox) still displays a title (Marcus Rohrmoser). It looks like it was extracted from the foaf:name tag.

I think Shaarli can live without RDF parsing.

ArthurHoaro · 2016-04-06T17:53:41Z

amazon: the title is more likely changed in JS after page loading.

My bad, I jumped on that conclusion too fast. I found another issue, which will probably fix some other issues: the URL is escaped too early, so Shaarli's trying to reach a non valid URL (& => &).

On this one, the retrieved title has an extra space between the “-” and the preceding word. It looks like the carriage return is interpreted as a space character. Other title differences could be related to an adaptation to the browser’s accepted language settings.

Yep, I thought you were talking about the translation. The replace function adds an additional space to replace new lines, which is unnecessary. Also, I guess adding the locale in the HTTP request wouldn't harm.

Yes, I know. I expected to get “403 Forbidden” as page title.

No, Shaarli only downloads the page/retrieves the title if it finds 200 OK HTTP code. I don't really see the point to set 403 Forbidden as a title.

This is not a valid HTML page, but my browser (Firefox) still displays a title (Marcus Rohrmoser). It looks like it was extracted from the foaf:name tag.

Right, NoScript was blocking the rendering. I agree with @nodiscc though.

Anyway, there is a bunch of things to fix, but that will be a great improvement. Thanks!

ArthurHoaro · 2016-04-06T19:37:52Z

Actually I found something else for fr.atlassian.com: they use location instead of Location redirection directive (notice the case). It doesn't cost much to support it.

EDIT: Another. Bad certificate can be ignored.

see shaarli#531 for details

julienCXX · 2016-04-07T14:50:17Z

It looks like your commit fixed most problems, as I tested the problematic links with that version.
However, there are still issues with some of them (excluding the forbidden and won’t fix cases).
No title appears (maybe an anti-bot policy):

Title appears, but is not the expected one:

http://expandedramblings.com/index.php/3-reasons-self-hosted-video-better-youtube/ (it gives me no title, instead of “DMR” with the versions used in my first comment; this page also has 2 title HTML tags and my browser shows the other one “3 Reasons Self-Hosted Video Is Better Than YouTube”)
http://android.izzysoft.de/ (German title on Shaarli, English title in my browser)

host1.no has no 404 error anymore, title fetching works well here.

I found some new links causing other issues:

http://news.zing.vn/ (no title)
Unicode domains (e.g. http://www.académie-française.fr/ or http://ふそう.com/) no title appears, but non-unicode variant/converted versions work (respectively http://www.academie-francaise.fr/ and http://Xn--P8j2bxd.com/)
http://www.paruvendu.fr/bateau-nautisme/ : it looks like the encoding is broken (browser: “Nautisme : bateaux occasion, voiliers, jet ski... Vente bateaux à moteur, voiliers d’occasion”, Shaarli’s new link page: “Nautisme : bateaux occasion, voiliers, jet ski... Vente bateaux à moteur, voiliers d�occasion” and the invalid character does not appear when browsing links)
EDIT: http://www.ign.fr/ (no title and long to process)

ArthurHoaro · 2016-04-07T18:30:53Z

http://coderstats.net/ is working on my local instance.
http://antix.mepis.org/ returns 403 bad behaviour.
https://www.transip.eu/ returns 403 forbidden.
https://www.tilaa.com/ returns false.
http://services.gisgraphy.com/ returns 401 unauthorized.
http://expandedramblings.com/index.php/3-reasons-self-hosted-video-better-youtube/ it works here. :/
http://android.izzysoft.de/ I fixed the Accept-Language header (it sends exactly the same thing as my browser do), but still no luck. I bet my 2 cents on this, which might be a browser feature:

<link rel="alternate" hreflang="de" href="/articles.php?lang=de">

http://news.zing.vn/ If you try to wget it, you'll get the same crap as Shaarli does: a 40kB file full of unknown characters. No idea here.
Unicode domains are not supported by filter_var($url, FILTER_VALIDATE_URL)...
http://www.paruvendu.fr/bateau-nautisme/ They use the old charset definition in their HTML.
http://www.ign.fr/ also works.

So there are 2 fixable things here: unicode domains and supporting pre-HTML5 charset.

For the links that work for me, it might be because I have a better connection than yours. Although a proper server should have a better connection than mine.

julienCXX · 2016-04-07T19:21:43Z

For the links that work for me, it might be because I have a better connection than yours. Although a proper server should have a better connection than mine.

They now work for me, after another try.

http://news.zing.vn/ If you try to wget it, you'll get the same crap as Shaarli does: a 40kB file full of unknown characters. No idea here.

wget also gave me that. Using file, I discovered that the document is actually a gzip-compressed HTML page. After gunzipping it, the title appears clearly, surrounded by 2 carriage return characters (0x0D or ^M).

ArthurHoaro · 2016-04-08T17:16:17Z

Oh right, good catch. Although:

they shouldn't return compressed content if we didn't ask for it.
gzip is bundled with yet another PHP non standard extension.

julienCXX · 2016-04-16T18:10:02Z

Talking about anti-bot policies on some websites, how do you explain that they detect Shaarli as a bot given that it is using a desktop browser user agent? On another hand, most links flagged as having an anti-bot policy worked for me with wget (without setting the user agent), from my computer.

see shaarli#531 for details

ArthurHoaro · 2016-05-03T17:49:30Z

To be honest, I've no idea. Playing the headers didn't change anything. I'm not willing to spend more time on this. If anyone sees what Shaarli could do in a better way, feel free to reopen this issue.

see shaarli#531 for details

…turn Fixes #531 - Title retrieving is failing with multiple use case

julienCXX · 2016-07-15T17:29:44Z

Hello again, I have made discoveries about the links that are still broken.

First of all, I analysed (using Wireshark) the HTTP request emitted by Shaarli (on my local instance) when looking for the title and compared with wget on the same URL: http://antix.mepis.org/. wget worked whereas Shaarli did not (got the 403 bad behaviour).
The sent headers are different:

wget declared the request as HTTP/1.1, but Shaarli’s was HTTP/1.0 (and still specifying the host, not part of the spec)
wget’s header has Accept: */* whereas Shaarli does not have it
Shaarli’s header has Connection: close whereas wget does not have it
the user-agent is different.

After digging trough the code, it appears that the title fetching procedure is called from index.php and resides in get_http_response(), at application/HttpUtils.php. The function that actually sends the said headers is get_headers(), with options for altering them (I assume you already know that). Comparing the options with the network analysis, it appears that the method and user-agent are respected, but the accept-language parameter is missing. I also tried to add the Accept: */*, without success.

In order to do a more extensive testing, I extracted the code responsible for calling get_http_response() in a distinct file and added another method for connecting to websites, based on cURL and looking like that:

$url = 'http://an-url-to-test.tld/';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,            $url);
curl_setopt($ch, CURLOPT_HEADER,         true);
curl_setopt($ch, CURLOPT_NOBODY,         true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT,        30);
curl_setopt($ch, CURLOPT_USERAGENT,      'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0');

$r = curl_exec($ch);
return $r;

Running that on problematic URLs gave me those results:

http://antix.mepis.org/: Shaarli: HTTP/1.1 403 Bad Behavior, cURL: HTTP/1.1 301 Moved Permanently
https://www.tilaa.com/: Shaarli: false, cURL: HTTP/1.1 200 OK
http://services.gisgraphy.com/: Shaarli: HTTP/1.1 200 OK, cURL: HTTP/1.1 302 D�plac� Temporairement
https://www.transip.eu/: Shaarli: HTTP/1.1 403 Forbidden, cURL: HTTP/1.1 501 Not Implemented.

There are several things to say:

the requests made trough cURL all have HTTP/1.1 and Accept: */*
cURL appears to work better than get_headers(), as the sent headers might look like more “authentic”
on http://services.gisgraphy.com/, get_headers() missed the redirect
on https://www.transip.eu/, cURL failed, but using CURLOPT_NOBODY makes cURL use the HEAD method; setting back to GET, with curl_setopt($ch, CURLOPT_HTTPGET, true); fixes the issue.

As a conclusion, I would say that get_headers() itself may be the culprit, as its behaviour is too far from a web browser to pass trough some “anti-bot” filters. Thus I think it could be solved by changing the fetching function (with cURL, for instance). The only drawback that appears to me is that cURL may be disabled in server PHP configuration.

nodiscc · 2016-07-16T08:32:23Z

Thanks for the thorough investigation. I'm against adding a php5-curl dependency to workaround a few edge cases, but this could benefit from more investigation.

Note that curl --head https://www.transip.eu/ returns 501 not implemented. Whereas curl without --head returns a HTTP code 200. So this server is apparently actively blocking header requests. I'm not sure we should worry about this, which can be considered an unusual (or bogus) server configuration. The best course of action would be contacting the server's operator and asking them for more info.

ArthurHoaro · 2016-07-16T09:03:19Z

Thanks for the investigation. I've noticed in another project that cURL might be more reliable that default get_headers/file_get_contents function, even with context.

While we shouldn't add a server requirement for that (php-curl), we eventually could add a cURL fetch as default, with a get_header fallback function. It's what we've done with php-intl functions.

julienCXX · 2016-07-18T19:24:16Z

Thanks everyone for the feedback!

@nodiscc I contacted the server operator and was replied that HEAD requests are blocked as a protection against crawlers.

Talking about webpage fetching methods, I also considered using fsockopen(), but I am not sure about shared hosting support nor how to deal with HTTPS.

ArthurHoaro · 2016-07-19T11:23:42Z

fsockopen() for remote access requires allow_url_fopen to be set to true in php.ini. I don't know if it's more or less common than cURL. Also, I didn't run any test, but the function doesn't take any settings. stream_socket_client() allow to set up a context, but we might end up with the same issues we encountered with file_get_contents().

ArthurHoaro · 2016-08-03T08:10:05Z

@julienCXX Since you already worked on this, feel free to submit a PR with cURL fetching and the current code as fallback, if you want to.

julienCXX · 2016-08-03T20:22:06Z

I am thinking about it, but I have a few questions before working:

In application/HttpUtils.php, should I add the fetch method selection into get_http_response() or should I create a distinct function (method selection in calling code)?
There is a TODO comment inside the said function, stating that the calling code (thumbnailer) should catch exceptions from file_get_contents(). Thus, should I generate an exception the same way, in prevision of a code change?

ArthurHoaro · 2016-08-04T10:57:19Z

I think you should replace get_http_response() content, and move its current content in get_http_response_fallback() or something like that. Then you can call it conditionally using function_exists().

I'm not sure what's this TODO is for specifically, but there are todos application wide regarding errors, because we lack a proper error system. It'll be refactored eventually, don't worry about it.

Also, it's PHP. file_get_contents() and the vast majority of PHP functions won't raise exception, but will generate a warning or an error. Client code isn't wrapped in try/catch either, so just return array($headers, false); should be fine IMHO.

virtualtam · 2016-08-04T18:58:56Z

There are interesting discussions on how to catch PHP warnings/errors/fatals with a custom handler:

I think the TODO annotation in HttpUtils dates back to 451314e ; I've added these here and there to keep in mind we need to switch to a proper error handling system (i.e. instead of calling die() or exit()) so we can log them in a more friendly way (and avoid having to dig into server logs).

julienCXX · 2016-08-05T21:41:09Z

It’s almost there. The latest broken links seem to work, but I discovered other bugs:

Title encoding of http://www.paruvendu.fr/bateau-nautisme/ is broken again.
When a link has HTTP redirects, the URL shown in “Add link” page is the one prior any redirect (except adding “http://” at the beginning) instead of the effective one (after redirects).

ArthurHoaro · 2016-08-06T06:46:07Z

When a link has HTTP redirects, the URL shown in “Add link” page is the one prior any redirect (except adding “http://” at the beginning) instead of the effective one (after redirects).

That's the expected behaviour actually. We don't change user input (except for some parameters cleaning).

julienCXX · 2016-08-06T10:30:54Z

That's the expected behaviour actually. We don't change user input (except for some parameters cleaning).

OK, that’s not up to me.

Testing again with the cURL-based method, title in http://android.izzysoft.de/ now appears in English.
Talking about the weird response in https://fr.atlassian.com/git/tutorials/comparing-workflows/forking-workflow, as shown by @ArthurHoaro, it also happened to me with cURL (and any other link involving redirect). This was because cURL returns headers from all the redirects (not only the final page), resulting in a mixup. I assume it also happened with the fallback method.

ArthurHoaro · 2016-08-06T11:08:17Z

Title encoding of http://www.paruvendu.fr/bateau-nautisme/ is broken again.

Looks like something mb_convert_encoding isn't able to deal with because the à is converted, while the ´ is converted to a square.

title in http://android.izzysoft.de/ now appears in English

Neat. That will remain a mystery though.

https://fr.atlassian.com/git/tutorials/comparing-workflows/forking-workflow

Works fine here with your PR and the fallback method.

julienCXX · 2016-08-06T15:37:30Z

Neat. That will remain a mystery though.

This is not a mystery. It comes from the fact that cURL sends the Accept-Language header properly, whereas file_get_contents() does not.

Works fine here with your PR and the fallback method.

Yes I know. But I wanted to inform you that the issue was probably not related with a bad configuration on the webmaster’s side, as you stated in your first comment.

see shaarli#531 for details

… use case" This reverts commit 112fb2c.

skonsoftSASU · 2023-11-14T23:49:48Z

Hello, this still not fixed in 2023 !
I got same problem when trying to fetch https://tinast.fr.

Page title fetching still not work properly

Any updates ?

regards

nodiscc · 2023-11-15T02:08:46Z

Hi,

I got same problem when trying to fetch https://tinast.fr/.

Works properly on the demo instance https://demo.shaarli.org/admin/shaare?post=https%3A%2F%2Ftinast.fr%2F, and on my own instance.

Make sure that:

you are running the latest Shaarli release
all required PHP extensions are present (notably php-curl, or else Shaarli will fall back to the old get_file_contents() method which is not reliable Added (and set as default) a cURL-based method for fetching HTTP content #624)
your hosting provider does not block outgoing HTTP requests if you are using shared hosting.
more generally, that the server running Shaarli can reach the destination webserver (DNS, firewall, routing...)

If this still doesn't work, check your webserver/PHP logs for any possible errors (and provide them in a new issue), and/or enable dev.debug in Shaarli's configuration file.

nodiscc · 2023-11-15T02:16:08Z

There are, to my knowledge, only a few remaining cases where page title fetching is still broken:

Incorrect title detection on some pages (SVG title instead of HTML title) · Issue #1934 · shaarli/Shaarli (still not fixed, but rare)
Pages where the HTML <title> is set via javascript. This will not be fixed as it would require downloading the full page + external scripts and executing them (too complex/out of scope). I will update the Troubleshooting page to document this as a limitation of Shaarli. Using Browser extensions or the Bookmarklet works around this limitation, as they use the page title provided by the web browser. -> doc: troubleshooting: automatic title retrieval fails when it is set by javascript #2037
Issue with metadata retrieval #1756 (unconfirmed)
I will recheck all URLs linked in the first comment and report back

- related shaarli#531 - fixes shaarli#989

nodiscc · 2023-11-15T15:49:12Z

After rechecking all links in all comments of this issue, I was only able to reproduce the problem (no title retrieved) with these URLs on Shaarli v0.12.2. Each one may have a different cause, so we may have to open specific issues or update the documentation:

http://boilingsteam.com/ returns the HTTP 308 Permanent Redirect code to https://boilingsteam.com/. Bookmarking https://boilingsteam.com/ directly makes title retrieval work. Cause to be determined, but following HTTP 301 redirects to extract the title already works, so this may just be that we also need to follow HTTP 308?
http://www.paruvendu.fr/bateau-nautisme/ -> 308 Permanent Redirect to https://www.paruvendu.fr/bateau-nautisme/ -> bookmarking the redirect target directly works -> same case as above
http://pcsx2.net/. When queried by curl, this URL returns the cloudflare anti-DDoS/anti-bot page Attention Required! | Cloudflare. More debugging needed, but if a website is behind a service that blocks bots/automated access from Shaarli, I don't think there is much we can/should do to bypass this restriction. This should be documented.
http://www.pagesjaunes.fr/ HTTP 301 redirect to https://www.pagesjaunes.fr/, but bookmarking this URL directly does not work either. Querying the https:// URL directly via curl returns a page stating Enable JavaScript and cookies to continue, so this is another case of Some page titles are not retrieved properly #989 (javascript required, wontfix)
https://stackoverflow.com/jobs More debugging needed, curl --silent https://stackoverflow.com/jobs|grep title|head -n1 returns the proper <title>

Before reporting other cases, make sure your Shaarli installation is compliant with the requirements above (latest Shaarli release, PHP curl extension installed and enabled, outgoing HTTP requests not blocked by your hosting provider, destination webserver reachable form the server hosting Shaarli)

- ref. shaarli#2037 - ref. shaarli#531

puppe · 2024-01-21T22:16:38Z

Metadata retrieval does not work for YouTube videos, for example https://www.youtube.com/watch?v=e4TFD2PfVPw. The problem is that YouTube redirects (302) to https://www.youtube.com/supported_browsers?next_url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3De4TFD2PfVPw.

The reason for this redirection seems to be that Shaarli uses the user agent string from Firefox 45 which YouTube apparently deems to be too ancient. I can confirm that changing the version in the user agent string from 45.0 to something more recent like 121.0 (current release) or 115.0 (current extended support release) fixes the issue, at least for now. If you wish, I will submit a pull request to that effect.

nodiscc · 2024-01-22T18:09:56Z

changing the version in the user agent string from 45.0 to something more recent like 121.0 (current release) or 115.0 (current extended support release) fixes the issue

It seems you are correct

$ curl --silent --user-agent 'Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0' https://www.youtube.com/watch?v=e4TFD2PfVPw | grep --only-matching '<title>.*</title>'
<title>Parcels - Live Vol. 1 (Complete Footage) - YouTube</title>

$ curl --silent --user-agent 'Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/45.0' https://www.youtube.com/watch?v=e4TFD2PfVPw | grep --only-matching '<title>.*</title>'
# nothing

If you wish, I will submit a pull request to that effect.

It would be nice, thank you.

Youtube used to be one of the few sites that used javascript to update the HTML <title> (it's even used as example in https://shaarli.readthedocs.io/en/master/Troubleshooting.html#page-title-and-description-are-not-retrieved-automatically), but it apparently no longer does (see other examples above).

YouTube responds with a redirect if it thinks that the user agent is too old. This commit changes the user agent string to that of a current version of Firefox. See also shaarli#531 (comment).

ArthurHoaro added the bug it's broken! label Apr 5, 2016

ArthurHoaro added this to the 0.7.0 milestone Apr 5, 2016

ArthurHoaro self-assigned this Apr 5, 2016

ArthurHoaro added the security label Apr 5, 2016

ArthurHoaro added a commit to ArthurHoaro/Shaarli that referenced this issue Apr 6, 2016

Fixes shaarli#531 - Title retrieving is failing with multiple use case

1e72653

see shaarli#531 for details

ArthurHoaro mentioned this issue Apr 6, 2016

Fixes #531 - Title retrieving is failing with multiple use case #532

Merged

ArthurHoaro added a commit to ArthurHoaro/Shaarli that referenced this issue May 3, 2016

Fixes shaarli#531 - Title retrieving is failing with multiple use case

d32cb07

see shaarli#531 for details

ArthurHoaro added a commit to ArthurHoaro/Shaarli that referenced this issue May 3, 2016

Fixes shaarli#531 - Title retrieving is failing with multiple use case

6e7db3b

see shaarli#531 for details

ArthurHoaro closed this as completed May 3, 2016

ArthurHoaro added a commit to ArthurHoaro/Shaarli that referenced this issue May 3, 2016

Fixes shaarli#531 - Title retrieving is failing with multiple use case

ce7b0b6

see shaarli#531 for details

ArthurHoaro added a commit that referenced this issue May 3, 2016

Merge pull request #532 from ArthurHoaro/hotfix/title-retrieve-the-re…

47be060

…turn Fixes #531 - Title retrieving is failing with multiple use case

nodiscc added enhancement and removed bug it's broken! security labels Jul 16, 2016

nodiscc added this to the backlog to the future milestone Jul 16, 2016

ArthurHoaro removed their assignment Aug 3, 2016

julienCXX mentioned this issue Aug 6, 2016

Added (and set as default) a cURL-based method for fetching HTTP content #624

Merged

portailp pushed a commit to PortailPro/Shaarli that referenced this issue Mar 20, 2017

Fixes shaarli#531 - Title retrieving is failing with multiple use case

112fb2c

see shaarli#531 for details

portailp added a commit to PortailPro/Shaarli that referenced this issue Mar 20, 2017

Revert "Fixes shaarli#531 - Title retrieving is failing with multiple…

fdaf96f

… use case" This reverts commit 112fb2c.

portailp added a commit to PortailPro/Shaarli that referenced this issue Mar 20, 2017

Revert "Fixes shaarli#531 - Title retrieving is failing with multiple…

cdc6fec

… use case" This reverts commit 112fb2c.

nodiscc self-assigned this Nov 15, 2023

nodiscc added a commit to nodiscc/Shaarli that referenced this issue Nov 15, 2023

doc: automatic title retrieval fails when it is set by javascript

bd3e71c

- related shaarli#531 - fixes shaarli#989

nodiscc mentioned this issue Nov 15, 2023

doc: troubleshooting: automatic title retrieval fails when it is set by javascript #2037

Merged

nodiscc added a commit to nodiscc/Shaarli that referenced this issue Jan 6, 2024

doc: troubleshooting: list more cases in which title retrieval fails

49236a6

- ref. shaarli#2037 - ref. shaarli#531

nodiscc mentioned this issue Jan 6, 2024

doc: troubleshooting: list more cases in which title retrieval fails #2060

Merged

nodiscc removed their assignment Jan 6, 2024

puppe mentioned this issue Jan 30, 2024

Fix metadata fetching for YouTube URLs #2069

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Page title fetching is still failing in some cases #531

Page title fetching is still failing in some cases #531

julienCXX commented Apr 5, 2016 •

edited by nodiscc

Loading

alexisju commented Apr 5, 2016

ArthurHoaro commented Apr 5, 2016

julienCXX commented Apr 6, 2016

nodiscc commented Apr 6, 2016 •

edited

Loading

ArthurHoaro commented Apr 6, 2016 •

edited

Loading

ArthurHoaro commented Apr 6, 2016

julienCXX commented Apr 7, 2016

ArthurHoaro commented Apr 7, 2016

julienCXX commented Apr 7, 2016

ArthurHoaro commented Apr 8, 2016

julienCXX commented Apr 16, 2016

ArthurHoaro commented May 3, 2016

julienCXX commented Jul 15, 2016

nodiscc commented Jul 16, 2016

ArthurHoaro commented Jul 16, 2016

julienCXX commented Jul 18, 2016

ArthurHoaro commented Jul 19, 2016

ArthurHoaro commented Aug 3, 2016

julienCXX commented Aug 3, 2016

ArthurHoaro commented Aug 4, 2016

virtualtam commented Aug 4, 2016 •

edited

Loading

julienCXX commented Aug 5, 2016 •

edited

Loading

ArthurHoaro commented Aug 6, 2016

julienCXX commented Aug 6, 2016

ArthurHoaro commented Aug 6, 2016

julienCXX commented Aug 6, 2016

skonsoftSASU commented Nov 14, 2023

nodiscc commented Nov 15, 2023 •

edited

Loading

nodiscc commented Nov 15, 2023 •

edited

Loading

nodiscc commented Nov 15, 2023 •

edited

Loading

puppe commented Jan 21, 2024

nodiscc commented Jan 22, 2024

Page title fetching is still failing in some cases #531

Page title fetching is still failing in some cases #531

Comments

julienCXX commented Apr 5, 2016 • edited by nodiscc Loading

alexisju commented Apr 5, 2016

ArthurHoaro commented Apr 5, 2016

julienCXX commented Apr 6, 2016

nodiscc commented Apr 6, 2016 • edited Loading

ArthurHoaro commented Apr 6, 2016 • edited Loading

ArthurHoaro commented Apr 6, 2016

julienCXX commented Apr 7, 2016

ArthurHoaro commented Apr 7, 2016

julienCXX commented Apr 7, 2016

ArthurHoaro commented Apr 8, 2016

julienCXX commented Apr 16, 2016

ArthurHoaro commented May 3, 2016

julienCXX commented Jul 15, 2016

nodiscc commented Jul 16, 2016

ArthurHoaro commented Jul 16, 2016

julienCXX commented Jul 18, 2016

ArthurHoaro commented Jul 19, 2016

ArthurHoaro commented Aug 3, 2016

julienCXX commented Aug 3, 2016

ArthurHoaro commented Aug 4, 2016

virtualtam commented Aug 4, 2016 • edited Loading

julienCXX commented Aug 5, 2016 • edited Loading

ArthurHoaro commented Aug 6, 2016

julienCXX commented Aug 6, 2016

ArthurHoaro commented Aug 6, 2016

julienCXX commented Aug 6, 2016

skonsoftSASU commented Nov 14, 2023

nodiscc commented Nov 15, 2023 • edited Loading

nodiscc commented Nov 15, 2023 • edited Loading

nodiscc commented Nov 15, 2023 • edited Loading

puppe commented Jan 21, 2024

nodiscc commented Jan 22, 2024

julienCXX commented Apr 5, 2016 •

edited by nodiscc

Loading

nodiscc commented Apr 6, 2016 •

edited

Loading

ArthurHoaro commented Apr 6, 2016 •

edited

Loading

virtualtam commented Aug 4, 2016 •

edited

Loading

julienCXX commented Aug 5, 2016 •

edited

Loading

nodiscc commented Nov 15, 2023 •

edited

Loading

nodiscc commented Nov 15, 2023 •

edited

Loading

nodiscc commented Nov 15, 2023 •

edited

Loading