Feed discovery does not work with relative URLs in links #1385

Closed
3 tasks done
mormegil-cz opened this issue May 28, 2021 · 7 comments

@mormegil-cz
Contributor

IMPORTANT

Read and tick the following checkbox after you have created the issue or place an x inside the brackets ;)

  • I have read the CONTRIBUTING.md and followed the provided tips
  • I accept that the issue will be closed without comment if I do not check here
  • I accept that the issue will be closed without comment if I do not fill out all items in the issue template.

Explain the Problem

When trying to add a blog to the News reader, I was unable to do so; News repeatedly claimed that the hostname could not be found.

The (first) problem I found is that during the discovery phase, the href attributes of <link> elements are used as written, which does not work for relative URLs (which the spec allows).

Steps to Reproduce

Explain what you did to encounter the issue

  1. Try to add a new feed: https://k47.cz/
  2. An error appears: cURL error 6: Could not resolve host: rss.xml (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)

The problem is that the k47.cz page links to its feed using a relative URL, <link rel=alternate type=application/rss+xml href=rss.xml title="RSS zdroj">. That URL is never resolved against the page URL, so News just attempts to fetch a “URL” of http://rss.xml.
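To illustrate the expected behavior, here is a minimal sketch in Python (not the PHP used by News or feed-io) of a discovery step that resolves each href against the page URL. The names `FeedLinkParser`, `discover_feeds` and `FEED_TYPES` are hypothetical and exist only for this example; it says nothing about how feed-io's Explorer is actually implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption for this sketch: only these two MIME types count as feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects the raw href values of <link rel=alternate> feed links."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and attr.get("href"):
            self.hrefs.append(attr["href"])


def discover_feeds(page_url, html):
    """Returns feed URLs found in html, resolved against page_url."""
    parser = FeedLinkParser()
    parser.feed(html)
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return [urljoin(page_url, href) for href in parser.hrefs]


page = (
    "<html><head>"
    '<link rel=alternate type=application/rss+xml href=rss.xml title="RSS zdroj">'
    "</head></html>"
)
print(discover_feeds("https://k47.cz/", page))
```

With the link element quoted above, this resolves `rss.xml` to `https://k47.cz/rss.xml` instead of passing the bare string `rss.xml` on to the fetcher.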

It might be argued that this is an upstream bug; feed-io’s Explorer could resolve the relative URIs itself. It is hard to tell, because there is no specification of its expected behavior, AFAICT.

I was able to fix the problem by resolving relative URLs after discovery:

Patch fixing the problem
--- FeedServiceV2.php.bak       2021-05-28 07:48:45.524385111 +0000
+++ FeedServiceV2.php   2021-05-28 07:58:19.287691101 +0000
@@ -16,6 +16,7 @@
 use FeedIo\Explorer;
 use FeedIo\Reader\ReadErrorException;
 use HTMLPurifier;
+use Net_URL2;

 use OCA\News\Db\FeedMapperV2;
 use OCA\News\Fetcher\FeedFetcher;
@@ -199,7 +200,13 @@
         if ($full_discover) {
             $feeds = $this->explorer->discover($feedUrl);
             if ($feeds !== []) {
-                $feedUrl = array_shift($feeds);
+                $discoveredUrl = array_shift($feeds);
+                $url2 = new Net_URL2($discoveredUrl);
+                if ($url2->isAbsolute()) {
+                    $feedUrl = $discoveredUrl;
+                } else {
+                    $feedUrl = strval((new Net_URL2($feedUrl))->resolve($discoveredUrl));
+                }
             }
         }

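The branch added by the patch can be mirrored in a few lines of Python, which makes it easy to try different href forms. This is only a sketch: `resolve_discovered` is a hypothetical helper, and it treats "absolute" as "has a scheme", which is an assumption about what the patch's `isAbsolute()` check is meant to do.

```python
from urllib.parse import urljoin, urlsplit


def resolve_discovered(page_url, discovered_url):
    """Keeps absolute URLs as they are, resolves anything else against page_url."""
    if urlsplit(discovered_url).scheme:
        # Already absolute: use it exactly as discovered.
        return discovered_url
    # Relative reference: resolve it against the URL the user entered.
    return urljoin(page_url, discovered_url)


for href in ("rss.xml", "/rss.xml", "//feeds.example.org/k47.xml",
             "https://example.org/feed"):
    print(href, "->", resolve_discovered("https://k47.cz/blog/", href))
```

The host `feeds.example.org` and the `/blog/` path are made up for the example. Document-relative, root-relative and scheme-relative references all end up as full `https://` URLs, while an href that is already absolute passes through unchanged.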
System Information

  • News app version: 15.4.5
  • Nextcloud version: 20.0.9
  • Cron type: Cron running on systemd timer
  • PHP version: 7.4.18
  • Database and version: mysql 10.5.10
  • Browser and version: Firefox 88.0
  • OS and version: Arch Linux/4.14.232

Contents of nextcloud/data/nextcloud.log
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil,"app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Json","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil,"app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Atom","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil,"app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Rss","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil,"app":"news","method":"POST","url":"/apps/news/feeds","message":"new parser added : FeedIo\\Standard\\Rdf","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":1,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil,"app":"news","method":"POST","url":"/apps/news/feeds","message":"discover feeds from https://k47.cz/","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil,"app":"news","method":"POST","url":"/apps/news/feeds","message":"read access : rss.xml into a feed instance (feed class : FeedIo\\Feed)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil,"app":"news","method":"POST","url":"/apps/news/feeds","message":"start reading rss.xml","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":1,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil,"app":"news","method":"POST","url":"/apps/news/feeds","message":"no 'modifiedSince' parameter given, setting it to 01/01/1970","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":1,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil,"app":"news","method":"POST","url":"/apps/news/feeds","message":"hitting rss.xml","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":2,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil,"app":"news","method":"POST","url":"/apps/news/feeds","message":"rss.xml read error : cURL error 6: Could not resolve host: rss.xml (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
{"reqId":"kqjrWpgGULCAqRwS7NEa","level":0,"time":"2021-05-28T07:38:23+00:00","remoteAddr":"192.0.0.123","user":"Mormegil,"app":"news","method":"POST","url":"/apps/news/feeds","message":"cURL error 6: Could not resolve host: rss.xml (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0","version":"20.0.9.1"}
@SMillerDev
Contributor

According to https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fk47.cz%2Frss.xml, even the feed's self reference is broken. I'd just recommend alerting the author to this issue.

@mormegil-cz
Contributor Author

Yes, that was the other problem I hit; I have already contacted the author about that. However, this issue is not caused by the broken self-link in the feed.

@stale

stale bot commented Jul 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Jul 21, 2021
@mormegil-cz
Contributor Author

What does “no recent activity” mean? Should I keep commenting that yes, this is still broken?

@stale stale bot removed the stale label Aug 17, 2021
@SMillerDev
Contributor

It means that nobody has time or motivation to do something about it. So at some point it'll be closed automatically unless someone fixes it before then.

@stale

stale bot commented Jan 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Jan 8, 2022
mormegil-cz added a commit to mormegil-cz/news that referenced this issue Jan 12, 2022
When a feed is added using the feed discovery feature, and the feed link
uses a relative URL, the discovery needs to resolve the URL relative
to the provided website URL.

Fixes nextcloud#1385

Signed-off-by: Mormegil <[email protected]>
@stale stale bot closed this as completed Apr 16, 2022
@IgorA100
Contributor

IgorA100 commented Nov 4, 2023

Solved here: alexdebril/feed-io#422
