When dealing with the ACM website (e.g., https://github.com/shtrom/ftr-site-config/blob/shtrom-s-master/cacm.acm.org.txt), the login URL only works if the HTTP `Referer` header is from an `acm.org` URL. In this particular instance, it sets a cookie and serves a redirect to the original page.

For example, a request to the login URL that includes both the `Referer` and `User-Agent` headers works, but simply removing either the `Referer` or the `User-Agent` leads to failures.

This means that despite the `login_*` variables in the site-config, fetching full articles fails, as those two headers are missing.

I think this can be solved by:

- letting guzzle-site-authenticator pass headers on demand;
- making graby and/or wallabag pass the `User-Agent` override from the site-config, if any;
- making graby and/or wallabag pass the `Referer` as the original URL to be fetched.

This should fix the ACM issue, and I think it is sufficiently generic to be equally helpful (or at least not detrimental) on other sites. If this turns out to break things, we'd need additional site-config options to specify whether additional `login_*` headers should be included, and their values.

Now, this is all conjecture, as I haven't been able to successfully hack my wallabag instance to behave as described. I got lost jumping between wallabag, graby, and guzzle-site-authenticator.

I'm willing to keep going on this, but I would welcome pointers as to:

- how to see debug messages from the Authenticator about the requests it sends (at the moment, I see graby and wallabag determining that a login is needed, and then a failure from the login page, but no debug output in between);
- how to make `guzzle-site-authenticator` pass headers (I unsuccessfully tried in `LoginFormAuthenticator::login`, https://github.com/wallabag/guzzle-site-authenticator/blob/master/lib/Authenticator/LoginFormAuthenticator.php#L36-L37, by adding a `headers` array, but maybe I did it wrong);
- how/where I could change/update the `HttpClient` that, I think, gets injected by wallabag or graby;
- any other, simpler way to achieve all this?
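To try the three proposed changes and the debug-logging wish outside PHP before patching wallabag, graby, or guzzle-site-authenticator, here is a minimal Python sketch. It assumes the site config is available as a plain dict. `build_login_headers` and `build_login_request` are hypothetical helpers and are not part of any of those projects. The `http_header(user-agent)` key and the `DEFAULT_USER_AGENT` value are illustrative assumptions, and `login_uri` is assumed to hold the login URL from the site-config. The sketch only builds the login POST without sending it, and it logs every header that would reach the login page.

```python
import logging
from typing import Mapping
from urllib.parse import urlencode
from urllib.request import Request

log = logging.getLogger("login-debug")

# Hypothetical fallback User-Agent; a real client would use its own default.
DEFAULT_USER_AGENT = "Mozilla/5.0 (hypothetical default)"
# Assumed site-config key holding a User-Agent override.
UA_OVERRIDE_KEY = "http_header(user-agent)"


def build_login_headers(site_config: Mapping[str, str], article_url: str,
                        default_ua: str = DEFAULT_USER_AGENT) -> dict:
    """Headers to attach to the login request (hypothetical helper)."""
    # Reuse the site-config User-Agent override, if any.
    user_agent = site_config.get(UA_OVERRIDE_KEY) or default_ua
    # Send the article URL being fetched as the Referer, so the login page
    # sees a Referer from the same site (what the ACM login URL requires).
    return {"User-Agent": user_agent, "Referer": article_url}


def build_login_request(site_config: Mapping[str, str], article_url: str,
                        credentials: Mapping[str, str]) -> Request:
    """Build, but do not send, the login form POST (hypothetical helper)."""
    # The "pass headers on demand" idea: extra headers go into the login request.
    headers = build_login_headers(site_config, article_url)
    body = urlencode(credentials).encode("ascii")
    req = Request(site_config["login_uri"], data=body,
                  headers=headers, method="POST")
    # Debug aid: log every header that would be sent. urllib stores names
    # via str.capitalize() (e.g. "User-agent"); HTTP header names are
    # case-insensitive, so this does not change their meaning.
    for name, value in req.header_items():
        log.debug("login header %s: %s", name, value)
    return req
```

To see the debug lines, call `logging.basicConfig(level=logging.DEBUG)` before `build_login_request`. Passing the returned request to `urllib.request.urlopen` would actually send it. In guzzle-site-authenticator the equivalent would presumably mean adding these headers to the login POST's request options. This sketch makes no claim about how that code is currently structured.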