[tumblr] Add support for custom domains #71

Closed
Hrxn opened this issue Jan 14, 2018 · 11 comments

@Hrxn
Contributor

Hrxn commented Jan 14, 2018

The first small fish mentioned.. 😄

You can use custom domain names for any blog on Tumblr, probably nothing more than the usual CNAME redirection.

As expected, gallery-dl has no pattern to match in such a case, thus "No suitable extractor".
I have picked two examples:
(1) http://tumblr.deadendthrills.com/
(2) http://www.b-authentique.com/

PS E:\Test> gallery-dl --verbose http://tumblr.deadendthrills.com/
[gallery-dl][debug] Version 1.1.2
[gallery-dl][debug] Python 3.4.4 - Windows-10-10.0.16299
[gallery-dl][debug] requests 2.18.4
[gallery-dl][debug] urllib3 1.22
[gallery-dl][debug] Starting DownloadJob for 'http://tumblr.deadendthrills.com/'
[gallery-dl][error] No suitable extractor found for 'http://tumblr.deadendthrills.com/'
PS E:\Test> gallery-dl --verbose "http://www.b-authentique.com/"
[gallery-dl][debug] Version 1.1.2
[gallery-dl][debug] Python 3.4.4 - Windows-10-10.0.16299
[gallery-dl][debug] requests 2.18.4
[gallery-dl][debug] urllib3 1.22
[gallery-dl][debug] Starting DownloadJob for 'http://www.b-authentique.com/'
[gallery-dl][error] No suitable extractor found for 'http://www.b-authentique.com/'
PS E:\Test>

(1) does not hide its true nature: it looks like a typical Tumblr site, or more specifically like many of the popular Tumblr themes, including the typical buttons/links for reblog, permalink, and the listed tags.
Not to mention the Tumblr buttons on the top right. And the tumblr in the URL is a good hint too, I guess.

(2) is a lot more interesting. No hints for Tumblr whatsoever. Instead it looks just like your average "webzine" site, or online magazine, or whatever it is called nowadays. The linked entries might give it away: {base-url}/post/{id}/{description}. And you can use the usual Tumblr navigation maneuvers, just append /tagged/a-tag to the URL in the browser, or better /archive.

On a side note, I personally would recommend both (1) and (2). Especially the first one, really great example of good original content. The other one is OC as well, and also, uh, interesting. In my opinion.


But, as can be seen by using the trusted Web Console again, it's actually possible to get the usual response just like any vanilla Tumblr blog. Including posts, etc.

[Fig. 1: image not included]
[Fig. 2: image not included]

Good News: I think this is something which should be rather easy to change:

url = "https://api.tumblr.com/v2/blog/{}.tumblr.com/{}".format(
    blog, endpoint)

blog should not be restricted to {}.tumblr.com, basically. This should probably do the trick.
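
A minimal sketch of that change, assuming a hypothetical helper named `build_api_url` (not gallery-dl's actual code): a bare blog name still gets `.tumblr.com` appended, while anything containing a dot is treated as a complete hostname and passed to the API as the blog identifier.

```python
API_ROOT = "https://api.tumblr.com/v2/blog"


def build_api_url(blog, endpoint):
    """Build a Tumblr API URL for a blog name or a full hostname.

    Hypothetical helper, for illustration only.
    """
    # Assumption: a bare name such as "example" lives on tumblr.com,
    # and a value containing a dot is already a complete hostname.
    if "." not in blog:
        blog += ".tumblr.com"
    return "{}/{}/{}".format(API_ROOT, blog, endpoint)


print(build_api_url("example", "info"))
print(build_api_url("www.b-authentique.com", "posts"))
```

With this, `example` and `example.tumblr.com` resolve to the same URL, and a custom domain is used unchanged.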

Bad News: From a UX perspective, this is a bit of a nightmare. Not sure how to solve this matter elegantly.

I can think of two options right now.

  • Add a new switch option that makes it possible to force a specific extractor, something like --force tumblr maybe. But, to be fair, of all supported sites, only Tumblr would have this special "problem". And not just that: as far as I know, no other site, platform, social network, or creative network allows such shenanigans. So, in effect, this flag could simply be named --tumblr, until some other site comes up with similarly genius features.
  • This may be the smarter variant: add an additional check to gallery-dl, just before it errors out with "No suitable extractor", to intercept the program flow at that point. Then test the "failed" URL against the Tumblr API, check the result for the typical meta part, status code 200, etc., and proceed with the extraction if it matches.
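
The second option could be sketched roughly as follows. The function names `is_tumblr_payload` and `probe_tumblr` are hypothetical and not part of gallery-dl. The sketch assumes the API's `info` endpoint is queried with an `api_key` parameter and answers with the usual `meta`/`response` envelope.

```python
import json
import urllib.parse
import urllib.request


def is_tumblr_payload(payload):
    """Return True if a decoded API response looks like a Tumblr blog."""
    if not isinstance(payload, dict):
        return False
    meta = payload.get("meta")
    response = payload.get("response")
    return (
        isinstance(meta, dict) and meta.get("status") == 200
        and isinstance(response, dict) and "blog" in response
    )


def probe_tumblr(url, api_key, timeout=10):
    """Ask the Tumblr API whether the host of 'url' is a Tumblr blog."""
    host = urllib.parse.urlsplit(url).hostname
    if not host:
        return False
    api_url = "https://api.tumblr.com/v2/blog/{}/info?{}".format(
        host, urllib.parse.urlencode({"api_key": api_key}))
    try:
        with urllib.request.urlopen(api_url, timeout=timeout) as resp:
            return is_tumblr_payload(json.load(resp))
    except (OSError, ValueError):
        # Network failure, HTTP error status, or a body that is not JSON
        return False
```

Such a probe would only run for URLs that no extractor matched, so it costs one extra request per unknown URL, and it needs a valid API key to give a meaningful answer.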
@mikf
Owner

mikf commented Jan 15, 2018

Option number two would overall lead to more elegant code, but the way I'd want to implement it would require some "back-end" changes and break some existing features, which is not necessarily bad but requires some time and thought.
Your first suggestion would be a lot easier to implement than the second one and, given that Tumblr is the only site with this feature, is the one I'd prefer in this situation. Instead of an option/switch, I'd suggest using a pseudo URL scheme like oauth:, recursive:, etc.:
tumblr:http://www.b-authentique.com/ or tumblr:www.b-authentique.com
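
As an illustration of how such a prefix could be recognized, here is a small sketch. The regular expression and the `match_blog` helper are assumptions made for this example, not the pattern gallery-dl actually uses.

```python
import re

# Either an explicit "tumblr:" prefix followed by any host,
# or a plain http(s) URL on a tumblr.com subdomain.
PATTERN = re.compile(
    r"(?:tumblr:(?:https?://)?([^/?#]+)"
    r"|https?://([^/?#.]+\.tumblr\.com))"
)


def match_blog(url):
    """Return the blog hostname for a supported URL, else None."""
    match = PATTERN.match(url)
    if not match:
        return None
    return match.group(1) or match.group(2)


print(match_blog("tumblr:http://www.b-authentique.com/"))  # www.b-authentique.com
print(match_blog("tumblr:www.b-authentique.com"))          # www.b-authentique.com
print(match_blog("http://www.b-authentique.com/"))         # None
```

Without the prefix, a custom domain stays unmatched, so other extractors are not affected.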

@Hrxn
Contributor Author

Hrxn commented Jan 15, 2018

That's perfectly fine with me, thanks.

As long as it does not break the workflow 😄
That doesn't seem to be the case here, so nothing to worry about. I almost exclusively use URLs en bloc in files (with --input-file), and prefixing every line with tumblr: is the easy part, as long as it still works with the traditional URL scheme, i.e. tumblr:http(s)://im-a-example.tumblr.com/ 👍
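
Prefixing the lines of a URL list could look something like this. It is a sketch with a hypothetical helper name, and it assumes one URL per line with blank lines skipped.

```python
def add_tumblr_prefix(lines):
    """Yield each non-empty URL line with a 'tumblr:' prefix."""
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        if not url.startswith("tumblr:"):
            url = "tumblr:" + url
        yield url


urls = ["http://www.b-authentique.com/\n", "\n",
        "tumblr:https://im-a-example.tumblr.com/\n"]
for url in add_tumblr_prefix(urls):
    print(url)
```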

@mikf
Owner

mikf commented Jan 15, 2018

Good, then I'm going to close this for the time being.

I might change this whole thing to what you described in your second option, we'll see. I will let you know if that should happen.

mikf closed this as completed Jan 15, 2018
@Hrxn
Contributor Author

Hrxn commented Jan 16, 2018

Not really sure what is going on, but I get this:
Version: Git master

PS C:\Users\Hrxn> gallery-dl --verbose "tumblr:http://tumblr.deadendthrills.com"
[gallery-dl][debug] Version 1.1.2
[gallery-dl][debug] Python 3.6.4 - Windows-10-10.0.16299-SP0
[gallery-dl][debug] requests 2.18.4
[gallery-dl][debug] urllib3 1.22
[gallery-dl][debug] Starting DownloadJob for 'tumblr:http://tumblr.deadendthrills.com'
[tumblr][debug] Using TumblrUserExtractor for 'tumblr:http://tumblr.deadendthrills.com'
[tumblr][error] An unexpected error occurred: TypeError - quote_from_bytes() expected bytes. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[tumblr][debug] Traceback
Traceback (most recent call last):
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\cache.py", line 177, in __call__
    result, _ = self.cache[key, timestamp]
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\cache.py", line 94, in __getitem__
    raise CacheInvalidError()
gallery_dl.cache.CacheInvalidError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 59, in run
    for msg in self.extractor:
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\extractor\tumblr.py", line 65, in items
    blog = self.api.info(self.blog)
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\cache.py", line 180, in __call__
    result = self.func(*args, **kwargs)
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\extractor\tumblr.py", line 244, in info
    return self._call(blog, "info", {})["blog"]
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\extractor\tumblr.py", line 274, in _call
    response = self.session.get(url, params=params).json()
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\util.py", line 489, in get
    return self.session.get(url + self.sign(url, params))
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\util.py", line 496, in sign
    key = self.concat(self.consumer_secret, self.token_secret).encode()
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\util.py", line 503, in concat
    return "&".join(OAuthSession.quote(item) for item in args)
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\util.py", line 503, in <genexpr>
    return "&".join(OAuthSession.quote(item) for item in args)
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\util.py", line 511, in quote
    return urllib.parse.quote(value, "~", encoding, errors)
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\urllib\parse.py", line 787, in quote
    return quote_from_bytes(string, safe)
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\urllib\parse.py", line 812, in quote_from_bytes
    raise TypeError("quote_from_bytes() expected bytes")
TypeError: quote_from_bytes() expected bytes
PS C:\Users\Hrxn>

Trying another URL format doesn't change anything. The error also appears for URLs like example.tumblr.com

@Hrxn
Contributor Author

Hrxn commented Jan 16, 2018

Ah, False Alarm, I think. My bad.

Working with API Key and --ignore-config, apparently.

Something wrong with my current gallery-dl.conf.. need to investigate first.

@mikf
Owner

mikf commented Jan 16, 2018

This happens if access-token or …-secret are set to a non-string value like a number, list or object/dictionary.

@Hrxn
Contributor Author

Hrxn commented Jan 16, 2018

It's definitely caused by the gallery-dl.conf, but both access-token and access-token-secret are string values.

So that alone is not the reason itself.

self.session = util.OAuthSession(
    extractor.session,
    self.api_key, api_secret, token, token_secret)

OAuth makes use of all four key values, maybe there is some mismatch here? There's still the api-key set in gallery-dl.conf from the old version. By the way, that is basically what my question in #65 was about, I think.

@mikf
Owner

mikf commented Jan 16, 2018

The solution was actually in the stack trace you posted all along, I just didn't take a proper look at it this morning ...

  File "…\gallery_dl\util.py", line 496, in sign
    key = self.concat(self.consumer_secret, self.token_secret).encode()

You set a custom value for the API secret (consumer_secret), which isn't a string.

Even if it were a string, it wouldn't work anyway. Tumblr would complain about the oauth_signature value.
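
To illustrate the failure mode: in OAuth 1.0 the HMAC-SHA1 signing key is the percent-encoded consumer secret and token secret joined by `&`, and `urllib.parse.quote` raises exactly this `TypeError` when given `None`. The sketch below uses a hypothetical `signing_key` function (not gallery-dl's `OAuthSession`) that validates its inputs first, so a null value produces a clearer message.

```python
import urllib.parse


def quote(value):
    # Percent-encode for OAuth, keeping "~" unescaped
    return urllib.parse.quote(value, "~")


def signing_key(consumer_secret, token_secret):
    """Build an OAuth 1.0 HMAC-SHA1 key: quoted secrets joined by '&'."""
    for name, value in (("consumer_secret", consumer_secret),
                        ("token_secret", token_secret)):
        if not isinstance(value, str):
            raise ValueError(
                "{} must be a string, got {!r}".format(name, value))
    return "&".join((quote(consumer_secret), quote(token_secret))).encode()


# Reproduces the error from the traceback: a null secret reaches quote()
try:
    urllib.parse.quote(None, "~")
except TypeError as exc:
    print(exc)  # quote_from_bytes() expected bytes
```

A check like this only improves the error message. A wrong but well-formed secret would still be rejected by the server.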

@mikf
Owner

mikf commented Jan 16, 2018

$ gallery-dl test:tumblr:post -o api-secret=null
[tumblr][error] An unexpected error occurred: TypeError - quote_from_bytes() expected bytes.

$ gallery-dl test:tumblr:post -o api-secret=asd
[tumblr][error] {'meta': {'status': 401, 'msg': 'Unauthorized'}, 'response': [], 'errors': [{'title': 'Unauthorized', 'detail': 'Unable to authorize'}]}

@Hrxn
Contributor Author

Hrxn commented Jan 16, 2018

Yes, you're right. I just saw it in my gallery-dl.conf as well now.

Here's what happened: I have a template file for my gallery-dl.conf where I make all changes first and set all the options, without including any user credentials or keys and tokens etc., so this can be shared and all..

It looks basically like this (The Tumblr part):

{
    "base-directory": null,
    [...]
    "extractor":
    {
        [...]
        "tumblr":
        {
            "user":
            {
            [...]
            },
            "access-token": null,
            "access-token-secret": null,
            "api-key": null,
            "api-secret": null,
            "posts": "all",
            "inline": true,
            "reblogs": true,
            "external": true
        },
        [...]
    },
    "output":
    [...]
}

I merge any changes back into my real gallery-dl.conf once in a while, and then update that file with my real credentials and keys etc.

So, after gallery-dl oauth:tumblr I got the tokens, put them into access-token and access-token-secret, and changed api-key to the value used before. But api-secret was still set to null, which obviously should not have been there.
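
One way to guard against this when merging a template is to drop credential keys that are still null before writing the real config, so that a placeholder presumably no longer overrides a default. This is only a sketch: `drop_null_credentials` and `CREDENTIAL_KEYS` are hypothetical names, and nested lists are left untouched.

```python
import json

# Key names taken from the config excerpt above
CREDENTIAL_KEYS = ("access-token", "access-token-secret",
                   "api-key", "api-secret")


def drop_null_credentials(conf):
    """Return a copy of 'conf' without credential keys set to None."""
    if not isinstance(conf, dict):
        return conf
    return {
        key: drop_null_credentials(value)
        for key, value in conf.items()
        if not (key in CREDENTIAL_KEYS and value is None)
    }


template = json.loads(
    '{"extractor": {"tumblr": {"api-key": "abc", "api-secret": null,'
    ' "posts": "all"}}}')
print(json.dumps(drop_null_credentials(template)))
```

Other null values, such as "base-directory", are kept as they are.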

But even with only api-key present, it wouldn't have worked, right? Because that would have been an authentication error (401: Unauthorized), right?

BTW, let's continue this in #65 , because it's clearly an authentication thing and has nothing to do with custom domains.

@Hrxn
Contributor Author

Hrxn commented Jun 4, 2018

PSA:

Because I've just stumbled upon this while testing something, and want to spare anyone any potential hassle:

The first example URL above ((1) http://tumblr.deadendthrills.com/) does not work anymore; I assume the redirect was removed. The URL still exists, but the sub-domain no longer differs in any way from the normal domain. The site got a slight redesign somewhat recently.

But the old Tumblr blog still exists, and can be found here:
http://officialdeadendthrills.tumblr.com/
