Can't set retweets to download to a separate folder #1481

Closed
ExeArco opened this issue Apr 19, 2021 · 26 comments

Comments

@ExeArco

ExeArco commented Apr 19, 2021

According to the following issues on GitHub, this feature should be available:
#1421
#1334

I am currently on version 1.17.2.

config.set(("extractor", "twitter"), "directory", ["twitter","{author[name]}","archive"])
config.set(("extractor", "twitter"), "retweets", ("directory", ["Twitter", "{user_likes}", "Likes"]))

I have tried a variety of values for "retweets", but nothing changes unless I set it to False.
Then it doesn't download retweets, as expected.
When I feed gallery-dl the URL of a user's Twitter page, I want to download the user's tweets into a base folder and their retweets into a separate folder inside it.
For example:

targetUser
targetUser/retweets

That is what I want.

@ExeArco ExeArco changed the title Cant' set retweets to download to a separate folder Can't set retweets to download to a separate folder Apr 19, 2021
@ExeArco
Author

ExeArco commented Apr 29, 2021

I "fixed" this by downloading to a directory one level further in with retweets enabled, then running a script that moves the downloaded images back out if they match the target username.
Not an elegant solution, but I'm writing it here for anyone wondering the same thing.

@Scripter17
Contributor

Scripter17 commented Jun 12, 2021

I thought it might've been possible to do this with "{'.' if retweet_id==0 else 'Retweets'}", but the function that handles this doesn't actually use f-strings, probably to avoid arbitrary code execution problems.

Sorry to bug you, @mikf, but you got any ideas?

@rautamiekka
Contributor

rautamiekka commented Jun 12, 2021

I thought it might've been possible to do this with "{'.' if retweet_id==0 else 'Retweets'}", but the function that handles this (https://github.com/mikf/gallery-dl/blob/d09bc5bd3462b75a784c8406c549e1c1858f9852/gallery_dl/util.py#L599) doesn't actually use f-strings, probably to avoid arbitrary code execution problems.

Sorry to bug you, @mikf, but you got any ideas?

I suspect it's more because f-strings are evaluated at runtime, right where they appear (they can't be stored and evaluated later), while str.format() can use templates saved in a string, like {author[name]}.

Plus, it's far more powerful than the old printf-style percent formatting still used by youtube-dl for compatibility, which can't do something like {author[name]}. f-strings are faster and more powerful again, and really should be used when templates aren't needed.

@mikf
Owner

mikf commented Jun 12, 2021

"{retweet_id:?//L0/Retweets/}" seems to work.
It produces an empty string when retweet_id is 0 and Retweets otherwise.

f-strings are Python 3.6+ only and I kind of want to keep Python 3.4 compatibility for gallery-dl v1.x.

@Scripter17
Contributor

Scripter17 commented Jun 12, 2021

"{retweet_id:?//L0/Retweets/}" seems to work.

Care to explain how on earth that works and how you managed to come up with it? I am very confused

This feels like it should be documented somewhere

@mikf
Owner

mikf commented Jun 12, 2021

?// returns an empty string when its input (retweet_id) evaluates to False (e.g. is 0) and otherwise returns its input as a string. It also stops any further processing.

L0/Retweets/ returns its input if its len() is <= 0 and Retweets otherwise.

So if retweet_id == 0 -> empty string through ?//, otherwise Retweets through L0/…/

This feels like it should be documented somewhere

I know, and it kind of is here.
The whole string formatting system needs to be redone with a proper parser and all that at some point, and I've been delaying writing any docs until that is done ...
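
For anyone puzzling over the same format string, the two steps described above can be emulated in plain Python. This is only a sketch of the described behaviour, not gallery-dl's actual formatter; the function name retweet_segment is made up for illustration.

```python
def retweet_segment(retweet_id):
    """Emulate "{retweet_id:?//L0/Retweets/}" as explained in the comment above."""
    # Step 1, "?//": a falsy value (such as 0) yields an empty string
    # and stops any further processing of the field.
    if not retweet_id:
        return ""
    text = str(retweet_id)
    # Step 2, "L0/Retweets/": text longer than 0 characters is replaced
    # with the literal "Retweets"; otherwise the input is kept.
    if len(text) > 0:
        return "Retweets"
    return text


print(repr(retweet_segment(0)))                    # ''
print(repr(retweet_segment(1408818108224131074)))  # 'Retweets'
```

Used as a directory segment, the empty result for regular tweets adds no subfolder name, while retweets get "Retweets".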

@Hrxn
Contributor

Hrxn commented Jun 27, 2021

config.set(("extractor", "twitter"), "retweets", ("directory", ["Twitter", "{user_likes}", "Likes"]))

I think this cannot possibly work, by the way.
Because you can only set the "retweets" option for Twitter to a boolean value (or to the "original" special value).

When I feed gallery-dl the URL of a user's Twitter page, I want to download the user's tweets into a base folder and their retweets into a separate folder inside it.
For example:

targetUser
targetUser/retweets

That is what I want.

The good news: This should be possible now.

Look here: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractordirectory

The "directory" option can - analogous to the "filename" option - be set to an object containing Python expression mappings.
So it should be possible to do something like this (in the Twitter section of your config file):

"directory": {
    "retweet_id != 0"    : ["Twitter", "{user[name]}", "Retweets"],
    ""                   : ["Twitter", "{user[name]}"]
}

@ExeArco Could you please try this?
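
To make the idea of expression-keyed mappings concrete, here is a rough emulation in plain Python. It assumes the conditions are checked in order with the tweet's metadata fields available as variables, and that the empty key acts as the fallback; gallery-dl's real implementation may differ. pick_directory and the sample metadata are made up for illustration.

```python
def pick_directory(rules, metadata):
    """Return the formatted path segments of the first matching rule."""
    for condition, segments in rules.items():
        if condition == "":
            continue  # the empty key is the fallback, handled after the loop
        # Evaluate the condition with the metadata fields as local variables.
        if eval(condition, {"__builtins__": {}}, dict(metadata)):
            return [s.format(**metadata) for s in segments]
    return [s.format(**metadata) for s in rules[""]]


rules = {
    "retweet_id != 0": ["Twitter", "{user[name]}", "Retweets"],
    "": ["Twitter", "{user[name]}"],
}

tweet = {"retweet_id": 0, "user": {"name": "targetUser"}}
retweet = {"retweet_id": 12345, "user": {"name": "targetUser"}}

print(pick_directory(rules, tweet))    # ['Twitter', 'targetUser']
print(pick_directory(rules, retweet))  # ['Twitter', 'targetUser', 'Retweets']
```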

@ExeArco
Author

ExeArco commented Jun 29, 2021

config.set(
	("extractor", "twitter"), 
	"directory", [
			{
				"retweet_id != 0" : ["{category}", folder,"archive", "retweets"],
				""	: ["{category}", folder,"archive"]
			}
		]
	)

This is what I attempted to use, however it gives me the following error:
[twitter][error] DirectoryFormatError: Applying directory format string failed (TypeError: expected str, got dict)

And just to confirm, I am on 1.18.0.

@Scripter17
Contributor

It was implemented one commit after 1.18.0
You'll want to overwrite C:/Python39/Lib/site-packages/gallery_dl/util.py with this version of util.py

@ExeArco
Author

ExeArco commented Jun 29, 2021

Alright, I updated to 1.18.1-dev and I get this error instead:
[twitter][error] DirectoryFormatError: Applying directory format string failed (TypeError: unhashable type: 'dict')

@mikf
Owner

mikf commented Jun 29, 2021

Your directory value is a list with a dict as element. It should just be a dict:

config.set(
    ("extractor", "twitter"), 
    "directory", {
        "retweet_id != 0" : ["{category}", folder, "archive", "retweets"],
        "" : ["{category}", folder, "archive"],
    },
)

@ExeArco
Author

ExeArco commented Jun 30, 2021

Alright, I fixed that, but now it doesn't seem to be separating them at all; it seems the retweet_id != 0 condition is never triggered.
Here are all the config.set calls related to Twitter that run before I start that download job.

config.set(("extractor", "twitter"), "filename", "{user[name]}_{tweet_id}_{date}_{num}.{extension}")
config.set(("extractor", "twitter"), "quoted", True)
config.set(("extractor", "twitter"), "text-tweets", True)
config.set(("extractor", "twitter"), "retweets", "original") 

config.set(
    ("extractor", "twitter"),
    "postprocessors", [
        {
            "name": "metadata",
            "event": "post",
            "filename": "{user[name]}_{tweet_id}_{date}_{num}.data.json",
        }
    ],
)

config.set(
    ("extractor", "twitter"),
    "directory", {
        "retweet_id != 0": ["{category}", folder, "archive", "retweets"],
        "": ["{category}", folder, "archive"],
    },
)

@mikf
Owner

mikf commented Jul 1, 2021

It is not really possible to differentiate between Tweets and Retweets when setting retweets to "original". It replaces each Retweet entry with its original Tweet, which has a retweeted_status_id_str/retweet_id value of 0.
Since you are using Python, you could change the condition before each new Twitter URL to check if the author['name'] field matches the expected username and determine if it's a Retweet like that. Or gallery-dl could update the retweet_id value in such cases, which shouldn't break anything else.
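
The first suggestion could look roughly like the sketch below: build the "directory" mapping per target username, so that anything not authored by that user lands in the retweets folder. The helper name build_directory_rules is hypothetical, and whether the author['name'] comparison behaves as hoped depends on the metadata gallery-dl provides for each tweet.

```python
def build_directory_rules(username, folder):
    """Build a conditional "directory" value for one target account."""
    # repr() turns the username into a valid Python string literal,
    # so the condition stays well-formed for any name.
    condition = "author['name'] != {!r}".format(username)
    return {
        condition: ["{category}", folder, "archive", "retweets"],
        "": ["{category}", folder, "archive"],
    }


rules = build_directory_rules("targetUser", "targetUser")
for condition, segments in rules.items():
    print(repr(condition), "->", segments)

# Before starting the job for this account's URL, the result would be
# passed on in the same way as the other settings in this thread:
#   config.set(("extractor", "twitter"), "directory", rules)
```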

@ExeArco
Author

ExeArco commented Jul 2, 2021

I'm using this to archive the user profile themselves, not passing through individual retweet/tweet/status links, so I can't check each URL itself.
It would be a much more elegant solution if I could just do it all in one go.

@Hrxn
Contributor

Hrxn commented Jul 2, 2021

What's the specific reason for not using extractor.twitter.retweets = true?

@ExeArco
Author

ExeArco commented Jul 2, 2021

If I set retweets = true, it sets the username of the retweet to the target of my download, not the original poster of what is being retweeted.
Example:
I want to archive posts made by A.
If I set retweets to true, my subfolder with all the retweets will be full of content from B, C, D, E, F, and G, but the files will all be named as if A had actually tweeted them.

@Scripter17
Contributor

You can use retweets: true and replace the {user[name]} in filename with {author[name]}
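
A quick illustration of the difference, using plain str.format() on invented metadata. It assumes, as the thread suggests, that "user" is the account being downloaded and "author" is whoever wrote the tweet; the values below are not real gallery-dl output.

```python
# Hypothetical metadata for a tweet by B that was retweeted by A.
metadata = {
    "user": {"name": "A"},
    "author": {"name": "B"},
    "tweet_id": 123,
    "num": 1,
    "extension": "jpg",
}

by_user = "{user[name]}_{tweet_id}_{num}.{extension}".format(**metadata)
by_author = "{author[name]}_{tweet_id}_{num}.{extension}".format(**metadata)

print(by_user)    # A_123_1.jpg
print(by_author)  # B_123_1.jpg
```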

@Hrxn
Contributor

Hrxn commented Jul 3, 2021

Yes, that is exactly what I am using here.

@ExeArco
Author

ExeArco commented Jul 3, 2021

If I do as Scripter17 suggested, it works, but it no longer sorts out quoted retweets, and looking through the metadata JSON that I have, there doesn't appear to be anything I can use to sort them out, unless I am missing something.

@mikf
Owner

mikf commented Jul 3, 2021

quoted retweets

Do you mean regular quote tweets or quotes of a retweet?
I don't think I've ever seen the latter, but maybe that exists as well.

it appears there is not anything I can use to sort that out

Quoted tweets have a non-zero quote_id, so it'd be something like

"directory": {
    "retweet_id": ["{category}", folder, "archive", "retweets"],
    "quote_id"  : ["{category}", folder, "archive", "quotes"],
    ""          : ["{category}", folder, "archive"],
}

I've also updated the behavior of "retweets": "original" to have a non-zero value for retweet_id (414bdc9).

@ExeArco
Author

ExeArco commented Jul 4, 2021

I'll be honest, I'm not sure exactly what this is called either, but take this example:
https://twitter.com/BarackObama
This is the link I send to download. Further down, Obama quotes a NetflixFilm account (https://twitter.com/BarackObama/status/1408818108224131074), and I would like that put in retweets.
Looking through the metadata, it appears quote_id, reply_id, and retweet_id are all 0, so I am not sure how to separate this one out.

@ExeArco
Author

ExeArco commented Aug 15, 2021

I am still looking for a fix to this, does anyone have one?
I have just updated to the latest version and there still seems to be no way to properly separate those tweets out.

@Hrxn
Contributor

Hrxn commented Aug 15, 2021

[..] it appears quote_id, reply_id, and retweet_id are all 0 so I am not sure how to separate this one out.

Not sure, but if this is what gets returned by Twitter..

PS D:\> gallery-dl --ignore-config -K 'https://twitter.com/BarackObama/status/1408818108224131074' --option '"quoted"=true' | sls "tweet|quote|reply" -NoEmphasis -Context 0,1

> quote_count
    439
> quote_id
    0
> reply_count
    268
> reply_id
    0
> retweet_count
    792
> retweet_id
    0
>   tweet
> tweet_id
    1408510014847959053
> quote_count
    439
> quote_id
    0
> reply_count
    268
> reply_id
    0
> retweet_count
    792
> retweet_id
    0
>   tweet
> tweet_id
    1408510014847959053

PS D:\>

This tweet seems a bit strange.. If I don't use -o '"quoted"=true' gallery-dl skips this tweet with default settings ("skipping quoted tweet")..

So, something is a bit off here, I guess. I blame Twitter.

@ExeArco
Author

ExeArco commented Aug 19, 2021

I mean, it is a quoted tweet, right?
So then why is quote_id 0?
I really would like to separate these out through gallery-dl

@mikf
Owner

mikf commented Aug 23, 2021

As it turns out, quote_id is the ID of the quoted Tweet and is only nonzero for Tweets that contain a quote, not for the Tweet being quoted. It is the exact opposite of how retweets behave, and I have no idea why I implemented it like that back then.

Instead of checking for quote_id, you could compare user and author and classify it as a quote when the two differ.

"directory": {
    "retweet_id"    : ["{category}", folder, "archive", "retweets"],
    "user != author": ["{category}", folder, "archive", "quotes"],
    ""              : ["{category}", folder, "archive"],
}
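
The order of those rules matters, which a plain-Python sketch makes easier to see. It assumes the first matching rule wins and that a retweet also has a different author than user, so the retweet_id check has to come first. classify and the sample tweets are invented for illustration and are not gallery-dl code.

```python
def classify(tweet):
    """Mirror the three rules above: retweet, then quote, then default."""
    if tweet["retweet_id"]:
        return "retweets"
    if tweet["user"] != tweet["author"]:
        return "quotes"
    return "archive"


own = {"retweet_id": 0, "user": {"name": "A"}, "author": {"name": "A"}}
retweeted = {"retweet_id": 42, "user": {"name": "A"}, "author": {"name": "B"}}
quoted = {"retweet_id": 0, "user": {"name": "A"}, "author": {"name": "C"}}

print(classify(own))        # archive
print(classify(retweeted))  # retweets
print(classify(quoted))     # quotes
```

If the two checks were swapped, the retweet by B would already match the user/author comparison and end up in "quotes".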

mikf added a commit that referenced this issue Sep 27, 2021
Only present for tweets quoted by another tweet.
Represents the tweet_id of said tweet quoting this one.
@afterdelight

use "locals().get('quote_by')"

@mikf mikf closed this as completed Dec 3, 2022