Set modtime for metadata files #2307

Closed
lx30011 opened this issue Feb 17, 2022 · 12 comments

Comments

@lx30011
Contributor

lx30011 commented Feb 17, 2022

I'm downloading images from Reddit and have a few postprocessors set up. The downloaded images get their modtime updated, but the metadata files (.data.json, .txt) don't. Is there any way to have them updated as well?
Thanks

My config

{
  "extractor":
  {
    "base-directory": "~/gallery-dl/",
    "postprocessors": [
      {
        "name": "metadata",
        "mode": "json",
        "filename": "{id}{num:? //>02} {title[:100]}.data.json"
      },
      {
        "name": "metadata",
        "mode": "custom",
        "content-format": "{title}\n",
        "filename": "{id}{num:? //>02} {title[:100]}.txt",
        "whitelist": ["reddit"]
      },
      {
        "name": "mtime"
      }
    ],
    "reddit": {
      "filename": "{id}{num:? //>02} {title[:100]}.{extension}"
    }
  }
}

Result
[image]

@AlttiRi

AlttiRi commented Feb 18, 2022

I would also like the files generated by postprocessors to get the same mtime as the main file.

@God-damnit-all
Contributor

God-damnit-all commented Feb 28, 2022

I can't seem to get the Last Modified Times of post files to work. The metadata post-processor with event:post has mtime set to true. mtime:true for the metadata post-processor does work on non-post metadata files, though.

Thinking 5974955 might be intended for that purpose, I tried adding an event:post mtime post-processor. It didn't work, not that I was expecting it to. Honestly, I'm curious what the use case for this commit is supposed to be.

@mikf
Owner

mikf commented Feb 28, 2022

To get "mtime": true to work for "event": "post" metadata files, an mtime post processor for the same event must come before any metadata post processors that should set a custom mtime.

"postprocessors": [
    {
        "name": "mtime",
        "event": "post"
    },
    {
        "name": "metadata",
        "event": "post",
        "mtime": true
    }
]

Otherwise there is no mtime date/time information available at that point in time, since "event": "post" runs before any file downloads happen and those are the ones with Last-Modified headers.

The post processor list from the first post should also be reordered accordingly. Otherwise metadata files and downloaded files will still have different mtimes.

    "postprocessors": [
      {
        "name": "mtime"
      },
      {
        "name": "metadata",
        "mode": "json",
        "filename": "{id}{num:? //>02} {title[:100]}.data.json",
        "mtime": true
      },
      {
        "name": "metadata",
        "mode": "custom",
        "content-format": "{title}\n",
        "filename": "{id}{num:? //>02} {title[:100]}.txt",
        "whitelist": ["reddit"],
        "mtime": true
      }
    ],
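Why the order matters can be illustrated with a toy model. This is only a sketch and not gallery-dl's actual code: `mtime_step`, `metadata_step`, `run`, and the shared `state` dict are hypothetical names, and the sample post values are made up. It assumes that one step derives a timestamp from the UTC `date` field and that a later step can only apply a timestamp that already exists when it writes its file.

```python
import os
import tempfile
from datetime import datetime, timezone


def mtime_step(state):
    """Toy 'mtime' step: derive a POSIX timestamp from the UTC 'date' field."""
    date = state["date"]  # naive datetime, assumed to represent UTC
    state["mtime"] = date.replace(tzinfo=timezone.utc).timestamp()


def metadata_step(state, directory):
    """Toy 'metadata' step: write a file, then apply state['mtime'] if set."""
    path = os.path.join(directory, state["id"] + ".txt")
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(state["title"] + "\n")
    mtime = state.get("mtime")
    if mtime is not None:
        os.utime(path, (mtime, mtime))  # (access time, modification time)
    return path


def run(order, directory):
    """Run the toy steps in the given order; return the file's final mtime."""
    state = {
        "id": "abc123",      # hypothetical post id
        "title": "Example",  # hypothetical post title
        "date": datetime(2022, 3, 17, 21, 26),
    }
    path = None
    for name in order:
        if name == "mtime":
            mtime_step(state)
        else:
            path = metadata_step(state, directory)
    return os.path.getmtime(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # mtime first: the metadata file gets the post's timestamp.
        print(run(["mtime", "metadata"], tmp))
        # metadata first: no timestamp is available yet, so the file
        # keeps the time at which it was written.
        print(run(["metadata", "mtime"], tmp))
```

In this model, placing the mtime step after the metadata step leaves the file with its creation time, which mirrors the behaviour described for the original post processor list.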

@AlttiRi

AlttiRi commented Mar 20, 2022

It doesn't apply to ugoira ffmpeg output files.
(However, those aren't "metadata" as per the topic request.)

@mikf
Owner

mikf commented Mar 21, 2022

@AlttiRi 40ce505

@AlttiRi

AlttiRi commented Mar 21, 2022

Ugoira files have the wrong mtime (it's UTC, instead of the local one).

Using UTC time in the filename format string is definitely the right thing to do, since users in any time zone will get the same filename for the same downloaded URL.
However, using UTC for a file's mtime makes no sense.

Files from metadata postprocessors have the proper mtime.


Also, the time appears to differ by an additional 1-3 minutes. It's strange.

@mikf
Owner

mikf commented Mar 21, 2022

This difference might be due to different sources of the mtime values. Metadata files with event: post get their mtime from date (via mtime post processor), while ugoira animations get their mtime from Last-Modified headers.

ugoira post processors run at "event": "file", so you could add a second mtime post processor for that event, or let the first run for two events with "event": "post,file"

I should add that I more or less just copy-pasted the metadata mtime code for 40ce505. Both metadata.mtime and ugoira.mtime function in the exact same way.

@AlttiRi

AlttiRi commented Mar 23, 2022

Wait.
It works fine without any mtime postprocessor.

I only added "mtime": true to my ugoira and metadata postprocessors.
And it works as I expect.

With any mtime postprocessor(s) I get wrong file mtimes.

The bug occurred because I used

      {
        "name": "mtime"
      },

@AlttiRi

AlttiRi commented Mar 23, 2022

Without any mtime postprocessor(s):

For example, for +3:00 time zone:
https://www.pixiv.net/en/artworks/96989092 (Post time is "March 18, 2022 12:26 AM" (local) | "March 17, 2022 9:26 PM" (UTC))

The metadata and ugoira files have "2022.03.18 00:24" mtime — OK.
While "2022.03.17" string is in the filename (UTC representation) — OK.

For -10:00 zone:
https://www.pixiv.net/en/artworks/96530388 (February 25, 2022 7:43 PM (local) | February 26, 2022 5:43 AM (UTC))
"2022.02.26" filename string and "2022.02.25 19:43" mtime — also OK.
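The two cases above can be checked with a few lines of Python. This is a minimal sketch: `describe` is a hypothetical helper, and the fixed offsets +3:00 and -10:00 stand in for the time zones mentioned. It formats the UTC post time the way it appears in the filename and converts the same instant to local time, which is what a file manager would display as the mtime.

```python
from datetime import datetime, timedelta, timezone


def describe(post_utc, offset_hours):
    """Return (UTC date string for the filename, local mtime string)."""
    aware = post_utc.replace(tzinfo=timezone.utc)
    local = aware.astimezone(timezone(timedelta(hours=offset_hours)))
    return aware.strftime("%Y.%m.%d"), local.strftime("%Y.%m.%d %H:%M")


# Post times taken from the two Pixiv examples, given in UTC.
print(describe(datetime(2022, 3, 17, 21, 26), 3))
# ('2022.03.17', '2022.03.18 00:26')
print(describe(datetime(2022, 2, 26, 5, 43), -10))
# ('2022.02.26', '2022.02.25 19:43')
```

The filename date and the displayed mtime can legitimately fall on different calendar days, because both describe the same instant in different time zones.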

@AlttiRi

AlttiRi commented Mar 23, 2022

However, I think even with

            {
                "name": "mtime",
                "event": "file,post"
            },

it should set the correct mtime for the file.

For the 1st example above it should be "2022.03.18 00:26", but currently I get "2022.03.17 21:26", which is the wrong mtime. The "2022.03.17" in the filename is OK.
The same goes for the 2nd example: the mtime is UTC time, not local time.

mikf added a commit that referenced this issue Mar 24, 2022
'datetime.timestamp()', which got used to convert datetime objects to
POSIX timestamps, assumes naive datetimes represent LOCAL time, while
datetimes in 'date' metadata fields represent UTC time.

Ref: https://docs.python.org/3/library/datetime.html#datetime.datetime.timestamp
> Naive datetime instances are assumed to represent local time
> you can obtain the POSIX timestamp by … calculating the timestamp directly
@mikf
Owner

mikf commented Mar 24, 2022

I looked into this a bit more and found a (major) bug.

The way the mtime post processor converted datetime information from date fields to POSIX timestamps was wrong. It used datetime.timestamp(), which assumes local time, while it was getting UTC time. (all date fields are supposed to be UTC)

This is now fixed with e7b3086, but all previous mtime timestamps produced from date fields are off by the negative of the local UTC offset (for example, three hours too early in a UTC+3 time zone).
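The pitfall can be reproduced in isolation. The sketch below is not the code from e7b3086. It only contrasts `datetime.timestamp()` on a naive datetime with the direct calculation described in the Python documentation, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def timestamp_assuming_local(dt):
    """What naive dt.timestamp() does: treat dt as LOCAL time."""
    return dt.timestamp()


def timestamp_from_utc(dt):
    """Treat the naive dt as UTC and compute the POSIX timestamp directly."""
    return (dt - EPOCH) / timedelta(seconds=1)


def local_utc_offset_seconds(dt):
    """UTC offset of the local time zone at naive local time dt, in seconds."""
    return dt.astimezone().utcoffset().total_seconds()


date = datetime(2022, 3, 17, 21, 26)  # naive, meant as UTC

wrong = timestamp_assuming_local(date)
right = timestamp_from_utc(date)

print(right)          # 1647552360.0
print(wrong - right)  # minus the local UTC offset; 0.0 only if local time is UTC
```

On a machine whose local time zone is UTC the two values coincide, which is one reason a bug like this can go unnoticed.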

@AlttiRi
Copy link

AlttiRi commented Mar 26, 2022

Maybe it makes sense to enable "mtime": true for ugoira on Pixiv by default, since that's the default behaviour for downloaded files?
As I said above, it doesn't even require any mtime postprocessors.

With mtime postprocessors, the mtime will be the same as the time shown in the post's description on the site.


4 participants