«File name too long» should not error #29912

Open
3 tasks done
ffaf1 opened this issue Sep 6, 2021 · 33 comments · May be fixed by #29989
@ffaf1

ffaf1 commented Sep 6, 2021

Checklist

  • I'm reporting a feature request
  • I've verified that I'm running youtube-dl version 2021.06.06
  • I've searched the bugtracker for similar feature requests including closed ones

Description

To reproduce:

  • try to download any video with a really long title
  • you get ERROR: unable to open for writing: [Errno 36] File name too long:
  • the video is not downloaded

My suggestion:

This would improve the UX; check the bug tracker to see how common this issue is.

#7765 talks about this too (with a linked pull request) but:

  • does not elaborate on whether it is actually a good thing / possible drawbacks
  • is limited to collaborators
@ffaf1 ffaf1 added the request label Sep 6, 2021
@dirkf
Contributor

dirkf commented Sep 7, 2021

The general solution to this is a (new) FAQ entry

### My download fails because youtube-dl is trying to write to a file whose name is too long for the file system.

Use the `--get-filename` option to see what filename is being proposed. Then adjust, or add, the `-o`/`--output` option in your original youtube-dl command to make the filename shorter. For instance, if the extractor for the site sets a very long `title` field and you are using the default output template `%(title)s-%(id)s.%(ext)s`, you could try `-o %(title).50s-%(id)s.%(ext)s` instead to limit the length of the filename.
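The `.50` in that template is ordinary Python `%`-formatting precision, which is the mechanism youtube-dl uses to expand output templates. A quick standalone illustration (the field values here are made up):

```python
# Illustration only: youtube-dl expands output templates with
# ordinary Python %-formatting, so ".50" is a precision that caps
# the field at 50 characters.  Field values below are made up.
fields = {
    'title': 'A' * 300,           # an absurdly long title
    'id': 'dQw4w9WgXcQ',
    'ext': 'mp4',
}

default = '%(title)s-%(id)s.%(ext)s' % fields
limited = '%(title).50s-%(id)s.%(ext)s' % fields

print(len(default))  # 316 -- over a 255-byte filename limit
print(len(limited))  # 66  -- comfortably under it
```

The id and extension are untouched; only the title field is capped.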

@ffaf1
Author

ffaf1 commented Sep 7, 2021 via email

@dirkf
Contributor

dirkf commented Sep 7, 2021

The previous discussions on this point show that the maintainers (Sergey, at least) won't accept any solution that they see as ad hoc. The project sees the id field in particular as inviolable, and won't accept a solution that might accidentally truncate an id embedded in a filename. However, if the project's maintainers should be permanently absent, the project might be adopted by maintainers with different views.

It could have been reasonable to add some sort of filename truncation functionality to the --restrict-filenames option, where it has already been accepted to modify embedded fields, but the existing definition of that option doesn't allow it.

The cross-platform Python APIs don't offer a way of telling if the proposed file name is too long. In general this is a tricky issue because the error is only found once the OS kernel has called into the filesystem driver for the device that is to be written to.

We could try modifying sanitize_open() in utils.py, something like the below:

def sanitize_open(filename, open_mode):
    """Try to open the given filename, and slightly tweak it if this fails.

    Attempts to open the given filename. If this fails, it tries to change
    the filename slightly, step by step, until it's either able to open it
    or it fails and raises a final exception, like the standard open()
    function.

    It returns the tuple (stream, definitive_file_name).
    """
    try:
        if filename == '-':
            if sys.platform == 'win32':
                import msvcrt
                msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
            return (sys.stdout.buffer if hasattr(sys.stdout, 'buffer') else sys.stdout, filename)
        stream = open(encodeFilename(filename), open_mode)
        return (stream, filename)
    except (IOError, OSError) as err:
        if err.errno in (errno.EACCES,):
            raise
        # --- addition ---
        # would go in `compat.py`
        try:
            from textwrap import shorten as compat_textwrap_shorten
        except ImportError:
            # Python 2: textwrap.shorten() doesn't exist; emulate it
            def compat_textwrap_shorten(
                    text, width, fix_sentence_endings=False, break_long_words=True,
                    break_on_hyphens=True, placeholder=' [...]'):
                import textwrap
                try_text = textwrap.wrap(
                    text, width,
                    fix_sentence_endings=fix_sentence_endings,
                    break_long_words=break_long_words,
                    break_on_hyphens=break_on_hyphens)
                if len(try_text) == 1:
                    return try_text[0]
                return textwrap.wrap(
                    text, width - len(placeholder),
                    fix_sentence_endings=fix_sentence_endings,
                    break_long_words=break_long_words,
                    break_on_hyphens=break_on_hyphens)[0] + placeholder

        # function would go in utils.py
        def reduce_filename(fname):
            ellipsis = '[...]'
            head, tail = os.path.split(fname)
            # os.path.split()/splitext() return tuples, so build a
            # mutable list: [head, stem, ext]
            parts = [head] + list(os.path.splitext(tail))
            parts[1] = remove_end(parts[1], ellipsis)
            flen = len(parts[1])
            if flen < 20:
                # give up
                return None
            # might try reduction by less than 2?
            parts[1] = compat_textwrap_shorten(
                parts[1], 1 + flen // 2, placeholder=ellipsis)
            return os.path.join(parts[0], ''.join(parts[1:]))

        # in Python 3 the `except ... as e` variable is unbound once
        # its clause ends, so track the errno explicitly
        err_no = err.errno
        alt_filename = filename
        while err_no == errno.ENAMETOOLONG:
            alt_filename = reduce_filename(alt_filename)
            if not alt_filename:
                break
            try:
                stream = open(encodeFilename(alt_filename), open_mode)
                return (stream, alt_filename)
            except (IOError, OSError) as alt_err:
                err_no = alt_err.errno
        else:
            # not too long, but still wrong
            filename = alt_filename
        # --- end addition ---

        # In case of error, try to remove win32 forbidden chars
        alt_filename = sanitize_path(filename)
        if alt_filename == filename:
            raise
        else:
            # An exception here should be caught in the caller
            stream = open(encodeFilename(alt_filename), open_mode)
            return (stream, alt_filename)

Concerns here:

  • which of 'invalid filename' or 'filename too long' gets priority if both apply?
  • might other error conditions be signalled (e.g. ENOENT, EINVAL) instead of ENAMETOOLONG?

There might also be additional guidance for developers to avoid extracting title and id fields that might exceed typical filename lengths.

@galileo-pkm

galileo-pkm commented Nov 13, 2021

This is particularly annoying when downloading videos from Twitter, as it uses the whole post text as the file name.
I usually use --id and rename the file manually.

@dirkf
Contributor

dirkf commented Nov 13, 2021

That's really a bug in the Twitter extractor, IMO.

But feel free to apply the patch from the PR linked above and see if it does what you want.

@rvolgers

rvolgers commented Dec 3, 2021

I don't see any other reasonable solution for the Twitter extractor. Maybe it could pre-truncate the post? That's not any better than just doing it later when constructing the output file name though.

I've run into this issue many times, as I usually want to use youtube-dl to quickly download a video to my downloads folder and I don't really care about the name but would prefer it give as much information as possible (so I don't want to just default to using the id).

I would really, really love it if somebody made a youtube-dl flag --just-download-it-seriously-i-dont-care-about-the-details-just-do-your-best-i-believe-in-you, bonus points if that's the default, but even just having it available so I can write a wrapper script or bash alias that uses it would be amazing.

@dirkf
Contributor

dirkf commented Dec 3, 2021

The defaults really do work in the vast majority of cases.

You can write a config file with your preferred options and select it with --config-location in your script if some default setting should not be satisfactory (or just make it your default config).

For instance, you could put -o %(title).50s-%(id)s.%(ext)s to avoid downloads being broken by unreasonably long titles.

If anyone has a suggestion for what (reliably concise) metadata item could be used as the title for media extracted from Twitter, we could make a PR for Twitter specifically.
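For reference, such a config file might look like the sketch below; the path and the 50-character cap are illustrative, not recommendations:

```text
# ~/.config/youtube-dl/config
# Cap the title field so constructed filenames stay well under
# common 255-byte filesystem limits; the id and extension are
# left untouched.
-o '%(title).50s-%(id)s.%(ext)s'
```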

@palewire

palewire commented Aug 9, 2022

I've just tonight run into this bug with the Twitter extractor. You can reproduce it with the following:

youtube-dl https://twitter.com/TulsiGabbard/status/1555878318469091330
[twitter] 1555878318469091330: Downloading guest token
[twitter] 1555878318469091330: Downloading JSON metadata
[twitter] 1555878318469091330: Downloading m3u8 information
ERROR: unable to open for writing: [Errno 36] File name too long: 'Tulsi Gabbard 🌺 - Puberty-blocking procedures promoted by the Biden_Harris Admin are child abuse. The FDA has recently confirmed these hormones_drugs have extremely dangerous side effects, like brain swelling and vision loss.-1555878318469091330.mp4.part'

Would you be open to a Twitter specific workaround patch? Perhaps the title could be changed to a combination of the uploader name and date, with the tweet text moved to the description field?

palewire added a commit to palewire/youtube-dl that referenced this issue Aug 9, 2022
@bryanpaget

bryanpaget commented Sep 11, 2022

I am also running into this issue.

@rautamiekka
Contributor

What is this? The 90s? This shouldn't be an error at all.

Depends on who/what you blame.

@dirkf
Contributor

dirkf commented Sep 11, 2022

@bryanpaget, you might like to reveal more details of how you "[ran] into this issue", or is it just Twitter again?

Also, feel free to review the discussion in PR #29989 to understand why this is a difficult issue to fix retrospectively (other than by following the approach in #29912 (comment)), and possibly contribute to the design of a general solution.

@Vangelis66

The forked project, yt-dlp, has implemented the following option:

 --trim-filenames LENGTH         Limit the filename length (excluding
                                 extension) to the specified number of
                                 characters

... By setting a value of about 180, the default proposed Twitter filenames should become compatible with a max-255-char filesystem:

yt-dlp -f http-632 "https://twitter.com/TulsiGabbard/status/1555878318469091330" => 

[twitter] 1555878318469091330: Downloading guest token
[twitter] 1555878318469091330: Downloading JSON metadata
[twitter] 1555878318469091330: Downloading m3u8 information
[info] 1555878318469091330: Downloading 1 format(s): http-632
ERROR: unable to open for writing: [Errno 2] No such file or directory: 'Tulsi Gabbard ?? - Puberty-blocking procedures promoted by the Biden?Harris Admin are child abuse. The FDA has recently confirmed these hormones?drugs have extremely dangerous side effects, like brain swelling and vision loss. [1555878318469091330].mp4.part'

... whereas:

yt-dlp -f http-632 "https://twitter.com/TulsiGabbard/status/1555878318469091330" --trim-filenames 185 => 

[twitter] 1555878318469091330: Downloading guest token
[twitter] 1555878318469091330: Downloading JSON metadata
[twitter] 1555878318469091330: Downloading m3u8 information
[info] 1555878318469091330: Downloading 1 format(s): http-632
[download] Destination: Tulsi Gabbard ?? - Puberty-blocking procedures promoted by the Biden?Harris Admin are child abuse. The FDA has recently confirmed these hormones?drugs have extremely dangerous side effec. [1555878318469091330].mp4
[download] 100% of 2.35MiB in    00:02 at 863.56KiB/s

In the case of Twitter, id is NOT being truncated, which is a plus...

@dirkf
Contributor

dirkf commented Sep 12, 2022

I don't see that --trim-filenames ... is much of an improvement on setting an explicit length for a problem field (title, I'm looking at you) in the output template.

A good solution would:

  • never write a file whose pathname is too long for the filesystem (even for intermediate files)
  • apply generally regardless of extractor and output template combinations
  • not require user intervention, such as having to guess a parameter value and pass it to avoid filename length errors
  • not alter any ID fields in the output template
  • any more?

Clearly these are not completely consistent requirements, as an output template might be, say, %(id)16777216s.%(ext)s. Some prioritisation is necessary.

I still hold that the excessive filename length provoked by Twitter is an extractor bug.

@pukkandan
Contributor

--trim-filenames is a poorly implemented solution from youtube-dlc and I would not recommend porting it as-is. Even in yt-dlp, I recommend users to use the % formatting to control filename length instead. E.g. yt-dlp/yt-dlp#3494

In the case of Twitter, id is NOT being truncated, which is a plus...

This is actually a bug and shows how poorly implemented the option is! See yt-dlp/yt-dlp#2314. Do not rely on this behavior...

@paboum

paboum commented Jan 13, 2023

Geez...

ERROR: unable to open for writing: [Errno 36] File name too long: 'Behind the scenes FOOTage 🎥🦶😅 • • • #epicfail #fail #fails #failure #cheer #cheerleading #cheerup #partnerstunt #growthmindset #growth #fyp #funny #funnyvideos #funnymemes #stunt _ Zach Reyes _ M.I.A. · Paper Planes-794721471829128.f792097348668973v-1.mp4.part'

It's not Twitter-specific after all. Just trim the filename length to 255 chars unless a specific parameter is provided by the user to override this limit.
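A blanket "trim to 255" is less safe than it sounds, because filesystem limits are byte limits and titles like the one above are full of multi-byte UTF-8 characters (emoji). A hedged sketch of such a naive trim; the `trim_component` helper is hypothetical, not youtube-dl code:

```python
import os

def trim_component(name, max_bytes=255):
    """Naive sketch of "just trim to 255": shorten one filename
    component to max_bytes, keeping the extension.  Counted in
    bytes, since filesystem limits are byte limits and post titles
    are full of multi-byte UTF-8 characters."""
    stem, ext = os.path.splitext(name)
    budget = max_bytes - len(ext.encode('utf-8'))
    cut = stem.encode('utf-8')[:budget]
    # errors='ignore' drops a multi-byte character cut in half
    return cut.decode('utf-8', 'ignore') + ext

# 100 four-byte emoji plus an id suffix: 424 bytes before trimming
long_name = u'\U0001F338' * 100 + '-1555878318469091330.mp4'
short = trim_component(long_name)
print(len(short.encode('utf-8')))  # 252, safely under 255
```

Note what happened, though: the numeric id was trimmed away entirely, which is precisely the objection raised earlier in the thread; any blanket truncation needs to protect the id and extension first.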

@dirkf
Contributor

dirkf commented Jan 13, 2023

As before, see #29912 (comment) and #29912 (comment) for why this doesn't happen, at least not as you suggest and not yet.

@lassik

lassik commented Feb 6, 2023

The error message seems to come from this part of the code.

Could probably catch errno.ENAMETOOLONG and trim the filename until it fits.
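That retry idea can be sketched roughly as follows. `open_with_trimming` and `fake_open` are hypothetical names, and the halving strategy is deliberately crude (it doesn't preserve the extension, which a real patch such as the one earlier in this thread would need to do):

```python
import errno

def open_with_trimming(filename, opener, min_len=20):
    """Hypothetical sketch: retry with progressively shorter names
    while the OS reports ENAMETOOLONG.  `opener` is injected so the
    loop can be exercised without touching a real filesystem."""
    while True:
        try:
            return opener(filename), filename
        except OSError as e:
            if e.errno != errno.ENAMETOOLONG or len(filename) <= min_len:
                raise
        # crude halving; a real patch must preserve the extension
        filename = filename[:max(min_len, len(filename) // 2)]

def fake_open(name):
    # stand-in for open(): reject names longer than 255 characters
    if len(name) > 255:
        raise OSError(errno.ENAMETOOLONG, 'File name too long', name)
    return 'stream:' + name

stream, final_name = open_with_trimming('x' * 1000, fake_open)
print(len(final_name))  # 250: 1000 -> 500 -> 250, which fits
```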

@dirkf
Contributor

dirkf commented Feb 6, 2023

Indeed, as explained by the immediately preceding comment and its linked comments.

NikitaBeloglazov added a commit to NikitaBeloglazov/ytcon that referenced this issue Jul 26, 2023
If the file name is too long, it is cut word by word until it is acceptable for filesystems (255 bytes)

(ytdl-org/youtube-dl#29912 and several more issues)
@estatistics

estatistics commented Sep 1, 2023

The first one, --trim-filenames 90, didn't catch all filenames; the second one, -o "%(title).90s-%(id)s.%(ext)s", worked for TikTok, producing filenames 90 characters long plus the id (the id adds extra characters).

@dirkf
Contributor

dirkf commented Sep 1, 2023

#30839: "if you were actually running yt-dlp ..."

Also, 90 isn't a "constant": pick a number that works.

@paboum

paboum commented Sep 1, 2023

In C programs, the header limits.h is available on POSIX systems, defining constants like PATH_MAX, the system limit on the length of an entire path (and NAME_MAX for a single file name). The same information should be available in Python somehow; just use it to make things work. The default behaviour should be safe - https://en.wikipedia.org/wiki/Best-effort_delivery
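Python does expose these POSIX limits via `os.pathconf`. The catch, noted earlier in the thread, is that the limit is a property of the filesystem a given directory lives on, so it has to be queried against the target path, and the call doesn't exist on Windows. A sketch, with `filename_limit` being a hypothetical helper:

```python
import os

def filename_limit(directory='.'):
    """Hypothetical helper: the Python analogue of NAME_MAX from
    limits.h.  Per POSIX the limit belongs to the filesystem the
    directory lives on, so it must be queried per path; os.pathconf
    does not exist on Windows, hence the broad fallback."""
    try:
        return os.pathconf(directory, 'PC_NAME_MAX')
    except (AttributeError, ValueError, OSError):
        return 255  # conservative guess for common filesystems

print(filename_limit())  # 255 on typical Linux ext4 mounts
```

Even with the limit in hand, dirkf's earlier point stands: the only authoritative answer comes from the filesystem driver when the open actually happens.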

@rautamiekka
Contributor

rautamiekka commented Sep 2, 2023

Some fields are more important than others: the unique IDs, hashes, and file extensions can never be allowed to be truncated. As few known fields as possible could have an importance priority (a floating-point number) assigned internally in the extractor.

^ Then, generally, you can truncate all the other fields equally to fit, as it's more important to have some knowledge of the contents in the filename than to have none at all; and 125, not to mention 255, characters per segment (or even per full path) is a lot of text. If the user is hell-bent on changing a sensible default into something very much non-sensible like ID + hash + keywords/tags + collection name + description, that's their fault.

Something like that.
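That priority idea could be sketched like this; all names are hypothetical, and a real implementation would also have to count bytes rather than characters:

```python
def fit_fields(fields, protected, max_len, sep='-'):
    """Hypothetical sketch of the priority idea: fields named in
    `protected` (ids, hashes, extensions) are never shortened; the
    remaining fields share the leftover budget equally."""
    overhead = len(sep) * (len(fields) - 1)
    fixed = sum(len(v) for k, v in fields.items() if k in protected)
    flexible = [k for k in fields if k not in protected]
    budget = max_len - overhead - fixed
    if not flexible or budget < len(flexible):
        raise ValueError('protected fields alone exceed the limit')
    per_field = budget // len(flexible)
    return sep.join(v if k in protected else v[:per_field]
                    for k, v in fields.items())

name = fit_fields(
    {'title': 'T' * 400, 'uploader': 'U' * 100, 'id': 'abc123'},
    protected={'id'}, max_len=120)
print(len(name), name.endswith('abc123'))  # 120 True
```

The id survives intact while title and uploader are cut to equal shares, which is the prioritisation the comment above argues for.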

@dirkf
Contributor

dirkf commented Sep 2, 2023

I think #29912 (comment) is a reasonable statement of the problem (further suggestions welcome). Unfortunately we haven't been able to make any progress in creating an implementation to meet those requirements, or even in refining the problem to make it more tractable.

@ghost

ghost commented Sep 2, 2023

Personally I think it's good for a project to have goals and, as important if not more so, non-goals. The current -o option is powerful enough as-is that users can solve this problem on their own, without the need for any magic code on the part of youtube-dl. I don't think this project should be in the business of trying to figure out what people want. The dumber the better, because every line of code committed has to be maintained, and last I checked we only have one or two maintainers for a project with 100k stars and no financial backing. But at the end of the day it's not my decision; just giving my two cents.

@dirkf
Contributor

dirkf commented Sep 3, 2023

Plainly it's not a showstopping issue but it is one that generates repeated complaints. It's still open in case someone is eventually intrigued enough to work out a solution that meets an adequate subset of the problem constraints, after spending however long it takes locked in the basement with the necessary cold towels wrapped round the head. But we should fix Twitter/X if the extractor is ever viable and updated post-Musk.

@jgowdy

jgowdy commented Aug 9, 2024

Yeah, having the tool just fail on long file names without the user applying different parameters to mitigate the issue is very rational and an awesome user experience. I mean just defaulting to a template that causes video downloads not to fail with reasonable truncation would be ridiculous. The legions of people coming to the Issues and hopefully reading the FAQ before creating this issue repeatedly shows the wisdom of this decision. 👍

@paboum

paboum commented Aug 11, 2024

What is wrong with following the best-effort rule and utilizing limits.h or similar library if available on some systems?

@dirkf
Contributor

dirkf commented Aug 11, 2024

When we see your PR implementing that idea and explaining how it meets (or not) the outline requirements above, we'll be able to say what, if anything, is wrong with it.

paboum added a commit to paboum/youtube-dl that referenced this issue Aug 11, 2024
@paboum

paboum commented Aug 11, 2024

Perhaps you shouldn't maintain this project if you can't "see" a simple idea.

#32910

Obviously this algorithm is sub-optimal and poorly implemented, but you should "see" what is missing in your code now.

Just use MAX_LENGTH to either:

  • Truncate the output filename to the length allowed by the system
  • Decrease the limit provided by the user as a parameter
  • Refuse downloading files with name longer than this size

@dirkf
Contributor

dirkf commented Aug 11, 2024

We know how to detect a filename that is too long; the question is how to truncate it in a way that meets the linked requirements as much as possible, when the output template mechanism means that the user can't necessarily know how long the constructed filename will be, and yet almost certainly doesn't want the download to fail if the filename is too long. The proposal doesn't address the hard part of the problem, unlike the not completely successful attempt in #29989.

@paboum

paboum commented Aug 11, 2024

What is the current estimate of closing the ticket?
What is blocking you from applying some simple heuristic like "always truncate to 255" to just stop failing in most use cases?
What knowledge are you missing and where do you expect to gather it, from whom, and when?
What is your plan to stop more tickets being created?

PS. From the wall of text you linked I understand that people who already use --output (my guess is 0.01% of the userbase) would suffer from lower-level truncation, so an option should be provided for them to (optionally) disengage it.

@dirkf
Contributor

dirkf commented Aug 11, 2024

It works the other way. Just set maximum lengths for the fields in the output template that will meet the limits of your OS/filesystem, as well as any others to which you might copy the downloaded file. You should do this for the documented default (set in a config file), and for any specific output template that you pass in --output/-o .... Then any mutilation of template fields is entirely under your control.

The combined abilities of the many contributors to this project and yt-dlp have not so far found an acceptable solution to the general problem posed by this issue. I don't expect the issue to be closed soon or ever, but who knows?

@paboum

paboum commented Aug 12, 2024

Don't forget to also add available_ram and available_disk_space fields to the config file.
The user chooses to download a big video? Too bad, they should have chosen a smaller file or format.
