Fix byte formatting abbreviations to use established standards #49115

marcovo · 2023-11-24T09:13:30Z

In #48845, a Number class was added, following a request to add a file size formatting function to Laravel. After the ultimate merge, the byte formatting function Number::fileSize() uses a custom abbreviation scheme. I want to cite my two comments in the mentioned pull request:

Oct 30

The use of prefixes in bytesToHuman() seems to follow a custom naming scheme instead of established standards.

I notice two things:

The prefixes used seem to be metric prefixes (k, M, G, ...), calculating base-10, with the exception that an uppercase K is used instead of the official lowercase k

The calculations seem to be done using base-2, which would actually suggest binary prefixes (Ki, Mi, Gi, ...) should be used

I propose to replace bytesToHuman() with two other methods:

bytesToMetric(), using division by 1000 and using metric abbreviations kB, MB, GB, ...

bytesToBinary() (or some better name), using division by 1024 and using binary abbreviations KiB, MiB, GiB, ...

Other suggestions: toMetricBytes/toBinaryBytes, formatMetricBytes/formatBinaryBytes

Oct 31

I want to stress that the code in the current state uses binary calculations in combination with decimal/metric abbreviations, a combination that would lead to technically confusing results. Do note however, that Windows (up to this date) also does exactly that: it reports filesizes in multiples of 1024 while using metric abbreviations. I would propose to not follow this false windows-path and stick to binary calculations with binary prefixes and/or metric calculations with metric prefixes.

To put the above comments into current context, Laravel 10.x since ships with a Number::fileSize() method that reports metric abbreviations, while it performs calculations using multiples of 1024. This leads to erroneous results when formatting relatively large numbers, e.g. an amount of 10^13 bytes is reported as 10 TB while it surely should be 11 TB, according to the definition of T = 10^12. A more detailed explanation of related errors can be found on wikipedia.

Another way to put it: currently this method reports metric units, but calculates quantities in a non-metric fashion.

Reports of byte quantities using multiples of 1024 do have their place within computing, but these are primarily in technical details like amount of RAM, storage disk space or CPU caches, where the hardware is built in a binary fashion by nature. A file size is not binary by nature, as a file can be any number of bytes large, not limited to only being equal to multiples of 2. Hence, it is not necessary nor logical to use multiples of 1024 in file size reporting.

This leads me to the conclusion that the most enduser-friendly thing to do here, is to use multiples of 1000 instead of 1024 to calculate the (abbreviated) units:

There is no apparent reason to use multiples of 1024
Using multiples of 1000 ensures that the usage of the established units MB, GB, TB, ... is in line with the metric system
Divisions by 1000 are more user-friendly as the relation between numbers of bytes and numbers of (e.g.) terabytes is immediately apparent, not having to reason about why the formatted number of bytes reports something that seems to be just a little of compared to the raw number of bytes

In the rare circumstance one needs to report figures about hardware properties, I propose developers create their own method that reports byte quantities in binary units (kibi, mebi, ...), along with calculations using multiples of 1024.

To sum up, in this PR I propose the following 2 changes to fully comply with the metric system in order to use the metric prefixes:

Calculate using multiples of 1000 instead of 1024
Use the correct abbreviation k for kilo instead of K.

If this PR does not get merged, I would advise to update the docs on this method to warn developers that the used abbreviations follow a custom naming scheme that does not follow an established standard.

taylorotwell · 2023-11-24T14:37:09Z

No plans to change this right now.

caendesilva · 2023-11-25T18:49:58Z

Author of the original Number utility here. Just wanted to chime in with my thought process. This helper, like all of the ones in the Number class, is intended for quickly formatting (often very large) integers into something that is easily digestible and understood by the end user who is likely not technical.

Consider this: you have an online media library where users can download images. It's fair to give them a heads up of how large the file is, but your backend uses the standard filesize function to get the image size, and returns an integer of 2300000, a number that is entirely meaningless to your end user; so you pass it through the fileSize helper, and now you have a string of 2.3 MB. The user often sees these kinds of units, and can put it into context. They don't care at all which base the formatter uses, as it won't matter to them if it's actually 2.2 or 2.4 megabytes. They have a rough idea of how much it is. Since it's all in the same order of magnitude, it doesn't really matter here.

The user is happy because they can get an idea of the file size, and the developer is happy because there is a really fast way to give the user this information. Having two methods for different bases, or adding an extra parameter, gives more work for the developer as they now have the increased cognitive load of deciding which to use, when it ultimately does not matter to the user.

If your users actually need precise formatting, maybe this helper is not for you. But for 99% of the use cases, it's fine. Simplicity over complexity. Same with the currency formatter. If you need something really precise, for example when making a webshop or catalogue room, you probably want to use an entire library just for currency because you cannot tolerate errors caused by weird edge cases and floating point arithmetics.

But if you just want to quickly format a currency, a number, a filesize, whatever, all of these helpers are designed to give you a quick way to turn an abstract number into something that is meaningful, even if you lose some precision.

marcovo · 2023-11-26T07:55:50Z

@caendesilva I get your point and actually agree with it to quite some extent. And, if computing a correct formatted value were considerably more complex or CPU-intensive than returning a more simple value, I would completely agree with you that the current approach would be the way to go.

However, the way I see it, one can choose between the merged approach which gives an indication of filesize (like you are sketching), or my proposed approach which also gives a similar indication of filesize as well as being factually correct. Seeing as no logical changes are required or complexity is added, this to me seems the best and only approach.

Note by the way I'm not proposing to add multiple methods or yet another parameter. This PR proposed to change the single method to do 1 thing, which is the thing I would most likely expect it to do.

Still, I recognize perfectly well I'm not alone on earth and everyone has different opinions. The reason I made this PR is that Taylor made no comment on his choices when editing your PR before merging it, hence I did not know whether those changes were made in ignorance of my comments, or whether those were made knowingly. In my duty to contribute insight to the community, I created this PR to ensure that either the decisions on the details of the method were made consciously, or the (in my opinion) improved version would be merged. As Laravel is a piece of software with great impact, it is good to think about the details of these kinds of things, and the influence it has on the larger digital ecosystem.

I did not make this PR to make a point or something like that, or to bash the current implementation. I appreciate the work you have put in this Number class, I only regret the way the fileSize method turned out. I see it as a missed opportunity to turn the tide of ill-presented byte-quantities that has been going on since before the year 2000. But Taylor has made his verdict, I hope many developers can find use in it and I shall continue to wince at every website and application I come across that uses formattings like this 😉

Fix byte formatting abbreviations

1f835ea

taylorotwell closed this Nov 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix byte formatting abbreviations to use established standards #49115

Fix byte formatting abbreviations to use established standards #49115

marcovo commented Nov 24, 2023 •

edited

Loading

taylorotwell commented Nov 24, 2023

caendesilva commented Nov 25, 2023

marcovo commented Nov 26, 2023

Fix byte formatting abbreviations to use established standards #49115

Fix byte formatting abbreviations to use established standards #49115

Conversation

marcovo commented Nov 24, 2023 • edited Loading

taylorotwell commented Nov 24, 2023

caendesilva commented Nov 25, 2023

marcovo commented Nov 26, 2023

marcovo commented Nov 24, 2023 •

edited

Loading