Support for IEC (KiB, MiB, ...) and SI (kB, MB, ...) binary units #6
Comments
IEC units are now supported by R. As the next step, I filed a backward-compatible patch to add support for JEDEC units in |
|
Thanks for the comments.
What do you think? |
Another approach that could work is to add support for UPDATE: The issue with this is that it's not possible to control whether |
Here's my new proposal for supporting "legacy", IEC and SI units in a backward-compatible way and such that it will be easy to switch from today's default "legacy" to SI units at some point in R's future. The file to be updated in R is src/library/utils/R/object.size.R:

object.size <- function(x)
    structure(.Call(C_objectSize, x), class = "object_size")

format.object_size <- function(x, units = "b", standard = "auto", digits = 1L, ...)
{
    known_bases <- c(legacy = 1024, IEC = 1024, SI = 1000)
    known_units <- list(
        SI     = c("B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"),
        IEC    = c("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"),
        legacy = c("b", "Kb", "Mb", "Gb", "Tb", "Pb"),
        LEGACY = c("B", "KB", "MB", "GB", "TB", "PB")
    )
    units <- match.arg(units, c("auto", unique(unlist(known_units), use.names = FALSE)))
    standard <- match.arg(standard, c("auto", names(known_bases)))
    ## Infer 'standard' from 'units'?
    if (standard == "auto") {
        standard <- "legacy" ## default; to become "SI"
        if (units != "auto") {
            if (grepl("iB$", units)) {
                standard <- "IEC"
            } else if (grepl("b$", units)) {
                standard <- "legacy" ## keep when "SI" is the default
            } else if (units == "kB") {
                ## SPECIAL: Drop when "SI" becomes the default
                stop("For SI units, please specify standard = \"SI\"")
            }
        }
    }
    base <- known_bases[[standard]]
    units_map <- known_units[[standard]]
    if (units == "auto") {
        power <- if (x <= 0) 0 else min(as.integer(log(x, base = base)), length(units_map) - 1L)
    } else {
        power <- match(toupper(units), toupper(units_map)) - 1L
        if (is.na(power)) {
            stop(gettextf("Unit %s is not part of standard %s", sQuote(units), sQuote(standard)))
        }
    }
    unit <- units_map[power + 1L]
    ## SPECIAL: Use suffix 'bytes' instead of 'b' for 'legacy'
    if (power == 0 && standard == "legacy") unit <- "bytes"
    paste(round(x / base^power, digits = digits), unit)
}

print.object_size <-
    function(x, quote = FALSE, units = "b", standard = "auto", digits = 1L, ...)
{
    y <- format.object_size(x, units = units, standard = standard, digits = digits)
    if(quote) print.default(y, ...) else cat(y, "\n", sep = "")
    invisible(x)
}

Examples and tests

assert_size <- function(x, ..., expected) {
    size <- structure(x, class = "object_size")
    res <- try(format(size, ...), silent = TRUE)
    if (expected == "error") {
        if (!inherits(res, "try-error"))
            stop(sprintf("Expected %s but got %s", sQuote(expected), sQuote(res)))
    } else if (res != expected) {
        stop(sprintf("Expected %s but got %s", sQuote(expected), sQuote(res)))
    }
}
## The default is the 'legacy' standard (backward compatibility)
assert_size(0, expected = "0 bytes")
assert_size(1, expected = "1 bytes")
assert_size(1023, expected = "1023 bytes")
assert_size(1024, expected = "1024 bytes")
## Standard inferred from 'legacy' units
assert_size(0, units = "b", expected = "0 bytes")
assert_size(1, units = "B", expected = "1 bytes")
assert_size(999, units = "B", expected = "999 bytes")
assert_size(1000, units = "Kb", expected = "1 Kb")
assert_size(1024, units = "KB", expected = "1 Kb")
assert_size(2.0 * 1000^2, units = "MB", expected = "1.9 Mb")
assert_size(3.1 * 1000^3, units = "GB", expected = "2.9 Gb")
assert_size(4.2 * 1000^8, units = "TB", expected = "3819877747446.3 Tb")
assert_size(4.2 * 1000^9, units = "Pb", expected = "3730349362740.5 Pb")
## Standard inferred from 'IEC' units
assert_size(1000, units = "KiB", expected = "1 KiB")
assert_size(1024, units = "KiB", expected = "1 KiB")
assert_size(2.0 * 1000^2, units = "MiB", expected = "1.9 MiB")
assert_size(3.1 * 1000^3, units = "GiB", expected = "2.9 GiB")
assert_size(4.2 * 1000^8, units = "TiB", expected = "3819877747446.3 TiB")
assert_size(4.2 * 1000^9, units = "PiB", expected = "3730349362740.5 PiB")
## Inferring standard from 'SI' units is not possible because they
## conflict with 'legacy' units (and it would be confusing to support
## high-range SI units not covered by the legacy units)
assert_size(3.1 * 1024^1, units = "kB", expected = "error")
assert_size(3.1 * 1024^6, units = "EB", expected = "error")
assert_size(3.1 * 1024^7, units = "ZB", expected = "error")
assert_size(3.1 * 1024^8, units = "YB", expected = "error")
## Automatic 'legacy' units (default)
assert_size(0, units = "auto", expected = "0 bytes")
assert_size(1, units = "auto", expected = "1 bytes")
assert_size(1023, units = "auto", expected = "1023 bytes")
assert_size(1024, units = "auto", expected = "1 Kb")
assert_size(2.0 * 1000^2, units = "auto", expected = "1.9 Mb")
## Automatic 'legacy' units
assert_size(0, units = "auto", standard = "legacy", expected = "0 bytes")
assert_size(1, units = "auto", standard = "legacy", expected = "1 bytes")
assert_size(1023, units = "auto", standard = "legacy", expected = "1023 bytes")
assert_size(1024, units = "auto", standard = "legacy", expected = "1 Kb")
assert_size(2.0 * 1000^2, units = "auto", standard = "legacy", expected = "1.9 Mb")
assert_size(3.1 * 1024^3, units = "auto", standard = "legacy", expected = "3.1 Gb")
assert_size(3.1 * 1024^4, units = "auto", standard = "legacy", expected = "3.1 Tb")
assert_size(3.1 * 1024^5, units = "auto", standard = "legacy", expected = "3.1 Pb")
assert_size(3.1 * 1024^6, units = "auto", standard = "legacy", expected = "3174.4 Pb")
## Automatic 'IEC' units
assert_size(0, units = "auto", standard = "IEC", expected = "0 B")
assert_size(1, units = "auto", standard = "IEC", expected = "1 B")
assert_size(1023, units = "auto", standard = "IEC", expected = "1023 B")
assert_size(1024, units = "auto", standard = "IEC", expected = "1 KiB")
assert_size(2.0 * 1000^2, units = "auto", standard = "IEC", expected = "1.9 MiB")
assert_size(3.1 * 1024^3, units = "auto", standard = "IEC", expected = "3.1 GiB")
assert_size(3.1 * 1024^4, units = "auto", standard = "IEC", expected = "3.1 TiB")
assert_size(3.1 * 1024^5, units = "auto", standard = "IEC", expected = "3.1 PiB")
assert_size(3.1 * 1024^6, units = "auto", standard = "IEC", expected = "3.1 EiB")
assert_size(3.1 * 1024^7, units = "auto", standard = "IEC", expected = "3.1 ZiB")
assert_size(4.2 * 1024^8, units = "auto", standard = "IEC", expected = "4.2 YiB")
assert_size(4.2 * 1024^9, units = "auto", standard = "IEC", expected = "4300.8 YiB")
## Automatic 'SI' units
assert_size(0, units = "auto", standard = "SI", expected = "0 B")
assert_size(1, units = "auto", standard = "SI", expected = "1 B")
assert_size(999, units = "auto", standard = "SI", expected = "999 B")
assert_size(1000, units = "auto", standard = "SI", expected = "1 kB")
assert_size(1024, units = "auto", standard = "SI", expected = "1 kB")
assert_size(2.0 * 1000^2, units = "auto", standard = "SI", expected = "2 MB")
assert_size(3.1 * 1000^3, units = "auto", standard = "SI", expected = "3.1 GB")
assert_size(3.1 * 1000^4, units = "auto", standard = "SI", expected = "3.1 TB")
assert_size(3.1 * 1000^5, units = "auto", standard = "SI", expected = "3.1 PB")
assert_size(3.1 * 1000^6, units = "auto", standard = "SI", expected = "3.1 EB")
assert_size(3.1 * 1000^7, units = "auto", standard = "SI", expected = "3.1 ZB")
assert_size(4.2 * 1000^8, units = "auto", standard = "SI", expected = "4.2 YB")
assert_size(4.2 * 1000^9, units = "auto", standard = "SI", expected = "4200 YB")
UPDATE: 2017-01-01: Forgot that SI uses 'kB'; minor tweaks above. |
UPDATE: SI units are now supported in R-devel, see r71960. |
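For reference, a minimal sketch of what the resulting interface looks like from the user's side. It assumes an R version in which `format()` for `object_size` objects accepts the `standard` argument; the byte count of 2e6 is an arbitrary illustration, not a value taken from R's own tests.

```r
## An object_size of 2,000,000 bytes, constructed directly for illustration
x <- structure(2e6, class = "object_size")

## Same size, three standards (requires the 'standard' argument)
legacy <- format(x, units = "auto", standard = "legacy")
iec    <- format(x, units = "auto", standard = "IEC")
si     <- format(x, units = "auto", standard = "SI")

print(c(legacy = legacy, IEC = iec, SI = si))
```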
I'll just add a link to a thread on twitter for your future references on this topic: https://twitter.com/henrikbengtsson/status/1231986947360354305 |
Posted PR18297 titled 'Use standard file-size units everywhere in base R (e.g., Mb -> MiB)' on 2022-02-01. |
Filed PR18435 adding new SI prefixes RB (ronnabytes) and QB (quettabytes) to |
SI prefixes RB (ronnabytes) and QB (quettabytes) have been added to R-devel (to become R 4.3.0), cf. wch/r-source@cd2d0ba |
One more location to fix, was just added to |
On Bugzilla at https://bugs.r-project.org/show_bug.cgi?id=18297#c2. |
Background
There are a few standards [1] for binary prefixes for byte-size units:
Note that for decimal prefixes, we have:
For byte versus bit, we have:
Problem
For example,
This specific example illustrates a problem with utils:::format.object_size(). Another example is:
The issue with non-standard byte units in R has been reported to R-devel [5].
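To make the size of the discrepancy concrete, here is a minimal sketch of how the same byte count reads under base-1024 and base-1000 prefixes. The helper `scale_bytes()` is hypothetical (it is not part of R) and the byte count is an arbitrary example.

```r
## Hypothetical helper (not part of R): express a byte count in units of base^power
scale_bytes <- function(bytes, base, power) bytes / base^power

bytes   <- 2e6
binary  <- scale_bytes(bytes, base = 1024, power = 2)  # mebibytes (MiB)
decimal <- scale_bytes(bytes, base = 1000, power = 2)  # megabytes (MB)
cat(sprintf("%.1f MiB vs %.1f MB\n", binary, decimal))

## The relative gap between binary and decimal prefixes grows with each step
## (kilo/kibi, mega/mebi, giga/gibi, ...)
powers <- 1:8
gap <- (1024 / 1000)^powers - 1
print(round(100 * gap, 1))  # gap in percent
```

The gap is only a few percent at the kilo level but keeps growing with each prefix, which is why a binary quantity labelled with a decimal-looking unit becomes more misleading for larger objects.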
Wish / Suggestion
JEDEC and SI prefixes where applicable;
utils:::format.object_size(), cf. PR #16649. Completed as of 2016-01-06 in r69879.
JEDEC units for utils:::format.object_size(), cf. PR #16657. UPDATE: See discussion in comments below.
utils:::format.object_size(). UPDATE: Added to R-devel on 2017-01-11 (r71960)
getOption("byte.unit.standard", "legacy").
IEC SI units the new default, e.g. gc(), format.object_size(..., units="auto") and allocation error messages.
b) with .Deprecate().
.Defunct().
Known functions / code affected:
R code:
utils:::format.object_size() - can be tweaked to support a global option.
base::gc() - can be tweaked to support a global option.
check_install_sizes() of tools::.check_packages() - could be updated to make use of format.object_size().
tools::format.compactPDF() - could be updated to make use of format.object_size().
tools::(print.check_package_compact_datasets) - could be updated to make use of format.object_size().
Native code
base::gc(verbose = TRUE) outputs %.1f Mbytes of cons cells used (%d%%) ... originating from src/main/memory.c
Native code related to "out-of-memory" errors
cannot allocate memory block of size %0.1f Gb error messages in src/main/memory.c
cannot allocate vector of size %0.1f Gb error messages in src/main/memory.c
Reached total allocation of %dMb: see help(memory.size) in src/gnuwin32/malloc.c.
"don't be silly!: your machine has a 4Gb address limit" in src/gnuwin32/extra.c
"vector memory limit of %0.1f %s reached, see mem.maxVSize()" in src/main/memory.c.
Note, the out-of-memory errors in the native code cannot easily be tweaked to support a global option; if tried, there is a risk that doing so triggers another out-of-memory error.
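For the R-level functions, consulting a global option could look roughly like the sketch below. This is an illustration under stated assumptions, not R's implementation: `format_bytes()` is a hypothetical function, and `byte.unit.standard` is the option name proposed in this issue, not an option that R is known to consult.

```r
## Hypothetical formatter; the 'byte.unit.standard' option is only a proposal
format_bytes <- function(x,
                         standard = getOption("byte.unit.standard", "legacy"),
                         digits = 1L) {
  bases <- c(legacy = 1024, IEC = 1024, SI = 1000)
  units <- list(
    legacy = c("bytes", "Kb", "Mb", "Gb", "Tb", "Pb"),
    IEC    = c("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"),
    SI     = c("B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
  )
  standard <- match.arg(standard, names(bases))
  base <- bases[[standard]]
  map  <- units[[standard]]
  ## Largest power of 'base' not exceeding x, capped at the largest known unit
  power <- 0L
  while (power < length(map) - 1L && x >= base^(power + 1L)) power <- power + 1L
  paste(round(x / base^power, digits = digits), map[power + 1L])
}

print(format_bytes(2e6))                   # option unset: falls back to "legacy"
old <- options(byte.unit.standard = "SI")  # opt in to SI units globally
print(format_bytes(2e6))                   # now follows the option
options(old)                               # restore the previous setting
```

Because the default of `standard` is evaluated lazily at call time, existing callers keep the legacy output until the option is set, which is what would allow a gradual transition. An explicit `standard` argument still overrides the option.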
Usages of IEC / SI elsewhere
References
src/gnuwin32/malloc.c to the list of places that need to be updated.
byte.unit.standard for smooth transition.