Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fallback value for critical unset global options #73

Closed
gadenbuie opened this issue Sep 22, 2021 · 5 comments
Closed

Use fallback value for critical unset global options #73

gadenbuie opened this issue Sep 22, 2021 · 5 comments

Comments

@gadenbuie
Copy link

In nchar_ctl() (and other places) TRUE is intended to be the default value of warn.

fansi/R/substr2.R

Lines 98 to 101 in d872ac0

#' @param warn TRUE (default) or FALSE, whether to warn when potentially
#' problematic _Control Sequences_ are encountered. These could cause the
#' assumptions `fansi` makes about how strings are rendered on your display
#' to be incorrect, for example by moving the cursor (see [fansi]).

fansi/R/nchar.R

Lines 62 to 65 in d872ac0

nchar_ctl <- function(
x, type='chars', allowNA=FALSE, keepNA=NA, ctl='all',
warn=getOption('fansi.warn'), strip
) {

It's expected that getOption("fansi.warn") will yield TRUE, as set by fansi on load. But this option might inadvertently be unset, causing nchar_ctl() to throw an unexpected error. (See brodieG/diffobj#160 for a more realistic example than the one below.)

# imagine fansi is loaded elsewhere but its expected options are reset somehow
options(fansi.warn = NULL)

waldo::compare(1:5, 3:7)
#> Error in fansi::nchar_ctl(x): Argument `warn` must be TRUE or FALSE.

It would be helpful to declare the fallback default in the function signature with warn = getOption("fansi.warn", TRUE). I could probably help with a PR if you're open to the idea.

@brodieG
Copy link
Owner

brodieG commented Sep 23, 2021

Yes, indeed. A long time ago I chose not to use the default value for getOption for reasons that I've forgotten about. I think it carried over from diffobj, where I had a concern about the very large number of options-based parameters might start incurring a performance penalty (getOption(x) is faster than getOption(x, default) in particular when the option is set, although it is a small difference). I think that was a debatable choice even for diffobj given how much additional overhead is in the functions, and even though fansi is much leaner, the performance cost is still likely negligible overall.

With newer parameters I started switching to including the default there instead of doing it via the .onLoad, and I guess was too lazy to fix the old ones.

All this to say I think changing all the option based parameters to have fixed defaults is a good idea. I'll be happy to review a patch if you want to provide it (but please do so off the development branch). The patch should update all the parameters, and remove the relevant portions in .onLoad. Additionally, it would make sense to add helper functions to set the more complex parameters so they don't make the function signatures overly large (e.g. default_term_cap(), default_css(), or some such).

Alternatively you could wait for me to get to it on my own. I have a big release in the works so it wouldn't be a big deal for me to do that work, and I'm probably not publishing to CRAN either way until I complete the big release.

Thanks for reporting the issue.

@brodieG brodieG added this to the 0.6.0 milestone Sep 23, 2021
@gadenbuie
Copy link
Author

gadenbuie commented Sep 23, 2021

Thanks for your reply! I tend to live in a happy world where I don't need to benchmark often, but I tried out a few variations of getOption(opt, default) and (effectively) options()[[opt]] %||% default and I think the results are encouraging for getOption(opt, default).

I appreciate your taking the time to sketch out the work to update. Since the update is a little more involved to get right for fansi, I think I'll wait for now and let you handle this in your next release. It's also not super pressing for my needs now that I know I need to ensure the option is set.

options(beep = "boop")

opts_default <- function(opt, default = NULL) {
  opt <- options()[[opt]]
  if (is.null(opt)) default else opt
}

res <- bench::mark(
  `getOption, set, no fallback`        = getOption("beep"),
  `getOption, set, fallback not used`  = getOption("beep", "boop"),
  `getOption, unset, no fallback`      = getOption("zip"),
  `getOption, unset, fallback used`    = getOption("zip", "zap"),
  `options()$, set, no fallback`       = options()$beep,
  `options()$, set, fallback not used` = opts_default("beep", "boop"),
  `options()$, unset, no fallback`     = options()$zip,
  `options()$, unset, fallback used`   = opts_default("zip", "zap"),
  check = FALSE
)

knitr::kable(res[c(1:5)])
expression min median itr/sec mem_alloc
getOption, set, no fallback 1.21µs 1.56µs 561975.248 0B
getOption, set, fallback not used 1.34µs 1.61µs 499212.313 0B
getOption, unset, no fallback 1.16µs 1.33µs 608773.485 0B
getOption, unset, fallback used 1.32µs 1.5µs 529824.567 0B
options()$, set, no fallback 138.23µs 151.97µs 6328.956 5.06KB
options()$, set, fallback not used 141.65µs 143.99µs 6548.312 5.06KB
options()$, unset, no fallback 138.7µs 142.88µs 6821.244 5.06KB
options()$, unset, fallback used 139.08µs 143.51µs 6715.060 17.56KB

R 3.6.3 on OS X 11.5.2, x86_64-apple-darwin15.6.0 (64-bit)

@brodieG
Copy link
Owner

brodieG commented Sep 23, 2021

Thanks for running the benchmarks. Release is probably a month or two away.

@brodieG
Copy link
Owner

brodieG commented Oct 24, 2021

This should be fixed on the development branch. LMK if it doesn't work for you. Release is probably still 2-3 weeks away.

@brodieG
Copy link
Owner

brodieG commented Dec 16, 2021

My release estimate was pretty optimistic. I now think I'm 1-2 weeks away, absent any horrors in revdep checks.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jun 18, 2023
# fansi Release Notes

## v1.0.4

CRAN compiled code warning suppression release.

* Fix void function declarations and definitions.
* Change `sprintf` to `snprintf`.

## v1.0.3

* Address problem uncovered by gcc-12 linters, although the issue itself could
  not manifest due to redundancy of checks in the code.

## v1.0.0-2

This is a major release and includes some behavior changes.

### Features

* New functions:
    * [#26](brodieG/fansi#26) Replacement forms of
      `substr_cl` (i.e `substr_ctl<-`).
    * `state_at_end` to compute active state at end of a string.
    * `close_state` to generate a closing sequence given an active state.
    * [#31](brodieG/fansi#31) `trimws_ctl` as an
      equivalent to `trimws`.
    * [#64](brodieG/fansi#64) `normalize_sgr` converts
      compound _Control Sequences_ into normalized form (e.g. "ESC[44;31m"
      becomes "ESC[31mESC[44m") for better compatibility with
      [`crayon`](https://github.com/r-lib/crayon).  Additionally, most functions
      gain a `normalize` parameter so that they may return their output in
      normalized form (h/t @krlmlr).
* [#74](https://github.com/brodieG/fansi/issues/74)`substr_ctl` and related
  functions are now all-C instead of a combination of C offset computations and
  R level `substr` operations.  This greatly improves performance, particularly
  for vectors with many distinct strings.  Despite documentation claiming
  otherwise, `substr_ctl` was quite slow in that case.
* [#66](brodieG/fansi#66) Improved grapheme support,
  including accounting for them in `type="width"` mode, as well as a
  `type="graphemes"` mode to measure in graphemes instead of characters.
  Implementation is based on heuristics designed to work in most common use
  cases.
* `html_esc` gains a `what` parameter to indicate which HTML special characters
  should be escaped.
* Many functions gain `carry` and `terminate` parameters to control how `fansi`
  generated substrings interact with surrounding formats.
* [#71](brodieG/fansi#71) Functions that write SGR and
  OSC are now more parsimonious (see "Behavior Changes" below).
* [#73](brodieG/fansi#73) Default parameter values
  retrieved with `getOption` now always have explicit fallback values defined
  (h/t @gadenbui).
* Better warnings and error messages, including more granular messages for
  `unhandled_ctl` for adjacent _Control Sequences_.
* `term.cap` parameter now accepts "all" as value, like the `ctl` parameter.

### Deprecated Functions

* All the "sgr" functions (e.g., `substr_sgr`, `strwrap_sgr`) are deprecated.
  They will likely live on indefinitely, but they are of limited usefulness and
  with the added support for OSC hyperlinks their name is misleading.
* `sgr_to_html` is now `to_html` with slight modifications to semantics; the old
  function remains and does not warn about unescaped "<" or ">" in the
  input string.

### Behavior Changes

The major intentional behavior change is to default `fansi` to always recognize
true color CSI SGR sequences (e.g. `"ESC[38;2;128;50;245m"`).  The prior
default was to match the active terminal capabilities, but it is unlikely that
the intent of a user manipulating a string with truecolor sequences is to
interpret them incorrectly, even if their terminal does.  `fansi` will continue
to warn in this case.  To keep the pre-1.0 behavior add `"old"` to the
`term.cap` parameter.

Additionally, `to_html` will now warn if it encounters unescaped HTML special
character "<" or ">" in the input string.

Finally, the 1.0 release is an extensive refactoring of many parts of the
SGR and OSC hyperlink controls (_Special Sequences_) intake and output
algorithms.  In some cases this means that some `fansi` functions will output
_Special Sequences_ slightly differently than they did before.  In almost all
cases the rendering of the output should remain unchanged, although there are
some corner cases with changes (e.g. in `strwrap_ctl` SGRs embedded in
whitespace sequences don't break the sequence).

The changes are a side effect of applying more consistent treatment of corner
cases around leading and trailing control sequences and (partially) invalid
control sequences.  Trailing _Special Sequences_ in the output is now omitted as
it would be immediately closed (assuming `terminate=TRUE`, the default).
Leading SGR is interpreted and re-output.

Normally output consistency alone would not be a reason to change behavior, but
in this case the changes should be almost always undetectable in the
**rendered** output, and maintaining old inconsistent behavior in the midst of a
complete refactoring of the internals was beyond my patience.  I apologize if
these behavior changes adversely affect your programs.

> WARNING: we will strive to keep rendered appearance of `fansi` outputs
> consistent across releases, but the exact bytes used in the output of _Special
> Sequences_ may change.

Other changes:

* Tests may no longer pass with R < 4.0 although the package should still
  function correctly.  This is primarily because of changes to the character
  width Unicode Database that ships with R, and many of the newly added grapheme
  tests touch parts of that database that changed (emoji).
* CSI sequences with more than one "intermediate" byte are now considered valid,
  even though they are likely to be very rare, and CSI sequences consume all
  subsequent bytes until a valid closing byte or end of string is encountered.
* `strip_ctl` only warns with malformed CSI and OSC if they are reported as
  supported via the `ctl` parameter.  If CSI and OSC are indicated as not
  supported, but two byte escapes are, the two initial bytes of CSI and OSCs
  will be stripped.
* "unknown" encoded strings are no longer translated to UTF-8 in UTF-8 locales
  (they are instead assumed to be UTF-8).
* `nchar_ctl` preserves `dim`, `dimnames`, and `names` as the base functions do.
* UTF-8 known to be invalid should not be output, even if present in input
  (UTF-8 validation is not complete, only sequences that are obviously wrong are
  detected).

### Bug Fixes

* Fix `tabs_as_spaces` to handle sequential tabs, and to perform better on very
  wide strings.
* Strings with invalid UTF-8 sequences with "unknown" declared encoding in UTF-8
  locales now cause errors instead of being silently translated into byte
  escaped versions (e.g. "\xf0\xc2" (2 bytes), used to be interpreted as
  "<f0><c2>" (four characters).  These now cause errors as they would have if
  they had had "UTF-8" declared encoding.
* In some cases true colors of form "38;2;x;x;x" and "48;2;x;x;x" would only be
  partially transcribed.

### Internal Changes

* More aggressive UTF-8 validation, also, invalid UTF-8 code points now advance
  only one byte instead of their putative width based on the initial byte.
* Reduce peak memory usage by making some intermediate buffers eligible for
  garbage collection prior to native code returning to R.
* Reworked internals to simplify buffer size computation and synchronization, in
  some cases this might cause slightly reduced performance.  Please report any
  significant performance regressions.
* `nchar_ctl(...)` is no longer a wrapper for `nchar(strip_ctl(...))` so that it
  may correctly support grapheme width calculations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants