Fix strformat precision handling for strings #7941

skilchen · 2018-06-01T18:20:15Z

Another attempt to bring the precision handling for strings closer to what Python does.
The relevant part of the Nim documentation is almost a literal copy of Python's:

Python:

The precision is a decimal number indicating how many digits should be displayed 
after the decimal point for a floating point value formatted with 'f' and 'F', or before 
and after the decimal point for a floating point value formatted with 'g' or 'G'. 
For non-number types the field indicates the maximum field size - in other words, 
how many characters will be used from the field content. The precision is not allowed 
for integer values.

Nim:

The 'precision' is a decimal number indicating how many digits should be displayed
after the decimal point in a floating point conversion. For non-numeric types the
field indicates the maximum field size - in other words, how many characters will
be used from the field content. The precision is ignored for integer conversions.

I think it makes sense to also copy Python's behavior in this case. Being able to limit the number of characters in a formatted string is useful for producing fixed-size columns in a simple textual report.
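As a rough illustration of the fixed-size column use case, here is a minimal sketch that assumes the precision handling proposed in this PR is in place. The `reportLine` helper and the sample rows are hypothetical and not part of the PR.

```nim
import strformat

proc reportLine(name: string, qty: int): string =
  ## Hypothetical helper: one row of a two-column report.
  # `<8.8` left-aligns `name` in 8 columns and keeps at most 8 characters;
  # `>3` right-aligns the quantity in 3 columns.
  fmt"{name:<8.8}|{qty:>3}"

echo reportLine("apple", 3)
echo reportLine("watermelon", 12)
```

With the change applied, the long name should be cut to eight characters, so both rows end up with the same width.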

This example:

import strformat

var str = "abc"
echo fmt"'>7.1 :: {str:>7.1}'", " == '>7.1 ::       a'"
echo fmt"'>7.2 :: {str:>7.2}'", " == '>7.2 ::      ab'"
echo fmt"'>7.3 :: {str:>7.3}'", " == '>7.3 ::     abc'"
echo fmt"'>7.9 :: {str:>7.9}'", " == '>7.9 ::     abc'"
echo fmt"'>7.0 :: {str:>7.0}'", " == '>7.0 ::        '"
echo fmt"' 7.1 :: {str:7.1}'", " == ' 7.1 :: a      '"
echo fmt"' 7.2 :: {str:7.2}'", " == ' 7.2 :: ab     '"
echo fmt"' 7.3 :: {str:7.3}'", " == ' 7.3 :: abc    '"
echo fmt"' 7.9 :: {str:7.9}'", " == ' 7.9 :: abc    '"
echo fmt"' 7.0 :: {str:7.0}'", " == ' 7.0 ::        '"
echo fmt"'^7.1 :: {str:^7.1}'", " == '^7.1 ::    a   '"
echo fmt"'^7.2 :: {str:^7.2}'", " == '^7.2 ::   ab   '"
echo fmt"'^7.3 :: {str:^7.3}'", " == '^7.3 ::   abc  '"
echo fmt"'^7.9 :: {str:^7.9}'", " == '^7.9 ::   abc  '"
echo fmt"'^7.0 :: {str:^7.0}'", " == '^7.0 ::        '"
str = "äöüe\u0309\u0319\u035Co\u0309"
echo fmt"'^7.1 :: {str:^7.1}'", " == '^7.1 ::    ä   '"
echo fmt"'^7.2 :: {str:^7.2}'", " == '^7.2 ::   äö   '"
echo fmt"'^7.3 :: {str:^7.3}'", " == '^7.3 ::   äöü  '"
echo fmt"'^7.0 :: {str:^7.0}'", " == '^7.0 ::        '"
echo "what follows is actually wrong, but the unicode"
echo "module has no support for graphemes"
echo fmt"'^7.4 :: {str:^7.4}'", " == '^7.4 ::  äöüe  '"
echo fmt"'^7.9 :: {str:^7.9}'", " == '^7.9 :: äöüe\u0309\u0319\u035Co\u0309'"
echo "ideally it should produce"
echo "'^7.4 ::  äöüe\u0309  '", " == '^7.4 ::  äöüe\u0309  '"
echo "'^7.9 ::  äöüe\u0309\u0319\u035Co\u0309 '", " == '^7.9 ::  äöüe\u0309\u0319\u035Co\u0309 '"

produces

'>7.1 ::       a' == '>7.1 ::       a'
'>7.2 ::      ab' == '>7.2 ::      ab'
'>7.3 ::     abc' == '>7.3 ::     abc'
'>7.9 ::     abc' == '>7.9 ::     abc'
'>7.0 ::        ' == '>7.0 ::        '
' 7.1 :: a      ' == ' 7.1 :: a      '
' 7.2 :: ab     ' == ' 7.2 :: ab     '
' 7.3 :: abc    ' == ' 7.3 :: abc    '
' 7.9 :: abc    ' == ' 7.9 :: abc    '
' 7.0 ::        ' == ' 7.0 ::        '
'^7.1 ::    a   ' == '^7.1 ::    a   '
'^7.2 ::   ab   ' == '^7.2 ::   ab   '
'^7.3 ::   abc  ' == '^7.3 ::   abc  '
'^7.9 ::   abc  ' == '^7.9 ::   abc  '
'^7.0 ::        ' == '^7.0 ::        '
'^7.1 ::    ä   ' == '^7.1 ::    ä   '
'^7.2 ::   äö   ' == '^7.2 ::   äö   '
'^7.3 ::   äöü  ' == '^7.3 ::   äöü  '
'^7.0 ::        ' == '^7.0 ::        '
what follows is actually wrong, but the unicode
module has no support for graphemes
'^7.4 ::  äöüe  ' == '^7.4 ::  äöüe  '
'^7.9 :: äöüẻ̙͜ỏ' == '^7.9 :: äöüẻ̙͜ỏ'
ideally it should produce
'^7.4 ::  äöüẻ  ' == '^7.4 ::  äöüẻ  '
'^7.9 ::  äöüẻ̙͜ỏ ' == '^7.9 ::  äöüẻ̙͜ỏ '
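The grapheme problem noted in the example could be approximated by treating combining marks as part of the preceding rune. The sketch below is only that, an approximation: it relies on `isCombining` from the `unicode` module, which covers a few combining-mark ranges and is not full grapheme cluster segmentation. The `truncateClusters` helper is hypothetical and not part of this PR.

```nim
import unicode

proc truncateClusters(s: string, maxClusters: Natural): string =
  ## Hypothetical helper: keep at most `maxClusters` base runes, each
  ## together with the combining marks that directly follow it.
  result = ""
  if maxClusters == 0:
    return
  var clusters = 0
  for r in s.runes:
    if not r.isCombining:
      # A base rune starts a new cluster; stop once the limit is reached.
      if clusters == maxClusters:
        break
      inc clusters
    result.add r.toUTF8

let sample = "äöüe\u0309\u0319\u035Co\u0309"
echo truncateClusters(sample, 4)
```

Note that this keeps every combining mark attached to its base rune, so the fourth cluster here carries all three marks that follow the `e`.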

kaushalmodi · 2018-06-02T02:46:36Z

> I think it makes sense to also copy Python's behavior in this case.

+1. That makes sense. You get to implement a feature without a "surprise API" for people crossing over to Nim.

Varriount · 2018-06-02T09:43:17Z

lib/pure/strformat.nim

  case spec.typ
  of 's', '\0': discard
  else:
    raise newException(ValueError,
      "invalid type in format string for string, expected 's', but got " &
      spec.typ)
+  if spec.precision != -1:
+    if spec.precision < runelen(value):
+      value = value.runeSubstr(0, spec.precision)


It would be better to set the length of value in place here, rather than allocating a new string via slicing.
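A minimal sketch of that suggestion, assuming `runeLen` and `runeOffset` from the `unicode` module and the built-in `setLen`. The `truncateRunes` helper is hypothetical, and this is not necessarily the exact code that was merged.

```nim
import unicode

proc truncateRunes(s: var string, maxRunes: Natural) =
  ## Hypothetical helper: keep at most `maxRunes` runes of `s`, in place.
  if maxRunes < s.runeLen:
    # runeOffset gives the byte index at which rune number `maxRunes` starts,
    # so shrinking to that length drops that rune and everything after it.
    s.setLen(s.runeOffset(maxRunes))

var ascii = "abc"
ascii.truncateRunes(2)
echo ascii

var umlauts = "äöü"
umlauts.truncateRunes(2)
echo umlauts
```

Shrinking with `setLen` reuses the existing buffer, whereas `runeSubstr` builds a new string.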


Araq · 2018-06-04T11:25:53Z

lib/pure/strformat.nim

@@ -558,12 +558,16 @@ proc format*(value: string; specifier: string; res: var string) =
  ## sense to call this directly, but it is required to exist
  ## by the ``&`` macro.
  let spec = parseStandardFormatSpecifier(specifier)
+  var value = value


This seems like the wrong place to fix it. Shouldn't alignString be changed instead?

skilchen · 2018-06-04

No, because alignString is also used for floats, which have already been stringified to the requested "precision" by that point. For floats you don't want to deal with "precision" in alignString.
IMHO alignString should just align strings and not modify the passed-in string in any way.
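The separation argued for here can be sketched as two independent steps: apply the precision first, then pad. The `alignOnly` proc below is a hypothetical stand-in built on `strutils.align`; it is not the actual `alignString` from `strformat`.

```nim
import strutils

proc alignOnly(s: string, width: Natural): string =
  ## Hypothetical stand-in for an align-only helper:
  ## pads on the left, never truncates.
  align(s, width)

# Floats: the precision is applied while converting to a string.
let f = formatFloat(3.14159, ffDecimal, 2)
echo "'", alignOnly(f, 7), "'"

# Strings: the precision is applied by truncating before alignment.
var s = "abcdef"
s.setLen(3)
echo "'", alignOnly(s, 7), "'"
```

If the alignment step also truncated, an already formatted float such as `3.14` with precision 2 would be cut down to two characters.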

skilchen added 2 commits June 1, 2018 01:28

fix strformat precision handling for strings

b80414b

add some limited unicode awareness to the precision handling for strings

8b5663f

skilchen mentioned this pull request Jun 1, 2018

[strformat] Update the documentation to state that the precision field works only for floats #7933

Closed

Merge branch 'devel' into fix_strformat_precision_handling_for_strings

934331d

Varriount requested changes Jun 2, 2018

skilchen added 2 commits June 2, 2018 17:18

improvement suggested by Varriount: use setLen and runeOffset instead of runeSubstr

522b57a

Merge branch 'fix_strformat_precision_handling_for_strings' of http://github.com/skilchen/Nim into fix_strformat_precision_handling_for_strings

8e7c91e

Araq reviewed Jun 4, 2018

Varriount approved these changes Jun 4, 2018

Varriount merged commit fd102f3 into nim-lang:devel Jun 4, 2018

skilchen deleted the fix_strformat_precision_handling_for_strings branch June 15, 2018 13:28
