Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add format_string routine to format other types to strings #444

Merged

Conversation

St-Maxwell
Copy link
Member

  1. src/stdlib_strings_format_string.fypp
  2. src/tests/string/test_strings_format_string.f90
  3. stdlib_strings.fypp%format_string interface
    modify:
  4. doc/spec/stdlib_strings.md%format_string(doc)
  5. stdlib_strings.f90 -> stdlib_strings.fypp(rename)
  6. src/Makefile.manual%format_string(make)
  7. src/tests/string/Makefile.manual%format_string(make)
  8. src/CMakelists.txt%format_string(cmake)
  9. src/CMakelists.txt%format_string(cmake)
    note:
    make test passed.
    cmake test passed

1. src/stdlib_strings_format_string.fypp
2. src/tests/string/test_strings_format_string.f90
3. stdlib_strings.fypp%format_string interface
modify:
1. doc/spec/stdlib_strings.md%format_string(doc)
2. stdlib_strings.f90 -> stdlib_strings.fypp(rename)
3. src/Makefile.manual%format_string(make)
4. src/tests/string/Makefile.manual%format_string(make)
5. src/CMakelists.txt%format_string(cmake)
6. src/CMakelists.txt%format_string(cmake)
note:
make test passed.
cmake test passed
@zoziha zoziha mentioned this pull request Jun 25, 2021
7 tasks
@awvwgk awvwgk changed the title add: Add format_string routine to format other types to strings Jun 25, 2021
@awvwgk awvwgk added the reviewers needed This patch requires extra eyes label Jun 25, 2021
doc/specs/stdlib_strings.md Outdated Show resolved Hide resolved
Comment on lines 20 to 21
string = '*'
!!\TODO: *?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a meaningful number of asterisks in the output we must be able to parse format strings and determine the width of the field we are writing into. Just writing a single asterisk until we are able to do so in stdlib sounds like a good compromise for now. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is OK, just writing a single asterisk until we are able to do so in stdlib.😜

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
I think the * sign here should be distinguished a little bit more, for example, written as [*], !*! and the like can be distinguished from the ***** sign (too narrow formatter for variables) before and after it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking a failed formatter should be simple, currently I'm using starts_with(format_string(var, "(...)"), "*"). This is not fail-safe as it would match a formatter like '("*", g0)' as well. Using format_string should imply that we trade some of the versatility of the internal IO approach for having a function.

How about returning an empty string on failure? This would be clearly distinct from a too narrow formatter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mainly to remind the users: an empty string may not be effectively fed back to the user, this is a format_string failure.

  1. We need to notify the user more clearly that this has failed;
  2. The string output after failure is simple enough;
  3. Does not hinder the effective operation of the main process of our Fortran program.

St-Maxwell and others added 3 commits June 26, 2021 13:41
clean the implementation of format_string
fix:
1. src/stdlib_strings.fypp%format_string interface comments
2. src/stdlib_strings_format_string.fypp
update:
1. src/tests/string/test_strings_format_string.f90
2. doc/specs/stdlib_strings.md
note:
make passed
cmake passed
@awvwgk
Copy link
Member

awvwgk commented Jun 26, 2021

I added a suggestion as PR against yours (see St-Maxwell#1) for expanding the unit testing and providing better error reporting in case of a mismatch in the formatted strings.

Expand unit testing for more cases
@awvwgk
Copy link
Member

awvwgk commented Jun 26, 2021

Looks like our choice of default formatter with (g0) is not well defined across compilers. This is a bit surprising to me to be honest, but maybe someone more well-versed in the standard can shed some light on this issue.

For ifort we find two decimal places less with (g0) than with gfortran:

31/35 Test #31: strings_format_string ............***Failed    0.00 sec
 format_string(complex) : 
 Default formatter for complex number
Expected: '(1.00000000,1.00000000)' but got '(1.000000,1.000000)'

@zoziha
Copy link
Contributor

zoziha commented Jun 26, 2021

Links

  1. Intel '(g0)'

    If w is 0, the field width is selected by the processor. If w is zero, you can only specify forms G0 or G0.d.

  2. fortran18 standard, page270
    image

It seems that the standard says that the width is determined by the processor, but it is actually determined by the compiler on ifort or gfortran?

@awvwgk
Copy link
Member

awvwgk commented Jun 26, 2021

It seems that the standard says that the width is determined by the processor

Processor is standard-speak for compiler.

zoziha added 3 commits June 26, 2021 17:27
…4 (comment))

fix:
1. src/tests/string/test_strings_format_string.f90
2. romove the `buffer` variable in `format_string_c${kind}$`
update:
1. doc/specs/stdlib_strings.md%format_string
note:
make passed
cmake passed
@St-Maxwell
Copy link
Member Author

🤔I'm wondering if there are some compiler options of Intel compiler that can prevent applying edit descriptor on mismatching type of values. And if exist, we can add them into CMakeLists.txt.

@awvwgk
Copy link
Member

awvwgk commented Jun 26, 2021

I opened #446 to discuss handling smart behaviour in compilers. I think we want consistent results which are independent of the compiler, but we also don't want to set too many special flags when compiling, because some of them (like -standard-semantics for ifort) can cause problems when trying to reuse stdlib in other projects.

Eventually, not using the internal IO but have our own implementation of format descriptors would be the preferred solution, as it makes as most independent of the actual details of compiler extensions.

@St-Maxwell
Copy link
Member Author

St-Maxwell commented Jun 26, 2021

Eventually, not using the internal IO but have our own implementation of format descriptors would be the preferred solution, as it makes as most independent of the actual details of compiler extensions.

I support implementing a procedure converting floating-point numbers to strings in stdlib in the future. I found Grisu or Ryū algorithm might be the choice. Unfortunately I'm not able to implement it. This needs the help from the community.

doc/specs/stdlib_strings.md Outdated Show resolved Hide resolved
@ivan-pi
Copy link
Member

ivan-pi commented Jul 23, 2021

Let's get some more preferences on the naming; @certik, @LKedward @nshaffer @ivan-pi @jvdp1 which names do you like best?

How about print like in the proposals for WG5 from Espen Myklebust (see Section 7.1.3)?

function print(expr,fmt)
  <numeric-type>, intent(in) :: expr
  character(len=*), intent(in), optional :: fmt
  character(len=:), allocatable :: print
end function

@ivan-pi
Copy link
Member

ivan-pi commented Jul 23, 2021

@zoziha, concerning the following point

  1. It is more similar to to_string in semantics, but to_string belongs to the stdlib_ascii module, and to_chars will belongs to the stdlib_string module. The main functions of the two modules have a certain distinction.

see #399. IMO, also to_string should be moved from stdlib_ascii to stdlib_string.

@St-Maxwell
Copy link
Member Author

Sorry for my absence in recent discussion because I was busy in my project. I like format, and to_chars is also OK for me.

@zoziha
Copy link
Contributor

zoziha commented Jul 24, 2021

So I think we can drop the buffer size down to 256, or maybe even 128.

Yes, 512 is a bit long, we can decide a certain length, it can be 128. And it is stated in the docs (stdlib_strings.md).

Trimming the buffer means that trailing whitespace will not be respected.

If I try to cancel the trim, more complicated things will happen, or we have to spend time trying to write routines to parse the "(g0, 5x)" format, which will become more complicated. At this stage, it is better for us to annotate it in the document. "We can consider fancier approaches later."

A format like (1000x, g0) will still overflow with the current buffersize, what we need in the long run is a format parser which can tell us exactly how long the buffer string at least has to be.

If format_to_string is renamed to format, then it should be more like a formatter, and you need to consider "(1000X, g0)";

1. Data format edit descriptor: I, B, O, Z, F, E, EN, ES, D, G, L, A
2. Control format edit descriptor: T, TL, TR, X, S, SP, SS,:, /
3. Character string format edit descriptor: H, ``, ""

If it is renamed num2str, its main function is to convert other variables into characters, mainly involving data format edit descriptors, and SP is not necessary.

IMO, also to_string should be moved from stdlib_ascii to stdlib_string.

I don't really like a name which containts string (the user could think that the result is a DT string.

If stdlib_ascii(module):to_string(function) is to be migrated to stdlib_strings(module) in the future, I would prefer to name format_to_string as num2str, to_chars.

@zoziha

This comment has been minimized.

@awvwgk
Copy link
Member

awvwgk commented Aug 2, 2021

We rejected to_char in the original discussion about the naming of to_string, because we are dealing with strings rather than single characters here. Similar considerations are relevant for num2char. An issue with num2str is that it can format a logical value, which is not strictly a number. Therefore, I think while short this name can be misleading about the provided functionality.

I'm still fine with naming it format_string, also format or print work fine with me. We can also overload the existing interface to_string and make the format specified mandatory. Those suggestions have been made, now it is up to the OP to make a decision and adopt the patch accordingly.

Whatever name we arrive with in the end, I'm still in favor of moving this patch forward.

zoziha added 2 commits August 3, 2021 22:53
…module):format_to_string(interface)`.

#### Works
1. merge `to_string` and `format_to_string` to `stdlib_strings`, make the name `to_string`
2. redrect the used keyword `to_string` in `stdlib_string_type`
3. add `stdlib_string_type_constructor.fypp` to avoid the ring dependence in `stdlib_strings` and `stdlib_string_type`.
4. other clean works.
@zoziha
Copy link
Contributor

zoziha commented Aug 3, 2021

Thanks, I canceled the idea (see comment) merge to_string to format_to_string because I think this task is too large and complicated.
I heard that you also intend to migrate to_string to stdlib_strings. I think it is the best way to merge to_string and format_to_string into stdlib_strings. Their combined name can be called to_string.

I completed this task in another branch (see implementation). It's true that the task is a bit heavy.
If you think there is no problem with this merge idea, I will merge that branch (merge_tostring) into this PR tomorrow. 😏

A note

(see stdlib_string_type_constructor.fypp)
I had to take the string_type_constructor routines out of the stdlib_string_type.fypp because of the ring dependence of using to_string in string_type_constructor routines and using some string_type related routines in stdlib_strings.fypp.

@zoziha
Copy link
Contributor

zoziha commented Aug 4, 2021

I merged to_string and format_to_string to stdlib_strings(module):to_string(interface), which looks like GitHub-CI has no error report, nice~
Please review it, thanks @awvwgk @St-Maxwell 😘.

logical :: stat
character(len=:), allocatable :: msg

if (merge(partial, .false., present(partial))) then

This comment was marked as resolved.

@milancurcic
Copy link
Member

In a nutshell, these are now specific functions that go in addition to the previously existing to_string for integers, and they're now all under one generic name to_string. That sounds good to me. I built and ran tests with CMake and manual Makefile locally without issues.

@milancurcic
Copy link
Member

If there are no objections, I'll merge this tomorrow.

@milancurcic
Copy link
Member

Thanks a lot, @zoziha!

character(len=buffer_len) :: buffer
integer :: stat

write(buffer, optval(format, "(g0)"), iostat=stat) value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@St-Maxwell @zoziha @awvwgk

I apologize for the last minute suggestion. What do you think about the following:

Suggested change
write(buffer, optval(format, "(g0)"), iostat=stat) value
write(buffer, "(" // optval(format, "g0") // ")", iostat=stat) value

Whether the format dummy is present or not, this encloses the format string in an extra set of parentheses. This makes the function slightly more flexible. For example, you can now do:

print *, to_string(1.2345, 'f4.2') ! prints '1.23'

which would previously not work. I don't know why Fortran requires parentheses around format strings, but it's always been one of those little annoying things to me. And if you already have an outer set of parentheses, the semantics don't change. I think this allows for a better UI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit, a format string without parenthesis looks slightly off to me. But I have no strong feelings about this either way.

The only issue I could see, is that it might be surprising for a new user if the intrinsic transfer routines require a different syntax than the standard library. But maybe this is a good thing as well and could be used to suggest a simplification of format strings in the next Fortran standard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we decide to do this we should document it clearly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without parentheses, it looks more concise. I think it's good, as long as we explain it in the document: We use parentheses by default inside to_string to help users write less.
I will confirm the feasibility of this writing later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made a suggestion in #444 (comment).


- `value`: Shall be an `integer/real/complex/logical` scalar.
This is an `intent(in)` argument.
- `format`: Shall be a `character(len=*)` scalar like `'(F6.2)'`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zoziha @awvwgk Suggestion for docs improvement, to go together with #444 (comment).

Suggested change
- `format`: Shall be a `character(len=*)` scalar like `'(F6.2)'`.
- `format`: Shall be a `character(len=*)` scalar that contains the edit descriptor to format `value` into a string,
for example `'(F6.2)'` or `'(f6.2)'`. `to_string` will automatically enclose `format` in a set of parentheses,
so passing `F6.2` or `f6.2` as `format` is possible as well.

@zoziha
Copy link
Contributor

zoziha commented Aug 22, 2021

I tried to change the corresponding content of format argument to support f6.2 format that does not require parentheses.
Yes, it is also compatible with format like (f6.2) with parentheses!

This is user-friendly and a practical change. @milancurcic Thanks a lot!

@milancurcic
Copy link
Member

Indeed, Fortran allows formats like this: (((((f6.2))))) :)

@milancurcic milancurcic merged commit 5089a40 into fortran-lang:master Aug 22, 2021
@zoziha zoziha deleted the zoziha/feature/format_string branch August 23, 2021 02:09
@awvwgk awvwgk mentioned this pull request Aug 23, 2021
10 tasks
@awvwgk awvwgk removed the reviewers needed This patch requires extra eyes label Sep 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants