Array of strings #24
Comments
While I sympathize, the nature of Fortran arrays really does require that the elements all be the same type and have the same type parameters. An unfortunate, but common, design pattern in Fortran is to "wrap it" so that you can treat it consistently with other types. Here one is wrapping a string to support treatment in an array, but I've also wrapped arrays so that they can be treated with CLASS(*) in a consistent manner. For the specific purpose at hand, you could consider the StringVector implementation from gFTL-Shared (https://github.com/Goddard-Fortran-Ecosystem/gFTL-shared) |
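To make the "same type parameters" point concrete, here is a minimal sketch (the program and variable names are made up for illustration). Even with a deferred-length character array, the length is one value shared by the whole array, so shorter strings are blank-padded:

```fortran
program same_length_demo
    implicit none
    ! Deferred length: chosen at run time, but still a single length
    ! shared by every element of the array.
    character(len=:), allocatable :: words(:)
    integer :: i

    ! The type-spec in the array constructor fixes one common length (6);
    ! shorter literals are padded with trailing blanks.
    words = [character(len=6) :: 'a', 'bb', 'cccccc']

    do i = 1, size(words)
        print '(a,i0,a,i0,3a)', 'len(words(', i, ')) = ', len(words(i)), &
            ' -> "', words(i), '"'
    end do

    ! Every element has length 6; only len_trim differs per element.
    if (len(words) /= 6) error stop 'unexpected common length'
    if (len_trim(words(1)) /= 1) error stop 'unexpected trimmed length'
end program same_length_demo
```

Each element reports a length of 6 here. Recovering the "real" length requires `len_trim`, which cannot distinguish padding from intentional trailing blanks.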
Wouldn't it be enough if the Fortran Standard enforced a standardized derived type
I guess this should not require many changes. Edit descriptors may need additional thought, probably. |
It will be really, really nice if the Fortran standard offered something INTRINSIC that is far more well-featured (e.g., arrays of strings with different lengths) and less verbose for coders than "CHARACTER(LEN=:), ALLOCATABLE" or its encapsulation in a derived type!! Converse to "You had me at Hello", the number of instances where "You LOST me at CHARA.." when it comes to someone new looking at Fortran and not liking it is quite long. It's year 2019 after all, operations with "STRINGS" are so basic and intrinsic in programming, a coder should simply be able to do:
and so forth, either along the above lines or something similar. How the "string" type is implemented in a processor, whether as an intrinsic derived type or some other compiler "magic", doesn't matter. Steering the practitioners of Fortran toward a library solution for something as simple and fundamental as a STRING type appears a grave disservice; this should be low-hanging fruit to offer to coders. The several available library solutions out there, including those built toward the so-called "Part 2" of the standard with ISO_VARYING_STRING, have long proven the use cases and the feasibility. The challenge now is mainly in standardizing the set of operations to be offered and their names, e.g., 'push_back' or 'append'; 'remove' or 'replace', etc. |
My preference would be an intrinsic But... I don't want the committee to provide some small set of other routines (upper, lower, or whatever) and then not update them for 20 years. What if it was possible to allow the intrinsic type to be extended? That way the user community could develop a library that has everything anybody would need (e.g., regular expressions, parsers, ... things that would never be added by the Fortran standard). A bonus would be some sort of backward compatibility with |
One approach is to do something like https://en.cppreference.com/w/cpp/string/basic_string You can see some of the methods are since C++11, C++17, C++20, and so on. Related to this is that I think Fortran needs to have a standard every 3 years, just like C++, precisely to prevent not updating a feature for 20 years. I opened a new issue for this at #36. |
I agree that having a new intrinsic type STRING would be of considerable benefit. Vendors are usually somewhat reluctant to make changes in the internal type system of their compiler, so there may be some debate about the benefits vs costs. This is esp. true since a user defined type could give most of the benefits without changing the standard at all. (Of course, I think I can still demonstrate compiler bugs with such wrappers on most extant compilers ...) Failing an intrinsic type, it would be good if we could establish an aux Fortran library for such things so that my String wrapper and your String wrapper are compatible. My current approach to such things is to have very small GitHub projects for isolated capabilities, but a better solution might be a large collection of agreed-upon functionality. Various groups have something along these lines, but we'd want to consolidate the effort under a single umbrella. This would evolve faster than the standard but would still require some governance structure. |
Fortran derived types are not capable of what C++ or Python types are, and they were never designed to be. So I would be against any suggestion such as "implement strings/containers/.... by derived types", especially since Fortran character is pretty good as it is. Instead, I think some more character manipulation functions, a Python-like str(), and regex capability would be all one needs. Also, handling them intrinsically in the Fortran runtime library, or using native libraries (in the case of regex), would make them far more efficient than a clunky Fortran implementation. Dominik |
Yes, my suggestion of @tclune and I had long discussions of "being part of the language", versus "a library as in C++ or Python". For Fortran, I am in the "part of the language" camp, and Tom is more open to the "library approach". |
@gronki wrote:
I don't quite agree with this. Besides, Fortran has always granted certain allowances to intrinsics (types, whether basic or derived, statements, and procedures) that are not feasible with user-written code. Processors can then implement whatever "magic" they think appropriate behind the scenes to support the facilities. Examples include generic intrinsic procedures, derived types for interoperability with C, etc. Thus there is no reason why an intrinsic STRING type cannot be introduced featuring considerable benefits for the practitioners of the language, which should be the overriding motivation, even in a manner that might at first appear alien to user derived types, e.g., an inextensible, single-component container type for a new intrinsic string type that offers an "(x:y)" operator, capabilities which aren't possible today with user derived types! On the other hand, the problem with the CHARACTER type is that it's really at the point of being immutable: any further change to it can adversely impact some compiler implementation or other. The resistance to any change here will be great. But there is also the issue of ARRAY semantics in Fortran which, per its original design, calls for symmetry and shies away from jagged structures. So this is yet another issue when it comes to user needs for arrays of strings. Thus, the best compromise in my opinion is a new string type which builds on the capabilities of the current CHARACTER type but also offers further benefits for coders. |
@certik To clarify: "in-the-compiler" is generally better than "in-a-library", I'm sure we can all agree. Rather, the issue is that if you want everything in the compiler you are going to be disappointed many, many times due to finite resources. The number of developers that could/would contribute to an open library far exceeds those that can/will contribute to the standard and esp. any commercial compiler. Perhaps, just perhaps, flang will emerge and create a thriving ecosystem of active branches of development of new (well thought out) features, but even then I don't think the basic balance will change much. So, given the finite resources for making changes to the standard and the commercial compilers, I want to focus on things that fundamentally cannot be done with user code. Of course there are grey areas, and I'm by no means an absolutist. An intrinsic String type would likely rise to my 2nd tier of language priorities. (2nd tier things often happen precisely because the 1st tier things are too hard/controversial.) |
@tclune wrote:
A point to keep in mind is what is needed to "grease the wheel to make the sale", i.e., to get Fortran considered even as an option for new projects or for refactoring of existing code-bases. An intrinsic string type appears to fall under this category. Being able to code "string s = 'Hello World!'" conveys a certain sense of ease-of-use and modernity which is not quite measurable but which I think is priceless. One just loses many a sharp mind at "character(len=:), allocatable ..". One can then build many a "field of dreams" with coarrays (an enormous amount of committee resources were expended on it), etc. in Fortran, but "they" rarely come to experience any of that .. One can accept the argument that Fortran does not necessarily need to include as a top priority feature "in the compiler", say, containers of the likes of C++ STL like deques and priority_queues and unordered multimaps, etc. A vision for Fortran to offer better intrinsic capabilities (e.g., generics) at some point so coders can "home brew" their own libraries toward such capabilities may be ok right now. But with a string type, an intrinsic capability is badly needed for Fortran's credibility. And it doesn't seem all that difficult: almost all the heavy lifting was already done with Part 2 of the standard with ISO_VARYING_STRING. Part 2 is now effectively gone. It's mostly a matter of merging into Part 1 a modernized version of it. However, if every item, no matter how well-established in programming parlance, becomes too difficult to get added to Fortran and requires endless discussions and iterations on use cases, requirements, specifications, and syntax, then perhaps it's time for someone to contact INCITS and "pull the plug" on Fortran. |
I wholeheartedly disagree. We will end up getting 100 clunky and
incompatible implementations of string and list whereas there should be
exactly one way to handle strings and one way to handle lists. Let me bring
the example of Python, where PyPI is a real-world example of a thriving
ecosystem of user packages. Even though it is an extremely flexible
language (more than Fortran), the most essential types (string, file, list,
...) are the basic types of the language. There is no way in Fortran to
implement a generic list, and there will be no such possibility before
actual C++-like templates come (which is uncertain and not everyone even
agrees about it). Implementing a linked list in compiler library is 1000
times easier than templates that practically turn the whole language upside
down.
I agree that character(len = :), allocatable ... is clunky and probably
would be easier aliased by just "string" (which is an easy thing to
implement as is just a matter of syntax), but when it comes to properties
it's pretty damn good. You can cut it and concatenate on the fly. Python
strings are immutable too, same about Java. In Fortran we just lack the
standard utilities to manipulate them (format, regex, etc).
I think community packages are great for solving particular complicated
problems, but when you need to install a user library to handle a list or
string, it shows -- as FortranFan said -- lack of language reliability and
credibility.
Dominik
|
@gronki I think you have a point --- as an example, C++ is a language that has much better features than Fortran to develop libraries for basic things. And so they do not have an array in the language (the idea was that C++ allows enough abstraction to implement your own), and as a result, C++ has dozens of arrays libraries, all incompatible with each other. They do have However, I do agree with @tclune's point that it is by far easier to create a library than to get something in the language. |
@certik should we continue to discuss a potential intrinsic string class here, or make a new issue specifically for that? That would solve the array of strings issue, but really it's a more general topic that touches on other issues. |
Let's create a new issue for it. I agree a string type is more general.
|
I ended up implementing my own version of the ISO_VARYING_STRING module for a couple of reasons. gfortran still hasn't implemented it, even though it's been in the standard for quite a while now (I don't know that any other compiler has either). I hadn't seen a particularly complete, well tested, or well written version. There are significant bugs in most implementations of allocatable characters, not the least of which is that you still can't put them in an array even if they weren't buggy. I was under the impression that since ISO_VARYING_STRING had been approved, it was the right way to move forward. If you use a standards compliant version, if anybody ever does actually implement it in the compiler, you won't have to change any of your code, the external library will just become unnecessary. I also thought that standard covered all of the intrinsic procedures accepting or returning characters, and any additional procedures that were truly necessary. Granted, not every string functionality one would want is included, but that's what libraries are for. |
Hey @tclune, we have tried to make something along these lines over at the stdlib repository. For the start we converged to a non-extensible As a second step, @awvwgk provided a demonstration of an abstract base class (ABC) for a string object, already demonstrating wrappers to ftlString and StringiFor based upon the ABC. Some discussions related to the (problems of the) ABC are in:
I don't think we can get much further than this within the scope of the current standard. |
As I understand, the ISO_VARYING_STRING was withdrawn, see https://www.iso.org/standard/6129.html. I'm not sure if dropping the ISO prefix would cause more or less confusion at this point. |
Yes, it was withdrawn, but I didn't know that at the time. I'm still a fan of the specification, even if it isn't in the standard. IMO, it's proven itself to be quite useful. |
This would be great for a blog post, looking at the story of this module, and doing some comparisons with the |
That's a good suggestion. I've got lots going on, but hopefully I can get to it before too long. |
I really don't understand the effort of implementing a new string type. The current (allocatable) strings are rather flexible and serve at least my purposes very well. |
I've not followed this thread closely, but I think the distinction is that a proper String would encapsulate the allocatable aspect. Without that, any attempt to define something like a list would require a wrapper type due to Fortran's quirky syntax in this regard. I.e., a list of Strings would be a virtually identical implementation to a list of Integers, while a list of deferred-length CHARACTER(len=:), ALLOCATABLE would require a wrapper type and lots of boilerplate logic for diving down the extra level in the data structure. |
@tclune: Apart from having to declare the variable, string handling is not much different from Python:

program test_string
character(len=:), allocatable :: string
string = 'hello world'
print*, len(string)
string = string//', hello Github'
print*, len(string)
print*, string(14:)
string = ''
print*, len(string)
string = ' '
print*, len(string)
end program test_string

#!/usr/bin/env python3
string = 'hello world'
print(len(string))
string = 'hello world'+', hello Github'
print(len(string))
print(string[13:])
string = ''
print(len(string))
string = ' '
print(len(string))

The boilerplate comes in when you want to define a collection of strings, but IMHO this is due to the lack of a list type that can contain arbitrary types/kinds. |
Introducing such generic programming capabilities is my highest priority, and the primary reason I joined the Fortran committee in the first place. And the ability to use that for containers is my most important use case. I don't want to overpromise on the schedule, and there are of course risks that such a big feature can fail to get the necessary votes. |
It also may be worth pointing out that even if Fortran were to introduce something like a List container that can contain "anything", it would be a bit difficult to use. The dynamic typing in languages like Python significantly improves the usability of such a structure. For instance, suppose we want to retrieve the 5th element from a list:

real :: x
...
x = L(5)

How does the compiler check that the types agree? What should it do if the types do not agree at run time? You might respond that x could instead be declared as an unlimited polymorphic pointer:

class(*), pointer :: x
...
x => L(5)

That can work (with some less interesting caveats), but ... Basically, any attempt to make containers as incredibly flexible as they are in Python will be very problematic in Fortran. C++ STL is probably a safer guide to what can work in Fortran. It does allow lists of void pointers, but requires casting. Most STL containers are declared for specific types. |
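For what it's worth, here is a rough sketch of what retrieving from an "anything" container looks like with today's language. The `item_t` type and all names are hypothetical, not from any existing library. The point is that each access needs a `select type` block before the value can be used:

```fortran
module any_list_m
    implicit none
    private
    public :: item_t

    ! Hypothetical element type: each slot can hold a value of any type.
    type :: item_t
        class(*), allocatable :: value
    end type item_t
end module any_list_m

program any_list_demo
    use any_list_m, only: item_t
    implicit none
    type(item_t) :: items(3)
    real :: x
    integer :: i

    ! Store values of three different types in the same array.
    allocate(items(1)%value, source=42)
    allocate(items(2)%value, source=1.5)
    allocate(items(3)%value, source='hello')

    do i = 1, size(items)
        ! The dynamic type is only known at run time, so the caller
        ! has to branch on it before using the value.
        select type (v => items(i)%value)
        type is (real)
            x = v
            print '(a,i0,a,f0.2)', 'item ', i, ' is real: ', x
        type is (integer)
            print '(a,i0,a,i0)', 'item ', i, ' is integer: ', v
        type is (character(len=*))
            print '(a,i0,2a)', 'item ', i, ' is character: ', v
        class default
            print '(a,i0,a)', 'item ', i, ' has an unexpected type'
        end select
    end do
end program any_list_demo
```

A plain `x = items(2)%value` is rejected at compile time, which is exactly the usability gap compared with Python.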
Static typing clearly results in these constraints. I'm not very familiar with C++ STL, but to me it seems like a 'typed list'. A list of type I don't know what STL is usually used for, but from my Python experience a list usually contains similar things. So I expect that an STL-like container will rarely result in long |
Yes - that was what I was trying to say ... Usual cases involve concrete types or pointers to base types if polymorphism is desired. But even then C++ STL has the same memory footprint for each object, and does not readily handle items of variable size. This allows direct computation of memory offsets ala how Fortran manages array indexing. One gets around that for say varying length strings, by having a standard string template that effectively creates a wrapper type that has an allocatable character array inside. Which gets back to what the original discussion was requesting, if I understood. Whether Fortran accomplishes this with the new generics facility or does something special just for the case of strings, is a useful topic. My hope is that the former is powerful enough to obviate the need for the latter. But early days ... |
I would actually not agree. Introducing features into the language just so
that you can implement an object to represent a thing you actually need is
what C++ does. I would suggest just using C++ for such projects. Fortran
does not allow much of an encapsulation or efficient OOP, it will not for
many more years and it doesn't even need to, so I would argue it's far from
optimal route and it would be much better to have built in types for
containers, just like allocatable character provides a good enough string
functionality. Not to dismiss any efforts in creating user space libraries,
it's awesome that the community takes the initiative on where the standard
is lacking and far behind, but my personal preference is to use what is
already in the language, mostly because it just works. Fortran is not a
flexible language like C++ because its target user is different. I think
that introducing a couple of popular data structures and routines that
would be very well optimized by a compiler under the hood seems much
closer to what Fortran aims at. Python is much more flexible than Fortran
yet it does have data structures built in without the need to reinvent the
wheel.
Just my opinion!
Wishing you all good health
Dominik
On Tue, 8 Jun 2021 at 18:25, Peter Klausler wrote:
… I really don't understand the effort of implementing a new string type.
The current (allocatable) strings are rather flexible and serve at least
my purposes very well.
What is missing IMHO is a list type that can be used as a collection of
arbitrary types. Strings of different lengths could simply be put in a list.
That's how it is done in Python, where a numpy string array is also
limited to strings of the same length.
Instead of adding lists, maps, sets, &c. to Fortran, the language really
needs the basic generic programming capabilities other languages have had
since the 80's with which such data structures can be easily built. Maybe
in this decade that will happen in the standard language, and then
implementations will follow in the decade after.
|
More debate of this topic here: https://fortran-lang.discourse.group/t/how-do-i-allocate-an-array-of-strings/3930/27 In stdlib we have a string type which provides most if not all the same functions as the built-in character type and can also be used for an array of strings:
The main places where some friction exists are: initialization, I/O, and conversion to the built-in character type. In addition we have a |
A common request is to simplify how to do array of strings. Currently one option is this:
It might be worth considering if there should be some way in Fortran to do this directly without the derived type.
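As a minimal sketch of the derived-type approach described above: the wrapper type name `string_t` and its component `s` are assumptions made for illustration, not a standard or library API. Each element carries its own deferred-length character component, so the lengths can differ:

```fortran
module string_wrapper_m
    implicit none
    private
    public :: string_t

    ! Hypothetical wrapper: one allocatable, deferred-length string per element.
    type :: string_t
        character(len=:), allocatable :: s
    end type string_t
end module string_wrapper_m

program array_of_strings_demo
    use string_wrapper_m, only: string_t
    implicit none
    type(string_t), allocatable :: names(:)
    integer :: i

    allocate(names(3))
    ! Assignment (re)allocates each component to the length of its value.
    names(1)%s = 'Ada'
    names(2)%s = 'Grace'
    names(3)%s = 'John Backus'

    do i = 1, size(names)
        print '(a,i0,2a)', 'len = ', len(names(i)%s), ', value = ', names(i)%s
    end do

    ! The elements really do have different lengths (3 and 11 here).
    if (len(names(1)%s) /= 3) error stop 'unexpected length for names(1)'
    if (len(names(3)%s) /= 11) error stop 'unexpected length for names(3)'
end program array_of_strings_demo
```

The extra `%s` on every access is the verbosity the issue is asking to avoid.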