Derived type naming convention #225

Open
aradi opened this issue Jul 26, 2020 · 31 comments
Labels
API Discussion on a specific API for a proposal (exploratory) meta Related to this repository specification Discussion and iteration over the API

Comments

@aradi
Member

aradi commented Jul 26, 2020

Several PRs (#201, #221, #224) wish to introduce derived types into stdlib. We need a naming convention for them. The conventions I have met in Fortran code so far are the following:

  1. Singular noun, such as type(os_error), type(bitfield). Pro: compatible with Fortran's naming convention (e.g. type(c_ptr)). Con: You reserve a name which would also be very natural for an instance variable, e.g. type(bitfield) :: bitfield does not work.

  2. Plural noun, such as type(os_errors) and type(bitfields), as suggested for example in API for a bitset data type #221. Pro: You can give the corresponding singular name to the instance variable: type(bitfields) :: bitfield. Con: All languages I know use singular for type/class names, so it may feel strange and unnatural for stdlib newcomers.

  3. Singular noun with a _t suffix, such as: type(os_error_t), type(bitfield_t). Pro: You can use the noun without the suffix as instance variable, e.g. type(bitfield_t) :: bitfield. Con: The extra _t is redundant.

I am tending towards option 1, with the additional restriction that a derived type name should always contain at least two nouns (connected by underscores). Then the corresponding instance variable name could still be exactly the same, but without the connecting underscore, e.g. type(bit_field) :: bitfield or type(os_error) :: oserror.
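As a rough sketch of how options 1 (with the two-noun restriction) and 3 look side by side: the module name naming_demo and the component bits are made up for illustration, and none of this is stdlib API. A type name and a variable name share one class of local identifiers in Fortran, which is why type(bitfield) :: bitfield is rejected while both declarations below are accepted.

```fortran
! Illustrative only: naming_demo, bit_field, bitfield_t and bits are
! hypothetical names, not part of stdlib.
module naming_demo
  implicit none
  private
  public :: bit_field, bitfield_t

  ! Option 1 with the two-noun restriction: the type name has an underscore.
  type :: bit_field
    integer :: bits = 0
  end type bit_field

  ! Option 3: singular noun with a _t suffix.
  type :: bitfield_t
    integer :: bits = 0
  end type bitfield_t
end module naming_demo

program main
  use naming_demo, only: bit_field, bitfield_t
  implicit none

  ! Option 1: the variable drops the underscore, so it does not clash
  ! with the type name bit_field.
  block
    type(bit_field) :: bitfield
    bitfield%bits = 5
    print *, bitfield%bits
  end block

  ! Option 3: the suffix frees the plain noun for the variable.
  block
    type(bitfield_t) :: bitfield
    bitfield%bits = 7
    print *, bitfield%bits
  end block
end program main
```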

Any opinions on this?

@everythingfunctional
Member

I lean towards option 3. It may be redundant, but it helps not clutter up the namespace. It would also be consistent with what I would think would be a decent convention for using the _m suffix for modules and the _i suffix for abstract interfaces. I think there will be too many instances where coming up with a second word for the type would be very awkward/unnatural.

@certik
Member

certik commented Jul 26, 2020 via email

@nncarlson
Member

nncarlson commented Jul 26, 2020

This is a big pet peeve of mine. Mangling type and module names with something like _t and _m is just so dumb to me. The only place you encounter a type name is in a type statement (or select type) where it's abundantly clear that the name is a type. Similarly, module names are only encountered in a use statement. They are pointless appendages. True that a variable name must differ from a type name, but mindlessly mangling one so that you have great freedom in the other isn't necessary. One can be more creative about variable naming. If you have a module that defines a bitfield type, then an actually useful name for the module is bitfield_type as it tells you what the module provides.

PS: So I'm definitely in the option 1 camp.

@milancurcic
Member

I prefer 1 for the same reason as Neil.

@milancurcic
Member

I'm now at a keyboard so I'll elaborate more why 1.

First, I do appreciate not wanting to clutter the namespace and I've had the same dilemma. However, in what scenarios would you want to name the type instance the same as the type itself? The only ones that come to mind are toy examples in tutorials or types intended to be used as singletons.

However, in real-world code this is not common. Why call a string instance string, a datetime instance datetime, or a bit field instance bitfield? Types and their instances (variables) don't live in the same semantic space. Types are more abstract, their instances more concrete. If you have to argue one way or the other, you'd want the names to reflect the different semantic meaning of types and instances.

Second, does the value of giving the user more freedom in naming variables outweigh the cost of the ugliness of _t throughout the code? The more a type needs to be referred to by name the worse it gets. Consider an extreme example of type(string_t), which I assume would be used a lot in client code. The pro is that the user has the freedom to have a variable called string. The con is that a significant fraction of the future Fortran library ecosystem will be pepper-sprayed with the ugly type(string_t). If stdlib is to be successful, which I think all of us here are doubling down on, then how we name derived types in stdlib has a huge impact on the aesthetics of future Fortran libraries and applications.

I assumed that type(string_t) is uglier than type(string), and that wanting to name your instance the same as your type is an edge case rather than common use. If these assumptions are true, then we should prioritize giving nice names to our types over giving marginally more flexibility to client code. If the user insists on naming their variable string, then they can do:

use stdlib_string, only: string_t => string
type(string_t) :: string

I think it's okay to make an exception to the rule when justified. If an appendage to the type name is necessary, then I think _type is less ugly and more clear than _t.

@everythingfunctional
Member

I'm not opposed to 1, and you're right that it is noise. I just think it does at least occasionally lead to name clashes due to the design of the language.

For example, in a library for dealing with the composition of matter, I invariably have a type element. I'd like to define that type, and related operations, type bound procedures, etc. in a module element, and in another module dealing with chemicals, I'm likely to have variables that would be most appropriately named element in the procedures. In this instance, element_type would not be an appropriate name for the module, as it provides more than just the type. And trying to come up with a different name for the variables just to avoid the name clash with the type leads to more noise than the suffix (i.e. the_element).

However, I will grant that this is not likely to be the majority case, and probably not likely to be a frequent occurrence in stdlib. Also, for the very generic use case of stdlib types, I agree with @milancurcic's point about variable names with the same name as the type being unlikely. So, in that case I would be fine with option 1, and not bother worrying about the two-word requirement for types.

I will note, however, that in @milancurcic's example, had it not been for the stdlib namespace, he would have demonstrated exactly my point about the module name.

@certik
Member

certik commented Jul 27, 2020 via email

@Romendakil

Definitely not CamelCase! We want Fortran style, not C++ or Python. In our code we are using option 3, so this looks very familiar to me. But I think it is just a matter of taste.

@aradi
Member Author

aradi commented Jul 27, 2020

Although we are using camel case in some of our projects, I wouldn't recommend it for stdlib for several reasons:

  • The one derived type already in the language, type(c_ptr), uses the lower-case + underscore convention. (Are there any other derived types in the standard?)

  • Since we have a case-insensitive language, people will start to use the type names differently without being notified by the compiler e.g. type(OSError), type(OsError), type(OSerror). I think, the safest convention to ensure consistent usage for a case-insensitive language is to consistently write everything lower cased. (The temptation to use type(OS_error) instead of type(os_error) is hopefully smaller.)
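A minimal sketch of the case-insensitivity point, assuming a made-up module os_error_demo with a single component code (neither is stdlib API). All spellings below refer to the same type, and a conforming compiler accepts them without a diagnostic:

```fortran
! Illustrative only: os_error_demo and the component code are hypothetical.
module os_error_demo
  implicit none
  private
  public :: os_error

  type :: os_error
    integer :: code = 0
  end type os_error
end module os_error_demo

program main
  ! The mixed-case spelling resolves to the same entity, os_error.
  use os_error_demo, only: OS_Error
  implicit none
  type(OS_ERROR) :: a
  type(os_error) :: b

  a%code = 2
  b = a            ! same type, so intrinsic assignment is allowed
  print *, b%Code  ! component names are case-insensitive as well
end program main
```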

@arjenmarkus
Member

arjenmarkus commented Jul 27, 2020 via email

@MarDiehl
Contributor

Even though I agree with @milancurcic that naming variables after their type is bad style (type(string) :: string is a good strategy to confuse maintainers), I would also opt for a _t or _type suffix or prefix. Since Fortran is case-insensitive (CamelCase is not an option) and does not support multiple namespaces, the namespace is already quite small and we need to avoid conflicts.

Somewhat related to the question of naming conventions for variables is the naming convention for modules. For class-like types, it could be meaningful to have a module that contains one type only (see e.g. https://github.com/MarDiehl/quaternions). Would the module name in that case be the name of the type/class with some variation?

Examples

use stdlib_list

type(t_list) :: names

Alternatively (multiple types/classes in one module):

use stdlib_types

type(t_list) :: names
type(t_dict) :: children

There is also the point that a type can be something like a C struct (just a collection of variables) or a Python class (object-oriented approach with type-bound procedures). I don't think it makes sense to differentiate them (e.g. string_c because it has type-bound procedures and is considered to be a class).

@shahmoradi
Member

shahmoradi commented Aug 2, 2020

I agree with @MarDiehl. Just sharing my experience and my trial and error on this topic: this is the design that I settled on in my personal projects after several years of trying and improving:

module string_mod
    type :: string_type
    end type
end module string_mod
program main
use string_mod, only: string_type
type(string_type) :: string
end

In practice, I have found that I use the variable string far more frequently than the suffixed type and module names (string_type, string_mod). So it has made sense for me to suffix the type and module names and have freedom in choosing variable names, instead of inventing and living with awkward variable names. Other languages that do not use these suffixes (like Python or MATLAB) are case-sensitive, so they can define something like the following,

from string import String
string = String()

But this is (perhaps fortunately) impossible in Fortran. An alternative that comes to my mind for Fortran is,

module string_mod
    type :: string
    end type
end module string_mod
program main
use string_mod, only: string_type => string
type(string_type) :: string
end

But this would be highly inferior to the former method of suffixing types with _type. It takes more time and energy to achieve the same goal.

In my opinion, there is nothing wrong with being expressive and clear in naming conventions, for example in choosing string_type vs. string_t or vs. string as the type name.

Regarding the CamelCase, it has been my own strong preference everywhere, even in Python. But in the special case of Fortran stdlib, I think it would make more sense to follow the naming convention of the standard Fortran, which is the snake_case.

If you are interested to see how this naming convention looks and feels in practice, take a look at this example module here: https://github.com/cdslaborg/paramonte/blob/master/src/ParaMonte/String_mod.f90

cheers

@FortranFan

FortranFan commented Aug 3, 2020

For whatever it's worth, I too prefer "_t" suffix for derived types in Fortran; "_m" for modules; and "_sm" for submodules.

I disagree with @milancurcic 's comment earlier, "in real-world code this is not common. Why call a string instance string, a datetime instance datetime, or a bitfield instance bitfield"

  • Test-driven development (TDD), prototype implementation, and all modes of testing, starting from unit tests, are of utmost importance to "real-world code". Much of the illustration and the accompanying communication, collaboration, and brainstorming, as well as the application enhancement that follows and the advancement of associated technology, takes place using the small programs associated with these critical needs.

The teams I work with and I have found it to be tremendously helpful and productive to name an instance of a 'foo' type as foo itself which is accomplished by the simple type(foo_t) :: foo in such small programs.

So much so that some teams have carried that forward (or one can say brought back since _t had started to appear in C-based languages during the 1980s i.e., before Fortran 90 and flexible naming was introduced in Fortran) to other languages such as C# that are case-sensitive where the usual practice, as mentioned upthread, was to use some case convention (e.g., camelCase) but which was found to be a struggle for some developers with special visual needs.

@milancurcic
Member

For whatever it's worth, I too prefer "_t" suffix for derived types in Fortran; "_m" for modules; and "_sm" for submodules.

I think this is worth the most--this thread asks what each of us prefers.

Yes, TDD leads to some real-world code, and type(foo_t) :: foo has as much to do with TDD as type(foo) :: a. Different teams have different habits and styles.

Considering a stronger preference so far for a suffix to type names, does anybody object to the suffix being _type over _t? I'm worried that the latter may be quite opaque to newcomers to the language.

@jvdp1
Member

jvdp1 commented Aug 3, 2020

I would opt for the option 3 (i.e., '_t' or '_type' as suffix). I would even prefer them as prefix, as typing e.g., t_ or type_ in the editor would result in proposing all the types already used in the file.

I am quite opposed to plural nouns. There are probably some cases where using a plural noun would make no sense, and these situations could lead to confusion.

Considering a stronger preference so far for a suffix to type names, does anybody object to the suffix being _type over _t? I'm worried that the latter may be quite opaque to newcomers to the language.

@milancurcic Would you still be worried if this convention is mentioned clearly in the docs?

@shahmoradi
Member

@milancurcic I agree with you that _t is somewhat opaque to beginners compared with _type, and for that reason I personally prefer the latter. I think being expressive and clear is more important than being concise. But as you said, these issues are mostly personal preferences.
cheers

@milancurcic
Member

@milancurcic Would you still be worried if this convention is mentioned clearly in the docs?

Yes, we'd document any convention in the docs and style guide and this helps for sure. I'm concerned more about what happens in the first 0.5 s or so when your eyes read string_t--there's an extra brain cycle to map "t" -> "type". Of course, if you've been used to it, it's no issue. It trips me up every time, but as I read the code for a while I get used to it. I don't see this paradigm outside of Fortran, thus the newcomers concern.

@aradi
Member Author

aradi commented Aug 3, 2020

As for me, I am in favor of _t over _type if we go for the suffixed version. Fortran is usually very (way too) verbose and typing-intensive, but at least in the naming of type(c_ptr) (the only intrinsic derived type in the language I am aware of) it happened to be compact. As the majority here seems to support a naming convention which is more verbose than the intrinsic one (option 3 instead of option 1), let's try to keep it at least as compact as possible.

Fortunately, for modules we do not need any suffixes, as the namespace prefix makes it highly unlikely that module names collide with type names or variable names. So I think, if at all, we should suffix type names only.

@jacobwilliams
Member

I like _type and _module much better than _t and _mod.

@Romendakil

Romendakil commented Aug 3, 2020

We always use _t in our code, this seems much more natural to me.

@MarDiehl
Contributor

MarDiehl commented Aug 3, 2020

Regarding _mod/_module/_m: Is this convention used 'in the wild'? Neither the intrinsic modules (ISO_C_Binding, IEEE_arithmetic, iso_fortran_env) nor the libraries that I use (hdf5, petsc, MPI) have such a suffix.

@milancurcic
Member

Regarding _mod/_module/_m: Is this convention used 'in the wild'?

I used _mod in the past to allow naming a type with the name of the module base name.

Module suffixes don't pertain to stdlib because we prefix modules with stdlib_.

@shahmoradi
Member

shahmoradi commented Aug 3, 2020

@milancurcic since the community's opinion on the matter of naming convention appears to be fragmented, do you think a poll could resolve this issue of naming? perhaps a poll with an extra question on the years of experience of the participant with Fortran, so that the answers could be weighted based on experience.

@wclodius2
Contributor

wclodius2 commented Aug 4, 2020

@septcolor the language was case-insensitive because early computers used only six bits to represent characters. With only 64 character codes, 10 code points for digits, about 10 code points for special characters, and a few code points for control codes, there weren't enough code points available for both upper and lower case.

@aradi in addition to C_PTR, the standard also identifies C_FUNPTR in ISO_C_BINDING; LOCK_TYPE and TEAM_TYPE in ISO_FORTRAN_ENV; IEEE_FLAG_TYPE, IEEE_MODES_TYPE, and IEEE_STATUS_TYPE in IEEE_EXCEPTIONS; IEEE_CLASS_TYPE and IEEE_ROUND_TYPE in IEEE_ARITHMETIC; and IEEE_FEATURES_TYPE in IEEE_FEATURES as derived types. Eleven derived types in all if I have not missed any.
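For illustration, a small sketch that declares one standard type without a suffix and one with the _type suffix. The variable names handle and mode are arbitrary, and the sketch assumes the processor supports the IEEE rounding-mode inquiry:

```fortran
! Sketch: two derived types defined by the standard's intrinsic modules.
! handle and mode are arbitrary variable names chosen for this example.
program standard_types
  use, intrinsic :: iso_c_binding, only: c_ptr, c_null_ptr, c_associated
  use, intrinsic :: ieee_arithmetic, only: ieee_round_type, &
                                           ieee_get_rounding_mode
  implicit none
  type(c_ptr) :: handle            ! standard type with no suffix
  type(ieee_round_type) :: mode    ! standard type with a _type suffix

  handle = c_null_ptr
  print *, 'handle associated: ', c_associated(handle)

  ! Stores the current rounding mode; assumes IEEE rounding support.
  call ieee_get_rounding_mode(mode)
end program standard_types
```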

@zerothi

zerothi commented Aug 4, 2020

To repeat opinions already given:

  1. I prefer _t; it allows me to reuse the type's name for the variable. In very many cases I only need a single object and I want clarity in the code. type(name_t) :: name makes it clear what name is. In our code bases this comes up quite frequently.
  2. I prefer the file name implementation.f90 and, in it, module implementation_m for the module. This allows me to have a subroutine named implementation. This is quite frequent for small self-contained modules that only expose one method. The file name clarifies intent, and so does the module name. I.e. _m suffix.
  3. I agree with @aradi that stdlib should refrain from using camelcase. Whether it be all upper or all lower case with _ separators is not really important to me. It seems to me that users of stdlib may write the names as they like so that they fit their coding conventions. E.g. code bases which rely on all upper case may still use stdlib as such.
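Point 2 can be sketched as follows; the body of the subroutine is a placeholder. A module name is a global identifier, so a procedure defined inside module implementation could not itself be called implementation, and the _m suffix is what frees the name:

```fortran
! Hypothetical file implementation.f90; the subroutine body is a placeholder.
module implementation_m
  implicit none
  private
  public :: implementation

contains

  ! Allowed because the module is named implementation_m,
  ! not implementation.
  subroutine implementation()
    print *, 'implementation called'
  end subroutine implementation

end module implementation_m

program main
  use implementation_m, only: implementation
  implicit none
  call implementation()
end program main
```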

@zerothi

zerothi commented Aug 4, 2020

@milancurcic since the community's opinion on the matter of naming convention appears to be fragmented, do you think a poll could resolve this issue of naming? perhaps a poll with an extra question on the years of experience of the participant with Fortran, so that the answers could be weighted based on experience.

I don't think years of experience should count. We want to treat newcomers to the language on an equal footing (they are the inheriting user base!). :)
However, total fortran newbies should be recommended not to vote ;)

@wclodius2
Contributor

FWIW I agree with @certik that with the STDLIB prefix there is no need for a _m or _mod suffix to modules.

@jvdp1
Member

jvdp1 commented Aug 4, 2020

in addition to C_PTR, the standard also identifies C_FUNPTR in ISO_C_BINDING; LOCK_TYPE and TEAM_TYPE in ISO_FORTRAN_ENV; IEEE_FLAG_TYPE, IEEE_MODES_TYPE, and IEEE_STATUS_TYPE in IEEE_EXCEPTIONS; IEEE_CLASS_TYPE and IEEE_ROUND_TYPE in IEEE_ARITHMETIC; and IEEE_FEATURES_TYPE in IEEE_FEATURES as derived types. Eleven derived types in all if I have not missed any.

Thank you @wclodius2 for this information.
Based on this, I am in favor of using _type over _t as suffix, because it is already used in the Fortran standard.

@certik
Member

certik commented Aug 4, 2020

Just like with the indentation convention, let's simply document the most viable approaches here, so that people can choose from it. Let's not enforce any particular approach right now, as it is too early for multiple reasons: our community is very young, and we are still in early stages of developing fpm, which will have a convention for naming modules.

Let's start with modules: the convention for fpm packages (programs or libraries) is that each module is prefixed with the path where it sits on the filesystem, starting with the name of the package. So a module in src/something/bitfield.f90 will be named stdlib_something_bitfield. If you just have src/bitfield.f90, then it will be named stdlib_bitfield. The same applies if you have a program or application: the module name will be prefixed by the name of your application. This naming convention will be enforced by default by fpm. We can discuss this further, but this is the best we have been able to come up with so far that allows combining different packages and having effective "namespaces": you just put your files into any directory structure you want and fpm will help with naming your modules correctly (e.g., it could rename your modules on request). Assuming we continue with this approach, we don't need to append any _m, _mod, or _module to modules.
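A sketch of what the path-based convention would give for the src/something/bitfield.f90 case, assuming the module name is the path-derived prefix without the file extension. The type bitfield and its component bits are placeholders, not an actual stdlib API:

```fortran
! Hypothetical contents of src/something/bitfield.f90 in a package
! named stdlib; the type and its component are placeholders.
module stdlib_something_bitfield
  implicit none
  private
  public :: bitfield

  type :: bitfield
    integer :: bits = 0
  end type bitfield
end module stdlib_something_bitfield

program main
  ! The package/path prefix acts as the namespace, so the module needs
  ! no _m or _mod suffix to avoid clashing with the type name.
  use stdlib_something_bitfield, only: bitfield
  implicit none
  type(bitfield) :: flags

  flags%bits = 3
  print *, flags%bits
end program main
```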

Regarding derived types, the options are to append nothing, _t and _type. It looks like people agree to just use lowercase. So let's document these as the 3 options and move on. Later we can revisit as more fpm enabled Fortran libraries will be available and we gain more experience combining and depending on lots of dependencies.

@milancurcic
Member

@milancurcic since the community's opinion on the matter of naming convention appears to be fragmented, do you think a poll could resolve this issue of naming? perhaps a poll with an extra question on the years of experience of the participant with Fortran, so that the answers could be weighted based on experience.

Yes, and I think we already have a decent poll in this thread--a separate poll could break the flow we have here. I don't agree with weighing votes by experience. I think everybody's input should count equally, newcomers and old-timers alike.

@wclodius2
Contributor

FWIW, in regard to type naming, I am now of the opinion that if the type name is long or otherwise inconvenient to use as a variable name, it should not have a suffix, but if it is likely to be used as a variable name, it should have the _t suffix.

@milancurcic milancurcic mentioned this issue Sep 12, 2020
@awvwgk awvwgk added API Discussion on a specific API for a proposal (exploratory) meta Related to this repository specification Discussion and iteration over the API labels Sep 18, 2021