File system operations #201

Open
MarDiehl opened this issue May 30, 2020 · 39 comments
Labels
topic: interface Interfacing with other libraries or languages topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ...

Comments

@MarDiehl
Contributor

While some functionality for file system related operations exists in Fortran, some rather relevant operations are not standardized.
For example, figuring out whether a path is a directory:
https://stackoverflow.com/questions/9522933

A function like "isDirectory" could be based on the corresponding C functionality. Would that be an appropriate solution? It would certainly require a bunch of #ifdefs on the C side of the implementation.
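A minimal sketch of the C-backed approach, assuming a POSIX system: the module binds the POSIX `opendir`/`closedir` functions from `<dirent.h>` through `iso_c_binding` and reports a path as a directory if it can be opened as one. The module name `is_dir_sketch` is hypothetical, and the function name follows the `is_directory` spelling discussed in this thread. Caveats: an existing directory without read permission reports `.false.`, and a `stat()`/`S_ISDIR` check would avoid that but needs a small C wrapper because of the `struct stat` layout. Windows would need a different backend.

```fortran
! Hypothetical module name; POSIX-only sketch (uses opendir/closedir).
module is_dir_sketch
  use, intrinsic :: iso_c_binding, only: c_ptr, c_char, c_int, c_null_char, c_associated
  implicit none
  private
  public :: is_directory

  interface
    ! DIR *opendir(const char *name);  from POSIX <dirent.h>
    function c_opendir(name) bind(C, name="opendir") result(dirp)
      import :: c_ptr, c_char
      character(kind=c_char), dimension(*), intent(in) :: name
      type(c_ptr) :: dirp
    end function c_opendir

    ! int closedir(DIR *dirp);
    function c_closedir(dirp) bind(C, name="closedir") result(rc)
      import :: c_ptr, c_int
      type(c_ptr), value :: dirp
      integer(c_int) :: rc
    end function c_closedir
  end interface

contains

  ! .true. if path can be opened as a directory; missing paths, regular
  ! files and unreadable directories all give .false.
  logical function is_directory(path)
    character(len=*), intent(in) :: path
    type(c_ptr) :: dirp
    integer(c_int) :: rc

    ! C expects a NUL-terminated string
    dirp = c_opendir(trim(path) // c_null_char)
    is_directory = c_associated(dirp)
    ! Release the directory stream again if it was opened
    if (is_directory) rc = c_closedir(dirp)
  end function is_directory

end module is_dir_sketch
```

Usage is a plain call such as `print *, is_directory(".")`.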

@certik
Member

certik commented May 30, 2020

This is related to #14. I think we agreed that file system operations are in the scope. The naming convention should probably be is_directory or is_dir, and we should also look at how Python, Julia and Matlab name such functions. We are always trying to be consistent with other languages where it makes sense.

The main goal of stdlib is to figure out and document (in a spec) the API. The underlying implementation is secondary --- stdlib will provide a reference implementation, and compiler vendors are then free to provide their own, different or more optimized implementation if they want. The only requirement is that it must run on all platforms, and calling into C where necessary would probably take care of that.

@milancurcic
Member

#22 for POSIX systems is also related.

@MarDiehl
Contributor Author

MarDiehl commented Jun 3, 2020

I have thought a little about a possible contribution from my side, and I propose adding general path-related functionality. Since the other language I use frequently is Python, I took my inspiration from

  1. os.path
  2. pathlib

In Python, the object-oriented approach of pathlib is often favored over os.path. However, unless I overlooked something, I believe that the os.path approach is better suited for Fortran because Fortran cannot chain method calls:

  • Python: p = path.absolute().is_file()
  • Fortran: p = path_is_file(path_absolute(Str))

With Fortran variable length strings (allocatable), most operations are easily performed. However, there is one exception:
os.path.commonpath(paths) takes a list of strings; in Fortran this would be an array of strings/characters, and all of them would need to have the same length. Alternatively, one can use an interface for a series of fpp-generated functions with signatures such as path_commonpath(path1,path2), path_commonpath(path1,path2,path3), ...
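To make the fixed-length-array option concrete, here is a sketch of a character-wise common prefix (the analogue of os.path.commonprefix, not the component-wise commonpath) that accepts an array of equal-length strings and treats trailing blanks as padding. The module name is hypothetical. One consequence of this interface is that a path that genuinely ends in blanks cannot be represented.

```fortran
! Hypothetical module name; illustrates the "array of equal-length strings" interface.
module common_prefix_sketch
  implicit none
  private
  public :: common_prefix
contains

  ! Character-wise longest common prefix, in the spirit of os.path.commonprefix.
  ! All elements of paths share one length (a Fortran requirement), so
  ! trailing blanks used as padding are ignored via len_trim.
  function common_prefix(paths) result(prefix)
    character(len=*), intent(in) :: paths(:)
    character(len=:), allocatable :: prefix
    integer :: i, n

    if (size(paths) == 0) then
      prefix = ""
      return
    end if

    n = len_trim(paths(1))
    do i = 2, size(paths)
      n = min(n, len_trim(paths(i)))
      ! Shrink n until the first n characters of both strings agree
      do while (n > 0)
        if (paths(i)(1:n) == paths(1)(1:n)) exit
        n = n - 1
      end do
    end do
    prefix = paths(1)(1:n)
  end function common_prefix

end module common_prefix_sketch
```

Note that `common_prefix([character(len=20) :: "/usr/lib", "/usr/local/lib"])` yields `/usr/l`, which is not a valid path component; this is exactly why commonpath has to compare whole components rather than characters.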

While most functionality will be implemented in pure Fortran, certain operations require file system functions from C.

There is one more thing I need to mention: if Windows support is needed, someone else has to provide it. I've never compiled a Fortran program on Windows, and path operations differ significantly there. pathlib essentially has two implementations, and I assume we are in the same situation. If someone volunteers, I would prefer that we work on both implementations in parallel. My time budget is about 4h/week, and I hope to finish the whole implementation in 3 months.

@epagone

epagone commented Jun 3, 2020

> With Fortran variable length strings (allocatable), most operations are easily performed. However, there is one exception:
> os.path.commonpath(paths) takes a list of strings, in Fortran this would be an array of strings/characters and all of them would need to have the same length. Alternatively, one can use an interface for a series of fpp generated functions with signatures of type path_commonpath(path1,path2), path_commonpath(path1,path2,path3), ...

Or it can be solved by an implementation of the iso_varying_string module or advanced libraries of string handling routines as summarised in this other issue.

@jvdp1
Member

jvdp1 commented Jun 15, 2020

> However, there is one exception:
> os.path.commonpath(paths) takes a list of strings, in Fortran this would be an array of strings/characters and all of them would need to have the same length. Alternatively, one can use an interface for a series of fpp generated functions with signatures of type path_commonpath(path1,path2), path_commonpath(path1,path2,path3), ...

fypp could be used for that too.
However, implementing and using iso_varying_string module would be the best option IMO.
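For comparison, a sketch of the derived-type route: `string_t` is a hypothetical minimal wrapper around an allocatable character component, standing in for iso_varying_string (it is not that module's API). An array of such wrappers is ragged, so each element keeps its own length and no padding is needed. The helper `join_with` is likewise illustrative only.

```fortran
! Hypothetical module; a stand-in for a varying-length string type.
module string_list_sketch
  implicit none
  private
  public :: string_t, join_with

  ! Each element of a type(string_t) array carries its own length.
  type :: string_t
    character(len=:), allocatable :: s
  end type string_t

contains

  ! Concatenate all elements, separated by sep (illustrative helper).
  function join_with(items, sep) result(res)
    type(string_t), intent(in) :: items(:)
    character(len=*), intent(in) :: sep
    character(len=:), allocatable :: res
    integer :: i

    res = ""
    do i = 1, size(items)
      if (i > 1) res = res // sep
      res = res // items(i)%s
    end do
  end function join_with

end module string_list_sketch
```

With `items(1)%s = "usr"`, `items(2)%s = "local"` and `items(3)%s = "lib"`, the elements have lengths 3, 5 and 3, and `join_with(items, "/")` returns `usr/local/lib` without any trailing blanks.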

@urbanjost

FYI:

I have some related modules (M_path, M_io, and M_system) in the GPF
(General Purpose Fortran) site on github if you are looking for ideas
on how the API might look from a Fortran perspective.

https://github.com/urbanjost

Instead of the entire GPF, self-contained subsets of two of the modules
are available:

https://github.com/urbanjost/M_io
https://github.com/urbanjost/M_system

The M_path module description (actually all the GPF routines) can be
found in the manpage index:

GPF manpages

I do not have a stand-alone M_path.f90. It is currently only in the GPF
collection, as it uses a number of string routines from M_strings.f90.

@MarDiehl
Contributor Author

@urbanjost Nice code, thanks for the hint.

@urbanjost

Thanks. No problem. Some of it is better than the rest. I seeded it with a lot of code I had around, hoping to get a development community started to expand and clean it up, but it did not catch on as I had hoped. stdlib(3f) seems to have far more momentum behind it. If you find anything useful in it, feel free to use it for stdlib(3f).

@ivan-pi
Copy link
Member

ivan-pi commented Jul 16, 2020

The Oracle Fortran Library seems to have covered some file system operations: https://docs.oracle.com/cd/E19957-01/805-4942/index.html

Based upon the function names, it looks like it is actually implemented in C. Perhaps it can serve as reference.

Edit: The Absoft Compiler also has similar compatibility libraries - https://www.absoft.com/wp-content/uploads/2015/08/Support-Libraries.pdf

Edit2: Compaq Fortran also had its own library of C-like functions - http://h30266.www3.hpe.com/odl/unix/progtool/cf95au56/dfumroutines.htm#overview_lib_rout

@ivan-pi
Copy link
Member

ivan-pi commented Jul 16, 2020

Perhaps the Fortyxima project (https://bitbucket.org/aradi/fortyxima/src/develop/fortyxima/filesys/) from @aradi could serve as a starting point?

@aradi
Copy link
Member

aradi commented Jul 16, 2020

If there is interest, I am happy to clean it up a bit, so that it meets the stdlib coding standards. 😉

@certik
Copy link
Member

certik commented Jul 16, 2020

@urbanjost don't feel bad, a lot of us tried to do the same. Thanks for the pointer, I added it to #1, thanks for sharing the link. As you can see there, we list 10 such libraries that people did (mine is there too), and we all tried to get a community started around it. It's extremely hard. But I think we finally succeeded this time with stdlib and with fortran-lang.org. I should write a blog post about this --- Fortran is far from being saved, but just the fact that we managed to get the community together is the first necessary step, and it looked impossible to me just a year ago. And yet I think we succeeded at this first step at this point.

@ivan-pi thanks for the pointers. @aradi I think there will be interest; let's discuss the API.

@aradi
Copy link
Member

aradi commented Jul 16, 2020

Sure, I opened a separate issue for this (#220)

@MarDiehl
Copy link
Contributor Author

I have a first prototype of a library with functions from Python's os and os.path: https://github.com/MarDiehl/stdlib_os
It is pure Fortran where possible, but most file system related operations rely on C routines.
It works on Linux with the GNU and Intel compilers.
Exceptions in Python translate into error stop. Currently this happens without a message, but that can be changed.
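One way to attach a message while still letting callers recover is the optional `stat` argument idiom. A minimal sketch, assuming a getsize-like function named `get_size` (the name comes from the naming proposal in this thread; the module and implementation are hypothetical and use only the standard INQUIRE statement): if `stat` is present it is set to a nonzero value on failure, otherwise a message is written to `error_unit` before `error stop`.

```fortran
! Hypothetical module; sketch of "optional stat, else error stop with a message".
module get_size_sketch
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
  private
  public :: get_size
contains

  ! Size of a file in file storage units (bytes with common compilers),
  ! queried with the standard INQUIRE statement.
  ! On failure: if stat is present it is set to 1 and -1 is returned;
  ! otherwise a message is printed and the program stops.
  function get_size(path, stat) result(sz)
    character(len=*), intent(in) :: path
    integer, intent(out), optional :: stat
    integer :: sz
    logical :: exists

    inquire(file=path, exist=exists, size=sz)
    if (.not. exists .or. sz < 0) then
      sz = -1
      if (present(stat)) then
        stat = 1
        return
      end if
      write(error_unit, '(3a)') "get_size: cannot determine size of '", path, "'"
      error stop
    end if
    if (present(stat)) stat = 0
  end function get_size

end module get_size_sketch
```

Callers that omit `stat` keep the current "stop on error" behavior, just with a message; callers that pass it can handle the failure themselves.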

@jvdp1
Copy link
Member

jvdp1 commented Jul 29, 2020

> I have a first prototype of a library that has functions from pythons os and os.path: https://github.com/MarDiehl/stdlib_os
> Pure Fortran where possible, but most file system related operations rely on C routines.
> Works on linux with GNU and Intel compilers.
> Exceptions in python translate into error stop. Currently without a message, but that can be changed.

@MarDiehl I looked at the code. There is already quite a bunch of procedures for Linux. Nicely done. Would it be an idea to submit a PR to discuss the API further?

@certik
Copy link
Member

certik commented Jul 29, 2020

Thanks @MarDiehl for this!

@milancurcic
Copy link
Member

Thank you, I reviewed the repo and recommend moving forward with the PR. We just need to decide how to not build it on Windows.

@certik
Copy link
Member

certik commented Jul 29, 2020

The goal is to have the file system operations working on Windows also eventually, correct?

@jvdp1
Copy link
Member

jvdp1 commented Jul 29, 2020

> Thank you, I reviewed the repo and recommend moving forward with the PR. We just need to decide how to not build it on Windows.

Or we should try to find a volunteer fluent in Windows ;)
If the API is already discussed and decided, this should facilitate the development, right?

@ivan-pi
Copy link
Member

ivan-pi commented Jul 29, 2020

Thanks @MarDiehl for the prototype. Is it possible to identify the common functionality with respect to the API proposal by @aradi in #220? @arjenmarkus left a comment there on how to deal with Windows.

@milancurcic
Copy link
Member

@certik Sorry I had assumed that Martin's implementation depended on POSIX, but on second look I don't see it in the code. If this relies only on C stdlib, I think it should work fine on Windows. But perhaps some OS-specific extensions are used. @MarDiehl did you try it on Windows?

@milancurcic
Copy link
Member

Good point, @ivan-pi. I suggest @MarDiehl and @aradi join forces and present a coherent API. I like Martin's as is, it's clear and familiar to me from Python's API.

@certik
Copy link
Member

certik commented Jul 29, 2020

If it relies on Linux specific things, then we can extend it using ifdefs to also use Windows API to work on Windows.

@MarDiehl
Copy link
Contributor Author

Please find the answers to the questions below:

> The goal is to have the file system operations working on Windows also eventually, correct?

Yes, it would be nice if someone contributed this. The actual implementation should not be too difficult, but it needs to be done by a native Windows user.

> Thank you, I reviewed the repo and recommend moving forward with the PR. We just need to decide how to not build it on Windows.

In Python, the actual implementation of os.path is called posixpath on POSIX systems and ntpath on Windows, and it is then mapped to os.path. I would recommend doing the same thing here, but the implementation depends on whether a standard C preprocessor is a prerequisite for compiling stdlib or whether fypp should be used exclusively. Also, the integration into fpm is still a very open topic to me.

> Sorry I had assumed that Martin's implementation depended on POSIX, but on second look I don't see it in the code. If this relies only on C stdlib, I think it should work fine on Windows. But perhaps some OS-specific extensions are used. @MarDiehl did you try it on Windows?

I have not tried it on Windows, but the implementation is certainly POSIX specific (/ for path separation, ~ for the home directory). Some C headers (e.g. unistd.h) are also not available on Windows.
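The home-directory part can be handled in pure Fortran on POSIX systems. A sketch of `expand_user` (the name proposed in the naming table of this thread; the module is hypothetical) that only replaces a leading `~` or `~/` with `$HOME`, read through the standard `get_environment_variable`. The `~user` form and Windows conventions (for example a different environment variable and separator) are deliberately not handled, which illustrates how POSIX-specific even this small routine is.

```fortran
! Hypothetical module; POSIX-only sketch of os.path.expanduser.
module expand_user_sketch
  implicit none
  private
  public :: expand_user
contains

  ! A leading "~" that is alone or followed by "/" is replaced by $HOME.
  ! Anything else, including the "~user" form, is returned unchanged,
  ! as is the path when HOME is not set.
  function expand_user(path) result(res)
    character(len=*), intent(in) :: path
    character(len=:), allocatable :: res
    character(len=:), allocatable :: home
    integer :: n, rc

    res = path
    if (len(path) == 0) return
    if (path(1:1) /= "~") return
    ! Nested test: Fortran does not guarantee short-circuit evaluation
    if (len(path) > 1) then
      if (path(2:2) /= "/") return
    end if

    ! First query the length of HOME, then fetch its value
    call get_environment_variable("HOME", length=n, status=rc)
    if (rc /= 0) return
    allocate(character(len=n) :: home)
    call get_environment_variable("HOME", value=home)
    res = home // path(2:)
  end function expand_user

end module expand_user_sketch
```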

> If it relies on Linux specific things, then we can extend it using ifdefs to also use Windows API to work on Windows.

I think there is less common code between POSIX and Windows than it seems at first glance. Most of the C code is probably OS specific, and all the path routines subtly depend on details of path names. Therefore, I opt for two fully independent implementations.

@MarDiehl
Copy link
Contributor Author

Thanks for all the feedback. Before opening a PR, I will probably add a few more functions and certainly do some testing.
I also have three direct questions:

  1. Is the following naming convention ok:

    use stdlib_os
    use stdlib_os_path
    
    call chdir('/home') ! function from stdlib_os, no stdlib_os prefix
    print*, islink('/home') ! function from stdlib_os_path, no stdlib_os_path prefix
    
  2. I am using allocatable strings from the Fortran standard.
    As discussed in the last monthly call, I don't see a reason to use specific string implementations (e.g. iso_varying_string). But the decision whether stdlib should have a special string (or even path) type goes beyond the scope of implementing os and os_path. I hope my implementation shows that allocatable strings are all we need. The consequence of this decision is that we can't have object-oriented string libraries.
    The only inconvenience of the current implementation is the behavior of split/splitext/splitdrive: they return a size-2 array in which both strings have the length of the longer of head and tail. I would attribute this flaw to the lack of a tuple or list type, not to a limitation of allocatable strings.

  3. Could a MacOS user test the code?
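A sketch of the split behavior described in point 2, assuming POSIX separators: head and tail are returned in one rank-1 deferred-length array, so both elements get the length of the longer one and the shorter one is blank-padded. Trailing slashes are stripped from the head unless it consists only of slashes, mimicking os.path.split. This is an illustration with a hypothetical module name, not the code in stdlib_os.

```fortran
! Hypothetical module; POSIX-style sketch of os.path.split.
module split_sketch
  implicit none
  private
  public :: split
contains

  ! parts(1) is the head (everything up to the last "/"), parts(2) the tail.
  ! Both elements of the array share one length, so the shorter one is
  ! blank-padded: the inconvenience of returning a "tuple" as an array.
  function split(path) result(parts)
    character(len=*), intent(in) :: path
    character(len=:), allocatable :: parts(:)
    character(len=:), allocatable :: head, tail
    integer :: i, j

    i = index(path, "/", back=.true.)   ! 0 if there is no separator
    head = path(1:i)
    tail = path(i+1:)
    ! Strip trailing slashes from head unless it consists only of slashes
    j = verify(head, "/", back=.true.)
    if (j > 0) head = head(1:j)

    allocate(character(len=max(len(head), len(tail))) :: parts(2))
    parts(1) = head
    parts(2) = tail
  end function split

end module split_sketch
```

For example, `split("/usr/lib")` gives elements of length 4: `/usr` and `lib ` (padded), so callers have to `trim` the results.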

@arjenmarkus
Copy link
Member

arjenmarkus commented Jul 30, 2020 via email

@MarDiehl
Copy link
Contributor Author

> Hi everyone, I volunteer to contribute the Windows side of things. While I usually avoid Windows-specific stuff, I do use that platform all the time, as well as the ingressions of Linux (or Unix?) on that platform: Cygwin and MinGW. So I should be able to test that it all works on these platforms. I have no access to MacOS unfortunately, so that will have to be someone else. Regards, Arjen

Great! I think in such system-dependent operations, Windows-specific code is unavoidable, and I would not consider it a bad thing as long as it is compiler independent. The actual modifications to the code (changing the separator, using different C headers and functions) are probably not very difficult. Getting a system-dependent build configuration to work is probably the more time-consuming task.

We also need to agree on the handling of line endings in the repository. I assume currently it is UNIX style. Would it make sense to enforce UNIX style line endings via git configuration in general? And do you want an exception for windows-only code then?

I gave you access to the repository, please create a new branch for windows changes or simply fork the whole repository.

@MarDiehl
Copy link
Contributor Author

MarDiehl commented Jul 30, 2020

As pointed out by @aradi, the Python naming conventions are not very 'Fortranic'. I therefore suggest the following names:

os

Python        Fortran
chdir         change_directory
getcwd        get_current_working_directory
mkdir         make_directory
rename
rmdir         remove_directory
symlink       create_symlink?
unlink        remove_file

os.path

Python        Fortran
abspath       abs_path
basename      base_name
commonpath    common_path
commonprefix  common_prefix
dirname       dir_name
exists
expanduser    expand_user
expandvars    expand_vars
getatime      get_atime
getctime      get_ctime 
getmtime      get_mtime      
getsize       get_size
isabs         is_abs
isdir         is_dir
isfile        is_file
islink        is_link
ismount       is_mount
join 
normcase      norm_case
normpath      norm_path
samefile      same_file
relpath       rel_path/relative_path?
split
splitdrive    split_drive
splitext      split_ext

@milancurcic
Copy link
Member

@MarDiehl Your suggestions all look good to me. Suggestion for consistency:

  • create_symlink -> create_symbolic_link
  • abs_path -> absolute_path
  • dir_name -> directory_name
  • expand_vars -> expand_variables
  • is_dir -> is_directory

@certik
Copy link
Member

certik commented Jul 30, 2020

I think two-syllable words should just be joined; I think that is actually very Fortranic. So abspath over abs_path and basename over base_name. Not everybody agrees with this recommendation, but I know a lot of Fortran programmers do, so that is what we recommend here:

https://www.fortran90.org/src/best-practices.html#naming-convention

The underscores should be used if you want to join several syllables or words such as in get_command_argument. But if you can make it just two syllables like getarg, then that works too and I think it looks much better without underscores.

So both of these work: dirname, directory_name. But dir_name I think is suboptimal.

@arjenmarkus
Copy link
Member

arjenmarkus commented Jul 31, 2020 via email

@arjenmarkus
Copy link
Member

arjenmarkus commented Jul 31, 2020 via email

@arjenmarkus
Copy link
Member

arjenmarkus commented Jul 31, 2020 via email

@MarDiehl
Copy link
Contributor Author

> Of course it was NOT in f_c_string. It was not even in the rename routine or its brethren. It was in ismount_c(), which is called slightly later in the program. The line parent = (char *) malloc(strlen(path)+3); should be: parent = (char *) malloc(strlen(path)+4); With that found, the crash makes sense: strlen() does not count the required NUL character, so the construction of the parent causes a slight memory overflow.

Good catch. My C knowledge is quite bad; it has been almost 15 years since I learned it at university.
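On the Fortran side of such bindings, the same NUL byte has to be accounted for. A sketch of a helper like the f_c_string mentioned in the quote (its actual implementation in stdlib_os is not shown in this thread, so the module and body are assumptions): the result has `len(str)+1` elements because the terminating NUL needs its own slot, the very byte that `strlen()` does not count.

```fortran
! Hypothetical module; sketch of a Fortran-to-C string conversion helper.
module f_c_string_sketch
  use, intrinsic :: iso_c_binding, only: c_char, c_null_char
  implicit none
  private
  public :: f_c_string
contains

  ! Convert a Fortran string to a NUL-terminated C character array.
  ! The result has len(str)+1 elements: the extra one holds the
  ! terminating NUL, which C's strlen() does not count.
  pure function f_c_string(str) result(cstr)
    character(len=*), intent(in) :: str
    character(kind=c_char) :: cstr(len(str)+1)
    integer :: i

    do i = 1, len(str)
      cstr(i) = str(i:i)
    end do
    cstr(len(str)+1) = c_null_char
  end function f_c_string

end module f_c_string_sketch
```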

@arjenmarkus
Copy link
Member

arjenmarkus commented Jul 31, 2020 via email

@aradi
Copy link
Member

aradi commented Aug 1, 2020

@MarDiehl Looks good to me. I'd propose to use dir instead of directory though (consistently everywhere) as "directory" is a very long word...

@certik I really like your idea, as it almost 100% matches the rules on using hyphens in compound nouns in my native language (Hungarian) 😄. However, I think it would still be confusing for newcomers. Also, it does not match current Fortran naming practice (e.g. type(c_ptr), move_alloc, compiler_version, character_kinds, num_images, etc.).

@awvwgk awvwgk added topic: interface Interfacing with other libraries or languages topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... labels Sep 18, 2021
@Beliavsky
Copy link

I think an impure elemental function to check if one or more files exist, "on the fly", is more convenient than using the inquire statement directly, so I suggest adding a file_exists function:

module m
implicit none
contains
impure elemental function file_exists(xfile) result(exists)
character (len=*), intent(in) :: xfile
logical                       :: exists
inquire(file=xfile,exist=exists)
end function file_exists
end module m

program main
! driver for file_exists
use m, only: file_exists
implicit none
print*,file_exists(["1","2","3"] // ".txt")
end program main

@Romendakil
Copy link

> I think an impure elemental function to check if one or more files exist, "on the fly", is more convenient than using the inquire statement directly, so I suggest adding a file_exists function:

But this approach only works if all character strings have the same length; otherwise it won't compile. Isn't that use case too limited?
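One possible answer, sketched under the assumption that blank padding is acceptable: an array constructor with a type-spec pads names of different lengths to a common length, and trimming inside the function strips that padding again. This is a variation on the proposed file_exists with a hypothetical module name, not a settled stdlib API; file names that really end in blanks cannot be expressed this way.

```fortran
! Hypothetical module; variation of the proposed file_exists that tolerates padding.
module file_exists_sketch
  implicit none
  private
  public :: file_exists
contains

  ! Trailing blanks are trimmed, so names padded to a common length
  ! (e.g. via [character(len=32) :: ...]) can be passed as one array.
  impure elemental function file_exists(xfile) result(exists)
    character(len=*), intent(in) :: xfile
    logical :: exists

    inquire(file=trim(xfile), exist=exists)
  end function file_exists

end module file_exists_sketch
```

A call like `file_exists([character(len=32) :: "a.txt", "a_much_longer_name.txt"])` then compiles even though the two names have different lengths.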

@MarDiehl
Copy link
Contributor Author

I think exists is sufficient. It can be put in a loop if multiple files are checked.
