Proposal for high level I/O #14
Comments
@cmacmackin Can you explain why the number and type of arguments aren't known at compile time? How many can there possibly be? I think you'd need just clever formatting of numeric types to characters, but I may be missing the point. |
I think he means that with standard |
I see, that didn't even cross my mind. Wouldn't for most purposes one argument (character string for text, numeric or strings for binary) suffice? The only time I put multiple variables on If we had nice on-the-fly formatting like Python does, then this becomes trivial I think. |
@milancurcic can you elaborate a bit why the unit numbers are insufficient? The recommended approach with
To read from a file:
integer :: u
open(newunit=u, file="log.txt", status="old")
read(u, *) a, b
close(u)
Write to a file:
integer :: u
open(newunit=u, file="log.txt", status="replace")
write(u, *) a, b
close(u)
To append to an existing file:
integer :: u
open(newunit=u, file="log.txt", position="append", status="old")
write(u, *) N, V(N)
close(u)
The only annoying thing to me is that you have to put |
Yes, unit numbers work. They were tedious before
With a high-level interface, your examples would be something like this.
Read from a file:
type(file_type) :: f
f = file_type('log.txt', 'r')
a = f % read()
call f % close()
Write to a file:
type(file_type) :: f
f = file_type('log.txt', 'w')
call f % write(a)
call f % close()
Append to an existing file:
type(file_type) :: f
f = file_type('log.txt', 'r+')
call f % write(a)
call f % close()
The pros as I see them:
Cons:
|
I talked with @zjibben about this and he pointed out another advantage of your approach: automatic closing of files when the instance goes out of scope. Also I would suggest that
type(file_type) :: f
f = file_type('log.txt')
a = f % read()
Also perhaps the type could be called
One issue is that you need to be able to read different types and I don't know if you can override |
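The automatic-closing idea mentioned above can be sketched with a `final` procedure. This is only an illustration under assumptions: the module name `file_type_sketch`, the `unit` component, and the `open`/`close` bindings are hypothetical and not an agreed stdlib API. The file is opened through a subroutine rather than a constructor function, because a function result would itself be finalized after the assignment and could close the unit that was just copied.

```fortran
module file_type_sketch
  implicit none
  private
  public :: file_type

  ! Hypothetical wrapper type: holds a unit number, -1 means "not connected".
  type :: file_type
    integer :: unit = -1
  contains
    procedure :: open => file_open
    procedure :: close => file_close
    final :: file_finalize
  end type file_type

contains

  ! Connect an existing file for reading; iostat is nonzero on failure.
  subroutine file_open(self, filename, iostat)
    class(file_type), intent(inout) :: self
    character(*), intent(in) :: filename
    integer, intent(out) :: iostat
    call self%close()
    open(newunit=self%unit, file=filename, status='old', action='read', &
         iostat=iostat)
    if (iostat /= 0) self%unit = -1
  end subroutine file_open

  ! Close the unit if one is connected; safe to call repeatedly.
  subroutine file_close(self)
    class(file_type), intent(inout) :: self
    if (self%unit /= -1) close(self%unit)
    self%unit = -1
  end subroutine file_close

  ! Runs when a file_type variable goes out of scope.
  subroutine file_finalize(self)
    type(file_type), intent(inout) :: self
    call self%close()
  end subroutine file_finalize

end module file_type_sketch
```

A local `type(file_type) :: f` inside a procedure would then be closed on return without an explicit `call f%close()`. Variables declared in the main program are not finalized at program end, so an explicit close is still needed there.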
Another issue: how do you distinguish between:
read(u,*) a, b
and
read(u,*) a
read(u,*) b
The first reads two numbers from one line; the second reads one number from each of two lines. Or are the two equivalent? |
I like just
In the case of reading and writing formatted data, the trivial implementation doesn't operate on multiple variables. It reads or writes a string, for example:
character(len=:), allocatable :: a
type(File) :: f
f = File('log.txt')
call f % read(a) ! reads whole file into string a
call f % seek(0) ! rewind
call f % readline(a) ! reads a single line into string a
To write:
f = File('log.txt', 'w')
call f % write('pi = %.3f' .fmt. 3.141)
similar to Python. The last line would write 'pi = 3.141' to log.txt. This is an example of the on-the-fly formatting that I mentioned earlier, which I think is needed if you want to elegantly read and write multiple variables. C-style formatting is something I'd very much like in Fortran. I think it may help some newcomers to the language as well, because this kind of formatting is more common in other languages. But that's for another proposal. :) |
Yes, that's what I was trying to communicate, albeit not very well. |
This actually wouldn't be too difficult to implement in a standard library. We'd just write a series of wrappers in C, taking different numbers of |
I like the idea. As mentioned by @cmacmackin this would be easier to implement if Fortran had support for variadic functions (in fact I mentioned exactly this use case in my comment at j3-fortran/fortran_proposals#76 (comment)). The idea of wrapping a bunch of C subroutines in a generic block seems like a good idea. I am only worried about a combinatorial explosion. Say I want to read 3 integers, 4 reals, and a character array, etc., from a single line, wouldn't we then need to generate all possible combinations of all intrinsic types? Or would you somehow convert everything internally to a C pointer and use a format specifier? (Hopefully, my idea makes sense.)
type(File) :: f
integer :: a, b, c
real :: d, e, ff, g, h
character(len=1) :: char_arr(5)
call f%open("data.txt","r")
call f%readline(a, b, c, d, e, ff, g, h, char_arr, fmt=" ... some kind of format specifier ...")
Or we could just read lines as an allocatable character string and do the type conversion ourselves:
read(f%readline(),*) a, b, c, d, e, ff, g, h, char_arr ! <-- assuming this works?
I suppose this would work:
character(len=:), allocatable :: line
! ... open file ...
call f%readline(line)
read(line,*) a, b, c, d, e, ff, g, h, char_arr |
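The "read a line into an allocatable string, then parse it" approach can be sketched with non-advancing input. This is a minimal sketch, not the thread's agreed design: the module name `readline_sketch`, the free-standing `readline` subroutine (rather than a type-bound method), and the 256-character buffer size are all assumptions made for illustration.

```fortran
module readline_sketch
  implicit none
  private
  public :: readline

contains

  ! Read one full record from a formatted sequential unit into a
  ! deferred-length string. iostat is 0 on success, negative at
  ! end of file, positive on a read error.
  subroutine readline(unit, line, iostat)
    integer, intent(in) :: unit
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: iostat
    character(len=256) :: buffer   ! chunk size is arbitrary
    integer :: nread

    line = ''
    do
      nread = 0
      ! Non-advancing read: stays on the current record between chunks.
      read(unit, '(a)', advance='no', iostat=iostat, size=nread) buffer
      if (is_iostat_end(iostat) .or. iostat > 0) return
      line = line // buffer(:nread)
      if (is_iostat_eor(iostat)) then
        iostat = 0                 ! end of record means the line is complete
        return
      end if
    end do
  end subroutine readline

end module readline_sketch
```

After `call readline(u, line, ios)`, an internal read such as `read(line, *) a, b, x` does the type conversion, so no pre-allocated string length has to be guessed.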
I would suggest to keep high-level I/O for ASCII files with formatted data (e.g., table of reals; something similar to If the user wants something more complex (mix of integers/reals/characters), or binary/stream files, the user can still use newunit (or the readline op proposed by @ivan-pi ; but probably less efficient). |
So more like np.loadtxt(), but with the possibility to load the individual columns or rows directly into arrays (1d or 2d)? |
Yes, that was my idea. I observed that I/O is usually quite a difficult step for most beginners when they come from R/Python/MATLAB. Such high-level I/O could maybe help them. |
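A loadtxt-style reader for a rectangular table of reals could look roughly like the following. This is a hedged sketch under assumptions: the names `loadtxt_sketch`, `loadtxt`, and `count_fields` are hypothetical, columns are assumed to be separated by spaces or tabs, every row is assumed to have the same number of values, and lines are assumed to be at most 4096 characters long.

```fortran
module loadtxt_sketch
  implicit none
  private
  public :: loadtxt

contains

  ! Load a whitespace-separated table of reals into a 2D array.
  subroutine loadtxt(filename, table, iostat)
    character(*), intent(in) :: filename
    real, allocatable, intent(out) :: table(:,:)
    integer, intent(out) :: iostat
    character(len=4096) :: line   ! assumed maximum line length
    integer :: u, nrow, ncol, i

    open(newunit=u, file=filename, status='old', action='read', iostat=iostat)
    if (iostat /= 0) return

    ! First pass: count non-blank rows; columns come from the first row.
    nrow = 0
    ncol = 0
    do
      read(u, '(a)', iostat=iostat) line
      if (iostat /= 0) exit
      if (len_trim(line) == 0) cycle
      if (nrow == 0) ncol = count_fields(line)
      nrow = nrow + 1
    end do
    if (iostat > 0) then
      close(u)
      return
    end if

    ! Second pass: list-directed read of one row at a time.
    iostat = 0
    allocate(table(nrow, ncol))
    rewind(u)
    do i = 1, nrow
      read(u, *, iostat=iostat) table(i, :)
      if (iostat /= 0) exit
    end do
    close(u)
  end subroutine loadtxt

  ! Count runs of non-blank characters separated by spaces or tabs.
  pure integer function count_fields(line) result(n)
    character(*), intent(in) :: line
    integer :: i
    logical :: in_field
    n = 0
    in_field = .false.
    do i = 1, len_trim(line)
      if (line(i:i) == ' ' .or. line(i:i) == achar(9)) then
        in_field = .false.
      else if (.not. in_field) then
        in_field = .true.
        n = n + 1
      end if
    end do
  end function count_fields

end module loadtxt_sketch
```

The two-pass design trades a second read of the file for not having to grow the array while reading. Anything beyond a plain table (mixed types, binary data) would still go through `open(newunit=...)` as suggested above.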
I would also like to see a more POSIX-like interface to the filesystem. I would also like to see this as independent from the Fortran interface, with a focus on data streams rather than the list-directed I/O for which Fortran I/O has been tailored. I also think that there should be a generic I like the way Python does this, with a top level |
Great ideas for C-style formatting implementation. A basic implementation for @marshallward I'd like a maintained and tested POSIX interface as well. Do you mind opening a proposal for this? (Update: done in #22.) @cmacmackin We can have a dedicated proposal to just C-style formatting. I like your idea for implementation. Do you mind opening a proposal for this? (Update: done in #19.) |
@ivan-pi My suggestion of calls to C was specifically for a
void printf_wrapper1(const char *format, void *arg1) {
    printf(format, arg1);
}
void printf_wrapper2(const char *format, void *arg1, void *arg2) {
    printf(format, arg1, arg2);
}
interface printf
    subroutine printf_wrapper1(format_str, arg1) bind(c)
        character(len=1), dimension(*), intent(in) :: format_str
        type(*), intent(in) :: arg1
    end subroutine printf_wrapper1
    subroutine printf_wrapper2(format_str, arg1, arg2) bind(c)
        character(len=1), dimension(*), intent(in) :: format_str
        type(*), intent(in) :: arg1, arg2
    end subroutine printf_wrapper2
end interface printf
The main complication with this is how to convert between Fortran and C strings. It wouldn't be hard to provide wrapper routines which do this for the format string, but string arguments to |
Three things I've implemented that I find useful for IO particularly input data where I create my own keyword based input. One is a File class that basically wraps current Fortran IO (open, read, write etc) but hides the unit number. I have read and write wrappers that allow you to read/write formatted, list-directed or binary of the intrinsic data types from the same subroutine. Another useful class is a deferredLenString class that is a defined type that contains a deferred length string with some routines to convert back and forth to regular strings and do the equivalent of LEN, LEN_TRIM etc. Finally, I define a fileImage class that uses an array of the deferredLenString class to hold all the records in a formatted input file in memory as a "file image". I have filters that can change case, strip delimiters etc from the original file. When processing keyword based input files, using a file image instead of the disk file is a lot faster. Some or all of these might prove useful for implementing IO classes specifically for processing text files etc. |
I think we do not need a derived type wrapper for Fortran I/O. The current I/O covers many of the general needs. Some missing capabilities can be (and gradually are) improved from the language side, and introducing a wrapper would not help with anything. For a particular file format (such as PPM), anyone can write their own wrapper using derived types. This is literally what derived types are for. They should not be used for introducing another way of doing the same thing (which is universally bad). So I am strongly against this proposal. |
I generally agree with @gronki on this --- it's similar in #69. Rather than creating our own However, as a pragmatic approach, I think the best is to have both: a low level API that uses the language features. People like @gronki and I will use that primarily. And optionally a high level API, that lots of people can also use. I think we can have both. What we should not have is just the high level API without the low level one. |
Yes, perhaps it's better for this to start as a separate project, and after having something concrete to look at we can discuss whether people would like to have it in stdlib or not. |
@milancurcic let's separate the genuine new functionality, which would go into the low level API, from a convenience wrapper, which would go into the high level OO API. Here are some ideas off the top of my head that would go into the low level API:
The high level API would then wrap these in a convenience derived type wrapper I see a repeated pattern across many of the issues in stdlib ---- the high level "simple" OO interface is motivated by the fact that the low level built-in Fortran language features are insufficient. In order to fix the low level API, which a lot of people would otherwise use, let's use the OO interface as a vehicle to figure out what features we want and need (not being constrained by the limitations of the low level API: we can design our derived type and methods in any way we like). Then extract the genuine new functionality, and put it into the low level API, and the high level API is just a thin wrapper. Then let's create proposals to get some of the stdlib's improvements of the low level API into the Fortran standard itself -- backed by wide support here and our future users using it. And let people choose between the low level API and the high level API. It looks like we will have users for both. |
I've always found it really strange that Fortran's read and write actually muddled the jobs of parsing and formatting together with the I/O. It leads to input files/specifications that are predicated on that functionality, which in my opinion is not a good thing. Your users shouldn't have to care about the peculiarities of the language you're using. I once was porting some Fortran code to Python and had to write something to simulate Fortran's I/O in order to make sure old input files would still work properly. I'm pretty fond of Python's file API. I think we should mimic that as a library and see if it works well for everybody. We need a string type to implement it properly though since |
@certik I am not sure if I understood the last paragraph correctly. Do you mean that the wrapper should be developed as the prototype/testbench for the features that will be handled intrinsically in the future? @everythingfunctional I disagree that there is any reason to mimic Python's io library in Fortran I/O. Is there anything that can be done in Python that Fortran does not allow with its current functionality? I would even argue that Fortran has superior functionality in some aspects and that the lesser brevity in other situations comes from gaps in string manipulation capabilities (for example, having to use write/read to parse strings to numbers). Again, sorry for being a bitter potato. I am all for changes and criticizing Fortran where it sucks, but I also want to make sure that we do not re-invent what's already good just because other languages handle it differently. Please do not take my opinions personally. :) |
@gronki yes, that's what I meant.
Here is one example: #71. Look how much simpler the new |
Yes, ease of use is the key motivation for me here, and IMO is one of highest priorities for stdlib. I also learn standard Fortran I/O every time I use it. Teaching it is especially tedious. |
I understand your point. But my counterpoint is: then why not just use Python. :) What I want to say is that "just because it looks like Python" is not an argument, because Fortran is not Python. In this particular issue, the "Python-like" (or actually, C-like, because that's its origin) syntax takes the same amount of information (unit number, filename, access mode), just arranges it differently. It does not change the quality of life in any way. Nor would a derived type wrapper of the current I/O. I am all for quality improvements that actually do make I/O more functional. But I disagree that any of the ones discussed so far do that. That's only my opinion though, and please don't be discouraged. As someone mentioned before, since this is the stdlib project (and not a standard proposal project), it's probably best to let people make their packages and see if the solution produced ends up actually being better. :) |
It's about making Fortran easier to use. This:
s = open(filename)
s = open(filename, "w")
is simpler and more consistent than this:
open(newunit=s, file=filename, status="old", action="read")
open(newunit=s, file=filename, status="replace", action="write")
That's all there is to it. Just like Fortran has arrays, which do have similar information as the various C++ array libraries, but they are much easier to use. That's the whole point. So the goal of |
No, Python doesn't have nearly the number of options as Fortran, but that's the point. Fortran has too many options, and that makes it difficult to use. It would be nice if there was a simpler, more user friendly way to deal with files so you wouldn't have to look it up every time. |
After some discussions and implementations (see #71, #77, #86, #91), the function
integer function open(filename, mode, iostat) result(u)
    character(*), intent(in) :: filename
    character(*), intent(in), optional :: mode
    integer, intent(out), optional :: iostat
    ...
end function
This function returns a
Currently, both text and binary files are opened with So, should this API support the other accesses ( Supporting A draft PR (#91) has been opened to propose an API to support at least
integer function open(filename, mode, iostat, access) result(u)
    character(*), intent(in) :: filename
    character(*), intent(in), optional :: mode
    character(*), intent(in), optional :: access
    integer, intent(out), optional :: iostat
end function |
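To make the mapping from a Python-style mode string to Fortran `open` specifiers concrete, here is a minimal sketch. It is not the implementation from the PRs referenced above: the names `open_sketch` and `open_file` are hypothetical, only the text modes `'r'`, `'w'`, and `'a'` are handled, and returning `-1` on failure is an assumption made for this illustration.

```fortran
module open_sketch
  implicit none
  private
  public :: open_file

contains

  ! Hypothetical helper: returns a unit number, or -1 if the open failed.
  ! Without iostat present, a failure stops the program.
  integer function open_file(filename, mode, iostat) result(u)
    character(*), intent(in) :: filename
    character(*), intent(in), optional :: mode
    integer, intent(out), optional :: iostat
    character(len=:), allocatable :: mode_, status_, action_, position_
    integer :: ios

    u = -1
    mode_ = 'r'                               ! default mode, as in Python
    if (present(mode)) mode_ = trim(adjustl(mode))
    position_ = 'asis'

    select case (mode_)
    case ('r')
      status_ = 'old'
      action_ = 'read'
    case ('w')
      status_ = 'replace'
      action_ = 'write'
    case ('a')
      status_ = 'unknown'                     ! create the file if missing
      action_ = 'write'
      position_ = 'append'
    case default
      status_ = ''                            ! unsupported mode
      ios = 1
    end select

    if (len(status_) > 0) then
      open(newunit=u, file=filename, status=status_, action=action_, &
           position=position_, iostat=ios)
      if (ios /= 0) u = -1
    end if

    if (present(iostat)) then
      iostat = ios
    else if (ios /= 0) then
      error stop 'open_file: could not open file'
    end if
  end function open_file

end module open_sketch
```

With this, `u = open_file('log.txt', 'a')` stands in for the longer `open(newunit=u, file='log.txt', position='append', ...)` form, and the returned unit is used with ordinary `read`, `write`, and `close` statements. An `access` argument such as the one proposed above would add one more specifier to the same `open` call.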
To start: what is the advantage of unformatted sequential, compared to unformatted stream? There is a disadvantage that the format is compiler version specific. But is there any advantage? Could it perhaps be faster, or is that not the case? |
In this comment, @jacobwilliams mentioned:
Would there be interest in implementing a subroutine/function in
io = delete('file.txt')
or
call delete('file.txt',iostat=io)
with iostat being optional? Such a function/subroutine would avoid the 'traditional':
open(newunit=u, file = 'file.txt', status = 'old')
close(u, status = 'delete') |
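The subroutine form of this idea is a thin wrapper around the open/close pair shown above. A minimal sketch follows, with the hypothetical names `delete_sketch` and `delete_file` (the final name was still under discussion in the thread):

```fortran
module delete_sketch
  implicit none
  private
  public :: delete_file

contains

  ! Delete an existing file. iostat is 0 on success and nonzero if the
  ! file could not be opened or deleted. Without iostat present, a
  ! failure stops the program.
  subroutine delete_file(filename, iostat)
    character(*), intent(in) :: filename
    integer, intent(out), optional :: iostat
    integer :: u, ios

    open(newunit=u, file=filename, status='old', iostat=ios)
    if (ios == 0) close(u, status='delete', iostat=ios)

    if (present(iostat)) then
      iostat = ios
    else if (ios /= 0) then
      error stop 'delete_file: could not delete file'
    end if
  end subroutine delete_file

end module delete_sketch
```

Usage would be `call delete_file('file.txt', iostat=io)`. Because the sketch opens with `status='old'`, deleting a file that does not exist reports an error instead of silently succeeding.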
I think the answer is yes, only we should make the naming consistent. Python has remove, and we should survey other languages. |
I am fine with calling it |
Me too.
It would make sense to me, since If we want to implement more stuff around file systems #100, we should probably have a new module dedicated to file system stuff (e.g., rm, rmdir, ls, mkdir,...) |
Branched from #1 (comment)
One personal challenge I have with stock Fortran is its somewhat awkward and low-level I/O facilities -- open, read, write, inquire, rewind, and close. I often wished for a higher-level interface, like what you get with Python's open() -- you open a file with a function, get a file-like instance with methods that let you do stuff with it.
This would do away with unit numbers, which I don't think application developers should have to deal with. It could also be a solution to the problem that allocatable character strings must be pre-allocated before use in a read statement.
Is there anything similar out there for Fortran? Would this be of interest to people here? I'd use it.
@jvdp1 wrote: I would use it too.
@cmacmackin wrote: I'd personally like something along these lines. However, the problem is in defining methods on the file-object; these would need to know the number and type of arguments at compile-time. It would be impractical to produce methods with every conceivable permutation of object types. It would also require variadic functions, which are not available. As such, this can not be implemented well in Fortran, although perhaps something would be possible if we were to wrap some C-routines and pass in deferred-type objects.