Add getline to read whole line from formatted unit #597

Merged: awvwgk merged 5 commits into fortran-lang:master from the getline branch on Dec 19, 2021

Member

@awvwgk awvwgk commented Dec 12, 2021

Alternative implementation by John @urbanjost in M_io
!!##NAME
!!    getline(3f) - [M_io] read a line from specified LUN into allocatable string up to line length limit
!!    (LICENSE:PD)
!!
!!##SYNTAX
!!   function getline(line,lun) result(ier)
!!
!!    character(len=:),allocatable,intent(out) :: line
!!    integer,intent(in),optional              :: lun
!!    integer,intent(out)                      :: ier
!!
!!##DESCRIPTION
!!    Read a line of any length up to programming environment maximum
!!    line length. Requires Fortran 2003+.
!!
!!    It is primarily expected to be used when reading input which will
!!    then be parsed.
!!
!!    The input file must have a PAD attribute of YES for the function
!!    to work properly, which is typically true.
!!
!!    The simple use of a loop that repeatedly re-allocates a character
!!    variable in addition to reading the input file one buffer at a
!!    time could (depending on the programming environment used) be
!!    inefficient, as it could reallocate and allocate memory used for
!!    the output string with each buffer read.
!!
!!##OPTIONS
!!    LINE   line read
!!    LUN    optional LUN (Fortran logical I/O unit) number. Defaults
!!           to stdin.
!!##RETURNS
!!    IER    zero unless an error occurred. If not zero, LINE returns the
!!           I/O error message.
!!
!!##EXAMPLE
!!
!!   Sample program:
!!
!!    program demo_getline
!!    use,intrinsic :: iso_fortran_env, only : stdin=>input_unit
!!    use M_io, only : getline
!!    implicit none
!!    character(len=:),allocatable :: line
!!       open(unit=stdin,pad='yes')
!!       INFINITE: do while (getline(line)==0)
!!          write(*,'(a)')'['//line//']'
!!       enddo INFINITE
!!    end program demo_getline
!!
!!##AUTHOR
!!    John S. Urban
!!
!!##LICENSE
!!    Public Domain
function getline(line,lun) result(ier)
implicit none

! ident_11="@(#)M_io::getline(3f): read a line from specified LUN into allocatable string up to line length limit"

character(len=:),allocatable,intent(out) :: line
integer,intent(in),optional              :: lun
integer                                  :: ier
character(len=4096)                      :: message

integer,parameter                        :: buflen=1024
character(len=:),allocatable             :: line_local
character(len=buflen)                    :: buffer
integer                                  :: isize
integer                                  :: lun_local

   line_local=''
   ier=0
   if(present(lun))then
      lun_local=lun
   else
      lun_local=stdin
   endif
   open(lun_local,pad='yes')

   INFINITE: do                                                      ! read characters from line and append to result
      read(lun_local,iostat=ier,fmt='(a)',advance='no',size=isize,iomsg=message) buffer ! read next buffer (might use stream I/O for files
                                                                     ! other than stdin so system line limit is not limiting
      if(isize.gt.0)line_local=line_local//buffer(:isize)            ! append what was read to result
      if(is_iostat_eor(ier))then                                     ! if hit EOR reading is complete unless backslash ends the line
         ier=0                                                       ! hitting end of record is not an error for this routine
         exit INFINITE                                               ! end of reading line
      elseif(ier.ne.0)then                                           ! end of file or error
         line=trim(message)
         exit INFINITE
      endif
   enddo INFINITE
   line=line_local                                                   ! trim line
end function getline
Closes #595

@awvwgk added the "reviewers needed" and "topic: IO" labels Dec 12, 2021
Member

@milancurcic milancurcic left a comment

Looks great, thanks! I left a few comments and questions.

@milancurcic milancurcic requested a review from ivan-pi December 17, 2021 17:58
@milancurcic
Member

@urbanjost Does this look good to merge to you?

@urbanjost

urbanjost commented Dec 17, 2021

If IOSTAT is optional on the call and the user leaves it off, how do you tell you hit end-of-file?

The only other thing I see is a matter of taste. If LINE comes before UNIT then UNIT could be optional also and default to stdin. I think that is preferable to encouraging users to put in a 5; although if they put in INPUT_UNIT it is not bad, but requires a USE statement and using a long variable name.

There really should be a note somewhere that the input file requires PAD mode to be on or it will confuse someone sooner or later.

Could have sworn something similar was in stdlib already but apparently not; I find a procedure like this very handy, especially for prototyping.

I have a version that ignored trailing ctrl_J and ctrl_M characters because of a problem with MSWindows and Linux, especially when reading files from the other platform but it looks like that has settled down, as I tried it with three compilers and did not see anything wrong at least with formatted sequential files. I think it would still be a problem with stream I/O but this would seem to (almost?) always be used with formatted sequential so I do not consider that a problem.

@awvwgk
Member Author

awvwgk commented Dec 18, 2021

> If IOSTAT is optional on the call and the user leaves it off, how do you tell you hit end-of-file?

This should work in analogy with a usual read going past an end of file without having iostat specified:

At line 46 of file /home/awvwgk/projects/src/git/stdlib/src/tests/io/test_getline.f90
Fortran runtime error: End of file
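
The analogy with a plain read can be sketched as follows. This is a minimal illustration with an ordinary formatted `read`, not the stdlib routine; the scratch file and the fixed-length `line` are assumptions made so the example is self-contained. With `iostat` present, end of file is reported through the status variable; dropping `iostat` from the `read` would instead end the program with a runtime error like the one above.

```fortran
program eof_with_iostat
  implicit none
  integer :: unit, stat, nlines
  character(len=64) :: line   ! assumption: fixed length is enough for this demo

  ! Scratch file with two records, so the example needs no input file.
  open(newunit=unit, status="scratch", action="readwrite")
  write(unit, "(a)") "first line"
  write(unit, "(a)") "second line"
  rewind(unit)

  nlines = 0
  do
    ! With iostat present, end of file is reported in stat instead of
    ! terminating the program.
    read(unit, "(a)", iostat=stat) line
    if (is_iostat_end(stat)) exit
    if (stat /= 0) error stop "read failed"
    nlines = nlines + 1
    print "(a)", trim(line)
  end do
  close(unit)

  if (nlines /= 2) error stop "expected two lines"
end program eof_with_iostat
```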

> The only other thing I see is a matter of taste. If LINE comes before UNIT then UNIT could be optional also and default to stdin. I think that is preferable to encouraging users to put in a 5; although if they put in INPUT_UNIT it is not bad, but requires a USE statement and using a long variable name.

I added another version of getline which doesn't require passing the unit as the first argument, in analogy to read(unit, *)/write(unit, *) and read */print *.
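
One way to offer both call forms is a generic interface whose specific procedures differ in the type of their first argument. The sketch below is an assumption about how this could look, not the code from the pull request: the names `getline_sketch`, `read_line`, `read_line_unit` and `read_line_stdin` are hypothetical, and the fixed 256-character buffer is a simplification that truncates longer lines and drops trailing blanks.

```fortran
module getline_sketch
  use, intrinsic :: iso_fortran_env, only : input_unit
  implicit none
  private
  public :: read_line

  ! Hypothetical generic name; the two specifics differ in the type of
  ! their first argument, so the compiler can tell the calls apart.
  interface read_line
    module procedure read_line_unit
    module procedure read_line_stdin
  end interface read_line

contains

  subroutine read_line_unit(unit, line, iostat)
    integer, intent(in) :: unit
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: iostat
    character(len=256) :: buffer   ! simplification: longer lines are truncated

    read(unit, "(a)", iostat=iostat) buffer
    if (iostat == 0) then
      line = trim(buffer)
    else
      line = ""
    end if
  end subroutine read_line_unit

  subroutine read_line_stdin(line, iostat)
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: iostat

    ! Delegate to the unit-first form with the standard input unit.
    call read_line_unit(input_unit, line, iostat)
  end subroutine read_line_stdin

end module getline_sketch

program demo
  use getline_sketch, only : read_line
  implicit none
  integer :: unit, stat
  character(len=:), allocatable :: line

  open(newunit=unit, status="scratch", action="readwrite")
  write(unit, "(a)") "hello"
  rewind(unit)

  call read_line(unit, line, stat)   ! unit-first form, like read(unit, *)
  if (stat /= 0) error stop "read failed"
  if (line /= "hello") error stop "unexpected line"
  close(unit)
  ! call read_line(line, stat) would read from standard input, like read *
end program demo
```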

> There really should be a note somewhere that the input file requires PAD mode to be on or it will confuse someone sooner or later.

I'm checking the pad case now. Note that open(unit=unit, pad="yes") for an unopened unit will open it, therefore I'm now checking first whether the unit is opened using inquire(unit=unit, opened=opened). However, -1 must not be passed to inquire, so another thing to catch.
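
The check described here might look roughly like the following sketch. It is an illustration under stated assumptions, not the merged implementation: the scratch unit opened with `pad="no"` stands in for a user-supplied unit, and the variable names are hypothetical.

```fortran
program ensure_pad
  implicit none
  integer :: unit
  logical :: opened
  character(len=9) :: pad_mode   ! long enough for "UNDEFINED"

  ! Scratch unit opened without padding, standing in for a user-supplied unit.
  open(newunit=unit, status="scratch", form="formatted", pad="no")

  ! -1 is excluded before the inquiry, as noted above.
  if (unit /= -1) then
    ! Inquire first: open(unit=..., pad="yes") would connect an unopened unit.
    inquire(unit=unit, opened=opened)
    if (opened) then
      inquire(unit=unit, pad=pad_mode)
      ! PAD is a changeable mode, so it can be switched on a connected unit.
      if (pad_mode /= "YES") open(unit=unit, pad="yes")
    end if
  end if

  inquire(unit=unit, pad=pad_mode)
  if (pad_mode /= "YES") error stop "pad mode was not enabled"
  close(unit)
end program ensure_pad
```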

Member

@jvdp1 jvdp1 left a comment

LGTM.
My only concern is the length of the buffer. 512 characters would often be too small for some of my cases.

@awvwgk
Member Author

awvwgk commented Dec 18, 2021

I created a test with different buffer lengths for reading 3600 lines, with the longest lines having 120000 chars and the short lines having 1200 chars. Timings are in seconds.

buffer size   long lines [s]   mixed lines [s]   short lines [s]
        512             44.8              15.1            8.3e-2
       1024             24.0               7.6            7.9e-2
       2048             12.5               4.7            7.3e-2
       4096              8.9               2.6            8.2e-2
       8192              4.8               1.9            7.9e-2

Larger buffer sizes indeed give a sizable speedup, as the time-consuming step is the read of each chunk from the unit. Also, it is good to see that short lines are read fast regardless of the buffer size.
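
To make the role of the buffer concrete: a 120000-character line needs about 235 chunk reads with a 512-character buffer, but only about 15 with an 8192-character buffer. The sketch below shows a chunked non-advancing read that counts its read statements. It is an illustration only, not the benchmark code or the stdlib routine; the helper `read_line`, the 3000-character test line and the `bufsize` value are assumptions.

```fortran
program chunked_read
  implicit none
  integer, parameter :: bufsize = 512   ! assumption: value picked for illustration
  integer :: unit, stat, nreads
  character(len=:), allocatable :: line

  open(newunit=unit, status="scratch", action="readwrite")
  write(unit, "(a)") repeat("x", 3000)
  rewind(unit)

  call read_line(unit, line, nreads, stat)
  if (stat /= 0) error stop "read failed"
  if (len(line) /= 3000) error stop "unexpected line length"
  print "(a,i0,a,i0,a)", "read ", len(line), " characters in ", nreads, " chunks"
  close(unit)

contains

  ! Hypothetical helper, not the stdlib routine: reads one record in
  ! bufsize-sized chunks and counts the read statements executed.
  subroutine read_line(unit, line, nreads, stat)
    integer, intent(in) :: unit
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: nreads, stat
    character(len=bufsize) :: buffer
    integer :: nchars

    line = ""
    nreads = 0
    do
      read(unit, "(a)", advance="no", iostat=stat, size=nchars) buffer
      nreads = nreads + 1
      if (is_iostat_eor(stat)) then
        line = line // buffer(:nchars)   ! final, possibly partial, chunk
        stat = 0                         ! end of record is not an error here
        exit
      else if (stat /= 0) then
        exit                             ! end of file or I/O error
      end if
      line = line // buffer              ! full chunk, record continues
    end do
  end subroutine read_line
end program chunked_read
```

Changing `bufsize` changes only the number of read statements per line, which is the cost the timings above attribute the difference to.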

@awvwgk removed the "reviewers needed" label Dec 18, 2021
@awvwgk awvwgk merged commit bae6be5 into fortran-lang:master Dec 19, 2021
@awvwgk awvwgk deleted the getline branch December 19, 2021 09:27
Successfully merging this pull request may close these issues.

Procedure to read whole line to deferred length character
4 participants