Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

逐行读取文件的几种方法对比 #13

Open
kevinyan815 opened this issue Dec 9, 2019 · 0 comments
Open

逐行读取文件的几种方法对比 #13

kevinyan815 opened this issue Dec 9, 2019 · 0 comments

Comments

@kevinyan815
Copy link
Owner

kevinyan815 commented Dec 9, 2019

Use:

  • reader.ReadString('\n')
    
    • If you don't mind that the line could be very long (i.e. use a lot of RAM). It keeps the \n at the end of the string returned.
  • reader.ReadLine()
    
    • If you care about limiting RAM consumption and don't mind the extra work of handling the case where the line is greater than the reader's buffer size.

I tested the various solutions suggested by writing a program to test the scenarios which are identified as problems in other answers:

  • A file with a 4MB line.
  • A file which doesn't end with a line break.

I found that:

  • The Scanner solution does not handle long lines.
  • The ReadLine solution is complex to implement.
  • The ReadString solution is the simplest and works for long lines.

Here is code which demonstrates each solution, it can be run via go run main.go:

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "io"
    "os"
)

func readFileWithReadString(fn string) (err error) {
    fmt.Println("readFileWithReadString")

    file, err := os.Open(fn)
    defer file.Close()

    if err != nil {
        return err
    }

    // Start reading from the file with a reader.
    reader := bufio.NewReader(file)

    var line string
    for {
        line, err = reader.ReadString('\n')

        fmt.Printf(" > Read %d characters\n", len(line))

        // Process the line here.
        fmt.Println(" > > " + limitLength(line, 50))

        if err != nil {
            break
        }
    }

    if err != io.EOF {
        fmt.Printf(" > Failed!: %v\n", err)
    }

    return
}

func readFileWithScanner(fn string) (err error) {
    fmt.Println("readFileWithScanner - this will fail!")

    // Don't use this, it doesn't work with long lines...

    file, err := os.Open(fn)
    if err != nil {
        return err
    }
    defer file.Close()

    // Start reading from the file using a scanner.
    scanner := bufio.NewScanner(file)

    for scanner.Scan() {
        line := scanner.Text()

        fmt.Printf(" > Read %d characters\n", len(line))

        // Process the line here.
        fmt.Println(" > > " + limitLength(line, 50))
    }

    if scanner.Err() != nil {
        fmt.Printf(" > Failed!: %v\n", scanner.Err())
    }

    return
}

func readFileWithReadLine(fn string) (err error) {
    fmt.Println("readFileWithReadLine")

    file, err := os.Open(fn)
    defer file.Close()

    if err != nil {
        return err
    }

    // Start reading from the file with a reader.
    reader := bufio.NewReader(file)

    for {
        var buffer bytes.Buffer

        var l []byte
        var isPrefix bool
        for {
            l, isPrefix, err = reader.ReadLine()
            buffer.Write(l)

            // If we've reached the end of the line, stop reading.
            if !isPrefix {
                break
            }

            // If we're just at the EOF, break
            if err != nil {
                break
            }
        }

        if err == io.EOF {
            break
        }

        line := buffer.String()

        fmt.Printf(" > Read %d characters\n", len(line))

        // Process the line here.
        fmt.Println(" > > " + limitLength(line, 50))
    }

    if err != io.EOF {
        fmt.Printf(" > Failed!: %v\n", err)
    }

    return
}

func main() {
    testLongLines()
    testLinesThatDoNotFinishWithALinebreak()
}

func testLongLines() {
    fmt.Println("Long lines")
    fmt.Println()

    createFileWithLongLine("longline.txt")
    readFileWithReadString("longline.txt")
    fmt.Println()
    readFileWithScanner("longline.txt")
    fmt.Println()
    readFileWithReadLine("longline.txt")
    fmt.Println()
}

func testLinesThatDoNotFinishWithALinebreak() {
    fmt.Println("No linebreak")
    fmt.Println()

    createFileThatDoesNotEndWithALineBreak("nolinebreak.txt")
    readFileWithReadString("nolinebreak.txt")
    fmt.Println()
    readFileWithScanner("nolinebreak.txt")
    fmt.Println()
    readFileWithReadLine("nolinebreak.txt")
    fmt.Println()
}

func createFileThatDoesNotEndWithALineBreak(fn string) (err error) {
    file, err := os.Create(fn)
    defer file.Close()

    if err != nil {
        return err
    }

    w := bufio.NewWriter(file)
    w.WriteString("Does not end with linebreak.")
    w.Flush()

    return
}

func createFileWithLongLine(fn string) (err error) {
    file, err := os.Create(fn)
    defer file.Close()

    if err != nil {
        return err
    }

    w := bufio.NewWriter(file)

    fs := 1024 * 1024 * 4 // 4MB

    // Create a 4MB long line consisting of the letter a.
    for i := 0; i < fs; i++ {
        w.WriteRune('a')
    }

    // Terminate the line with a break.
    w.WriteRune('\n')

    // Put in a second line, which doesn't have a linebreak.
    w.WriteString("Second line.")

    w.Flush()

    return
}

func limitLength(s string, length int) string {
    if len(s) < length {
        return s
    }

    return s[:length]
}

I tested on:

  • go version go1.7 windows/amd64
  • go version go1.6.3 linux/amd64
  • go version go1.7.4 darwin/amd64

The test program outputs:

Long lines

readFileWithReadString
 > Read 4194305 characters
 > > aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 > Read 12 characters
 > > Second line.

readFileWithScanner - this will fail!
 > Failed!: bufio.Scanner: token too long

readFileWithReadLine
 > Read 4194304 characters
 > > aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 > Read 12 characters
 > > Second line.

No linebreak

readFileWithReadString
 > Read 28 characters
 > > Does not end with linebreak.

readFileWithScanner - this will fail!
 > Read 28 characters
 > > Does not end with linebreak.

readFileWithReadLine
 > Read 28 characters
 > > Does not end with linebreak.

Source Link: https://stackoverflow.com/questions/8757389/reading-file-line-by-line-in-go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant