
Commit

add w6_coming_in_go_1.16_readdir_and_direntry.md chinese version
王磊 authored and cvley committed Feb 8, 2021
1 parent 8affce9 commit bcf4c5f
Showing 1 changed file with 35 additions and 38 deletions.
73 changes: 35 additions & 38 deletions 2021/w6_coming_in_go_1.16_readdir_and_direntry.md
@@ -1,36 +1,36 @@
# Coming in Go 1.16: ReadDir and DirEntry

- Original article: https://benhoyt.com/writings/go-readdir/
- Original author: Ben Hoyt
- Permalink for this translation: https://github.com/gocn/translator/blob/master/2021/w6_coming_in_go_1.16_readdir_and_direntry.md
- Translator: [cvley](https://github.com/cvley)
- Proofreader: [guzzsek](https://github.com/guzzsek)

January 2021

As the primary author of Python’s [`os.scandir`](https://docs.python.org/3/library/os.html#os.scandir) function and [PEP 471](https://www.python.org/dev/peps/pep-0471/) (the original proposal for `scandir`), I was very happy to see that Go is adding something similar in Go 1.16, which is coming out in late February 2021.

In Go it will be called [`os.ReadDir`](https://tip.golang.org/pkg/os/#ReadDir), and was [proposed](https://github.com/golang/go/issues/41467) last September. After more than 100 comments and several tweaks to the design, it was [committed](https://github.com/golang/go/commit/a4ede9f9a6254360d39d0f45aec133c355ac6b2a) by Russ Cox in October. A file system-agnostic version is also included in the new [`io/fs`](https://tip.golang.org/pkg/io/fs/) package as [`fs.ReadDir`](https://tip.golang.org/pkg/io/fs/#ReadDir).

## Why is ReadDir needed?

The short answer is: performance.

When you call the system functions to read directory entries, the OS typically returns the file name _and_ its type (and on Windows, stat information such as file size and last modified time). However, the original Go and Python interfaces threw away this extra information, requiring you to make an additional `stat` call per entry. System calls [aren’t cheap](https://stackoverflow.com/a/6424772/68707) to begin with, and `stat` may read from disk, or at least the disk cache.

When recursively walking a directory tree, you need to know whether an entry is a file or directory so you know whether to recurse in. So even a simple directory tree traversal required reading the directory entries _and_ `stat`-ing each entry. But if you use the file type information the OS provides, you can avoid those `stat` calls and traverse a directory several times as fast (even dozens of times as fast on network file systems). See some [benchmarks](https://github.com/benhoyt/scandir#benchmarks) for the Python version.

Both languages, unfortunately, started with a non-optimal design for reading directories that didn’t allow you to access the type information without extra calls to `stat`: [`os.listdir`](https://docs.python.org/3/library/os.html#os.listdir) in Python, and [`ioutil.ReadDir`](https://golang.org/pkg/io/ioutil/#ReadDir) in Go.

I first came up with the idea behind Python’s `scandir` in 2012, and implemented it for Python 3.5, which came out in 2015 ([read more about that process](https://benhoyt.com/writings/scandir/)). It’s been improved and added to since: for example, `with` statement handling and file descriptor support.

For Go, I didn’t have anything to do with the proposal or implementation, apart from a [couple](https://github.com/golang/go/issues/41467#issuecomment-694603286) [of](https://github.com/golang/go/issues/41467#issuecomment-697937162) [comments](https://github.com/golang/go/issues/41188#issuecomment-686720957) suggesting improvements based on my experience with the Python version.

## Python vs Go

Let’s have a look at the new “read directory” interfaces, particularly how similar they are in Python and Go.

In Python you call `os.scandir(path)`, and it returns an iterator of `os.DirEntry` objects, which are as follows:

```
class DirEntry:
    # ... (lines collapsed in the diff view) ...

    # Return stat information for this entry.
    def stat(self, follow_symlinks=True) -> stat_result: ...
```


Accessing the `name` and `path` attributes will never raise exceptions, but the method calls may raise `OSError`, depending on operating system and file system, and whether the entry is a symbolic link or not. For example, on Linux, `stat` always performs a system call, and hence may raise an exception, but the `is_X` methods usually do not.

In Go you call `os.ReadDir(path)`, and it returns a slice of `os.DirEntry` objects, which look like this:

```
type DirEntry interface {
	// ... (lines collapsed in the diff view) ...
	Info() (FileInfo, error)
}
```


You can see the similarities right away, though in true Go fashion, the Go version is somewhat simpler. In fact, if I were doing Python’s `scandir` again, I’d probably push for a slightly simpler interface – in particular, getting rid of the `follow_symlinks` parameter and making it not follow symbolic links by default.

Here’s an example that uses `os.scandir` – a function that calculates the total size of the files in a directory and its subdirectories, recursively:

```
def get_tree_size(path):
    # ... (lines collapsed in the diff view) ...
    return total
```

In Go (once 1.16 comes out) it would look like this:

```
func GetTreeSize(path string) (int64, error) {
	// ... (lines collapsed in the diff view) ...
}
```

A similar high-level structure, though of course someone’s going to say, “see, look how much boilerplate Go’s error handling introduces!” And that’s true – the Python code is very neat. In a little script that would be fine, and that’s where Python excels.

However, in production code, or in a hardened command-line utility, you’d want to catch errors around the stat call, and perhaps ignore permission errors, or log them. The Go code makes explicit the fact that errors can occur, and would easily allow you to add logging or nicer error messages.

## Higher-level tree walking

In addition, both languages have higher-level functions for recursively walking a directory tree. In Python, that’s [`os.walk`](https://docs.python.org/3/library/os.html#os.walk). The beauty of `scandir` in Python is that the signature of `os.walk` didn’t need to change, so all existing users of `os.walk` (of which there are many) got the speed-up automatically.

For example, to print all the non-dot file paths in a directory tree using `os.walk`:

```
def list_non_dot(path):
    # ... (lines collapsed in the diff view) ...
            paths.append(os.path.join(root, f))
    return sorted(paths)
```


As of Python 3.5, where `os.walk` uses `scandir` instead of `listdir` under the hood, this will magically be from 1.5 to 20 times as fast, depending on operating system and file system.

Go (pre-1.16) has a similar function, [`filepath.Walk`](https://golang.org/pkg/path/filepath/#Walk), but unfortunately the [`FileInfo`](https://golang.org/pkg/os/#FileInfo) interface wasn’t designed to allow errors to be reported from its various method calls. As we’ve seen, these can sometimes perform system calls – for example, the stat information like `Size` will always require a system call on Linux. So in Go, the methods need to return an error (in Python they raise an exception).

It was tempting to wave error handling away to try to reuse the `FileInfo` interface, so that existing code would get a magical speed-up. In fact, [issue 41188](https://github.com/golang/go/issues/41188) is a proposal from Russ Cox suggesting just that (with some [data](https://github.com/golang/go/issues/41188#issuecomment-690879673) to show that it’s not as terrible an idea as it sounds). However, `stat` can and does return errors, so there was potential for things like a file size being returned as 0 on error. As a result, there was significant push-back against trying to wedge it into the existing API, and Russ eventually [acknowledged](https://github.com/golang/go/issues/41188#issuecomment-694596908) the lack of consensus and proposed the `DirEntry` interface instead.

What this means is that, to get the performance gain, `filepath.Walk` calls need to be changed to [`filepath.WalkDir`](https://tip.golang.org/pkg/path/filepath/#WalkDir) – very similar, but the walk function receives a `DirEntry` instead of a `FileInfo`.

Here’s what a Go version of `list_non_dot` would look like with the existing `filepath.Walk` function:

```
func ListNonDot(path string) ([]string, error) {
	// ... (lines collapsed in the diff view) ...
}
```

This will keep working in Go 1.16, of course, but if you want the performance benefits you’ll have to make some very small changes – in this case just changing `Walk` to `WalkDir`, and changing `os.FileInfo` to `os.DirEntry`:

```
err := filepath.WalkDir(path, func(p string, info os.DirEntry,
```

For what it’s worth, running the first function on my home directory on Linux, once cached, takes about 580ms. The new version using Go 1.16 takes about 370ms – roughly 1.5x as fast. Not a huge difference, but worth it – and you get much larger speed-ups on networked file systems and on Windows.

## Summary

The new `ReadDir` API is easy to use, and integrates nicely with the new file system interface via `fs.ReadDir`. And to speed up your existing `Walk` calls, the tweaks you’ll need to make to switch to `WalkDir` are trivial.
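
To illustrate the `fs.ReadDir` side, the sketch below (assuming Go 1.16 or later) lists a directory of any `fs.FS`. The helpers `demoFS` and `entryNames` and the file names are hypothetical; the in-memory file system comes from the standard `testing/fstest` package and is used here only so the example does not depend on the local disk.

```go
package main

import (
	"io/fs"
	"testing/fstest"
)

// demoFS returns a small in-memory file system, used only for illustration.
func demoFS() fs.FS {
	return fstest.MapFS{
		"docs/readme.md":    {Data: []byte("hello")},
		"docs/img/logo.png": {Data: []byte{0x89}},
	}
}

// entryNames is a hypothetical helper: it lists the entries of dir in any
// fs.FS, appending "/" to the names of directories.
func entryNames(fsys fs.FS, dir string) ([]string, error) {
	entries, err := fs.ReadDir(fsys, dir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, entry := range entries {
		name := entry.Name()
		if entry.IsDir() {
			name += "/"
		}
		names = append(names, name)
	}
	return names, nil
}
```

The same `entryNames` function should work on a real directory by passing `os.DirFS("/some/path")` as the file system and `"."` as the directory.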

API design is hard. Cross-platform, OS-related API design is even harder. Be sure to get this right when designing your next programming language’s standard library! :-)

In any case, I’m glad that Go’s support for reading directories will no longer be lagging behind – or _walking_ behind – Python.
