Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename and rework File::Stat as File::Info #5584

Merged
merged 3 commits into from
Apr 14, 2018

Conversation

RX14
Copy link
Contributor

@RX14 RX14 commented Jan 13, 2018

I've aimed to make File::Info easier to use, more portable, and nicer than File::Stat.

flags |= ::File::Flags::SetGroup if (stat.st_mode & LibC::S_ISGID) != 0
flags |= ::File::Flags::Sticky if (stat.st_mode & LibC::S_ISVTX) != 0

modification_time = ::Time.new(stat.st_mtim, ::Time::Kind::Utc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should better use ::Time.utc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

st_mtim is a LibC::Timespec. You can't use Time.utc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, nvm

@RX14 RX14 force-pushed the feature/file-info branch 2 times, most recently from 030495b to 148b1f6 Compare January 13, 2018 23:03
src/file.cr Outdated
@@ -16,7 +16,7 @@ class File < IO::FileDescriptor
{% end %}

# :nodoc:
DEFAULT_CREATE_MODE = LibC::S_IRUSR | LibC::S_IWUSR | LibC::S_IRGRP | LibC::S_IROTH
DEFAULT_CREATE_PERMISSIONS = 0o644
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is better to use File::Permissions.flags(OwnerRead, OwnerWrite, GroupRead, OtherRead) here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, the shorter octal syntax is far easier to read. "everyone" knows unix octal permission syntax.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then why bother with a File::Permissions enum?

# The binary representation of this enum is defined to be same representation
# as the permission bits of a unix `st_mode` field. `File::Permissions`
# can also be compared to it's underlying bitset, for example
# `File::Permissions::All == 0o777` will always be `true`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the opposite? Does 0o777 == File::Permissions::All work?

Copy link
Member

@asterite asterite Jan 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, maybe always isn't necessary here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah. Perhaps this equality was a bad idea. Or should I just reopen struct Int? It's a bit ugly how these equalities are defined in two places.

Copy link
Member

@asterite asterite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome! And much cleaner than before. Plus the inspect of permissions is really nice. Thank you!

@RX14
Copy link
Contributor Author

RX14 commented Jan 14, 2018

@asterite yeah, while debugging I found that I really wanted to be able to see the permissions better so I just implemented it!

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of always evaluating and copying to populate a generic struct with lots of data —which misses information btw, such as the creation time for the most obvious one— we could keep the simple wrapper over C structs.

If all I need is accessing the mtime for thousands of files, I'd like to avoid copying data for each and every file, or just checking over and over if it's a file, symlink or something else (pointless).

src/file/info.cr Outdated

# :nodoc:
def initialize(@size, @permissions, @type, @flags, @modification_time, @owner, @group)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It lost query methods, such as symlink?, file?, ... which is an unwelcomed change.

Copy link
Contributor Author

@RX14 RX14 Jan 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just type.file? now. The functionality is still there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could think about forwarding these methods to type similar to the weekday names in Time. Allthough this would effectively be aliases. But the API would be simpler that way. You shouldn't have to write type in between to ask a File::Info if it describes a symlink.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? It's really just some extra typing and it means you have a clear separation between permissions, types, and flags on files. Which I think is really nice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your argument. But I think that's too bulky.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use delegate here?

@RX14
Copy link
Contributor Author

RX14 commented Jan 14, 2018

@ysbaddaden Information has been intentionally left out to make it portable. atime is disabled on most modern distros, either with relatime or noatime mount options (and nobody uses it) and ctime is usually not what people want or think it is (they want btime which basically no filesystems support). mtime is really the only widely supported and actually useful time. Go only provides modification time too.

I'll benchmark old stat vs new stat but i'm not sure that a bit of bit twiddling is going to make much difference when put next to the price of a syscall, and making it wrap LibC::Stat would move platform-specifics into the definition of File::Info instead of just the construction. Keep in mind that on windows I'm not going to be using LibC::Stat at all, but recostructing the same information from GetFileInformationByHandle.

@RX14
Copy link
Contributor Author

RX14 commented Jan 14, 2018

Benchmarks say it doesn't matter much. I'd definitely say it's not worth optimizing for a while.

old:

 File.stat   1.77M (565.92ns) (± 4.62%)  0 B/op   1.77× slower
File.lstat   1.76M (568.21ns) (± 4.58%)  0 B/op   1.78× slower
 File#stat   3.13M (319.46ns) (± 3.19%)  0 B/op        fastest

new:

 File.stat   1.74M (575.88ns) (± 4.60%)  0 B/op   1.79× slower
File.lstat   1.69M (590.26ns) (±10.11%)  0 B/op   1.84× slower
 File#stat   3.11M (321.26ns) (± 3.22%)  0 B/op        fastest

@RX14
Copy link
Contributor Author

RX14 commented Jan 14, 2018

Also keep in mind that benchmark is a best-case for stat: one file only in a loop. When you add in a typical usage where the stat may not be in cache all the time you're going to see far less than the 2-4% overhead seen here.

it "can take File::Permissions" do
path = "#{__DIR__}/data/chmod.txt"
begin
File.write(path, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use File.touch(path) here I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's copied from the spec two above it with minor tweaks.

Copy link
Contributor

@bew bew Jan 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ok, probably not worth to change it here then

@RX14
Copy link
Contributor Author

RX14 commented Jan 14, 2018

I do wonder whether we should rename File.stat? as File.info? and have File.info?(follow_symlink: false).

Also I'm probably going to just rename File::Type::SymbolicLink to File::Type::Symlink, almost everyone uses the shortened name these days.

@asterite
Copy link
Member

I actually learned in this PR that lstat doesn't (or does) follow symlinks, so a named argument to select that behavior is better in my opinion. lstat is too cryptic. So maybe it's better to just use info.

I believe Ruby started with "works only in unix" and eventually had to keep all those cryptic, unix-specific names. Go for example doesn't do this, it provides a clear interface to the computer without using ancient strange names. I think we should do the same, even if it means a bigger barrier to those coming from Ruby.

@RX14
Copy link
Contributor Author

RX14 commented Jan 15, 2018

@asterite its called stat and lstat on go too...

@asterite
Copy link
Member

Maybe because go isn't that flexible with method names.

@RX14
Copy link
Contributor Author

RX14 commented Jan 15, 2018

@asterite I do agree it should be File.info. (eventually Path#info and File#info for stat/lstat and fstat respectively)

@RX14
Copy link
Contributor Author

RX14 commented Jan 21, 2018

Updated this PR to take into account the above suggestions.

@RX14
Copy link
Contributor Author

RX14 commented Jan 21, 2018

Looks like I accidentally made this PR depend on #5619... I'll fix it tomorrow.

@RX14
Copy link
Contributor Author

RX14 commented Jan 21, 2018

Pushed again to fix this, this can be re-reviewed now.

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I'll feel grumpy, but I still have goosebumps, and no explanation for many (unwelcomed) changes:

Why change stat naming to info? it's still an abbreviation and doesn't tell any better.

Why decode & copy everything (check each file type, decode/copy permissions, generate Time instances) instead of using a wrapper for on-demand generation? Sure File::Stat would be a mere delegation struct for the underlying POSIX vs Windows implementations, mostly to hold documentation, but the optimizer will inline calls, or we can just flag them with @[AlwaysInline].

Having to type .info.type.directory? over .stat.directory? can feel minor, but that's still an unwelcomed hick; forward calls from the struct to the enum, please.

Also:

  • ctime & atime are still missing (!)

src/file/info.cr Outdated
end

struct Int
def ==(other : File::Permissions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure reopening Int is a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the alternative is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not support Int==(File::Permissions) nor File::Info#permissions==(Int). You must wrap the octal value. I believe it's better than reopening Int for a corner case —checking exact permissions equality doesn't seem like a usual case.

@RX14
Copy link
Contributor Author

RX14 commented Apr 13, 2018

I've added some shortcuts for file? directory? and symlink?, as they are the common options. I've also fixed DEFAULT_CREATE_PERMISSIONS. Looking forward to finally getting this merged!

@straight-shoota
Copy link
Member

I don't like delegating only a few type predicates. It should be either all (my preference) or none to provide the least surprise.

@sdogruyol
Copy link
Member

I prefer having the most common ones than having all / nothing. IMHO this is good to go as it is 👍

@sdogruyol sdogruyol merged commit 96bcc34 into crystal-lang:master Apr 14, 2018
@sdogruyol sdogruyol added this to the Next milestone Apr 14, 2018
@RX14
Copy link
Contributor Author

RX14 commented May 29, 2018

@ysbaddaden After looking at crystal-lang/shards#203 I'm wondering if making File::Info#== perform samefile was correct. Perhaps it should have been File.info(foo).same_file? File.info(bar)? File.same_file?(foo, bar)? I'd like other people's input on this API.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented May 29, 2018

Maybe File.same?(foo, bar) or just keep exposing the inode on POSIX systems.

@RX14
Copy link
Contributor Author

RX14 commented May 29, 2018

I don't think we should expose inode and device numbers (nicely). I'd much rather just expose samefile.

@bcardiff
Copy link
Member

File.same?(foo, bar) LGTM, but I would propose something else

Given the following files:

$ touch a.txt
$ ln -s a.txt b.txt
$ touch c.txt

With the stdlib at master we currently have:

a = File.new("./a.txt") 
a_copy = File.new("./a.txt") 

How to check if a1 and a1_copy refer to the same file?

a == a_copy # => false
a.info == a_copy.info # => true

Ok, but how that works with b?

b = File.new("./b.txt")
a.info == b.info # => true

How can I know that a and b are different files?

File.info?(a.path, follow_symlinks: false) == File.info?(a_copy.path, follow_symlinks: false) # => true
File.info?(a.path, follow_symlinks: false) == File.info?(b.path, follow_symlinks: false) # => false

Ok, but what about if I use only the path of the File?
File instances initialized with different path indeed have different path

a_diff = File.new("a.txt")
a.path      # => "./a.txt"
a_diff.path # => "a.txt"

You can use File.expand_path, but there are a lot of rules to expand the
path and we had already a File instance

File.expand_path(a.path)      # => /full/path/to/a.txt
File.expand_path(a_diff.path) # => /full/path/to/a.txt

So, what about?

  1. Changing File#info to File#info(follow_symlinks = true)
  2. Add File.same?(file1, file2, follow_symlinks = true)
  3. Remove override of File::Info==

That would make simpler all of the above scenarios with File.same? having follow_symlinks set to one or another value.

@RX14
Copy link
Contributor Author

RX14 commented May 30, 2018

Because file.info(follow_symlinks: false) doesn't make sense. file.info != File.info(file.path). file.info is fstat and File.info is stat/lstat.

@RX14
Copy link
Contributor Author

RX14 commented May 30, 2018

How can I know that a and b are different files?

They are the same file. The whole premise for that question is wrong on posix. The confusing part here is actually storing File#path in the first place. File#path is never actually guaranteed to mean anything. Not all files even have a (valid) path.

You can unlink a path from the filesystem while it is open in a program just fine. And any open programs will keep the inode around in the filesystem, and the file will continue to use up disk space until the program is closed. Then File#path will continue to report the path of the deleted file where it may well have been replaced by a completely different file. That's part of the reason why fstat exists: because open is atomic and then you can call fstat on the file descriptor to check permissions, then perform operations with the FD while being sure that an attacker hasn't raced you to swap out the file for another malicious one under your nose.

Path equality and file equality are entirely different things, and are pretty much entirely unrelated and unreliable until the moment in time you call open.

@bcardiff
Copy link
Member

Ok, so @RX14 you say to only

  1. Add File.same?(file1, file2) that will always "follow symlinks" in the comparison
  2. Remove override of File::Info==
  3. Treat path comparison in another way. Later.

I'm fine with that also.

Although the premise for the question might seems wrong, but I think it depends on the use case. If someone wants to deal with lower level of file system some api are not the best. But is not the more common use case and it is still doable.

@ysbaddaden
Copy link
Contributor

Mixing symlinks with hard links to detect file identify is IMHO introducing confusing behavior. Furthermore, detecting hard links is quite specific, and not widely used. We don't need a abstraction for this, hence why I'm proposing to keep exposing inodes on POSIX systems and be done with it.

So:

  • detecting hard links: File.info(a).inode == File.info(b).inode;
  • detecting symlinks: File.real_path(a) == File.real_path(b);
  • detecting hard links through potential symlinks:
    File.info(a, follow_symlinks: true).inode == File.info(b, follow_symlinks: true).inode.

Simple, does the job, no confusing behavior, and allows finer grained detection. All it needs is to expose a value that we already have.

@RX14
Copy link
Contributor Author

RX14 commented May 31, 2018

@bcardiff having a follow_symlinks in File.same is fine because symlinks are a property of a path, which File.same is working with. It's not fine to have File#info with a follow_symlinks option because you've already dereferenced the symlink and got to a specific device and inode when you call open(2).

@ysbaddaden the reason why I want to only expose equality only is because your examples are wrong! You must compare device and inode to be sure they're the same file. If even you get this wrong, then surely thats a fantastic reason to expose this nicely instead of just expose the platform-specific inode (which won't even exist on windows).

Let's add File.same with a follow_symlinks option, then move File::Info#== to File#== so it's just file == file and skip the .info step (it reads badly and is a bit weird).

@bcardiff
Copy link
Member

bcardiff commented Jun 4, 2018

PR sent for this. I don't think we need to expose inode for file comparison in order to stay posix independent for this api.

Users are free to go to info and extract more information, but for comparison File.same? is good API.

@ysbaddaden
Copy link
Contributor

My wrong, you're both right. Let's go with File.same?.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.