-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename and rework File::Stat as File::Info #5584
Conversation
src/crystal/system/unix/file.cr
Outdated
flags |= ::File::Flags::SetGroup if (stat.st_mode & LibC::S_ISGID) != 0 | ||
flags |= ::File::Flags::Sticky if (stat.st_mode & LibC::S_ISVTX) != 0 | ||
|
||
modification_time = ::Time.new(stat.st_mtim, ::Time::Kind::Utc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should better use ::Time.utc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
st_mtim
is a LibC::Timespec
. You can't use Time.utc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, nvm
030495b
to
148b1f6
Compare
src/file.cr
Outdated
@@ -16,7 +16,7 @@ class File < IO::FileDescriptor | |||
{% end %} | |||
|
|||
# :nodoc: | |||
DEFAULT_CREATE_MODE = LibC::S_IRUSR | LibC::S_IWUSR | LibC::S_IRGRP | LibC::S_IROTH | |||
DEFAULT_CREATE_PERMISSIONS = 0o644 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is better to use File::Permissions.flags(OwnerRead, OwnerWrite, GroupRead, OtherRead)
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, the shorter octal syntax is far easier to read. "everyone" knows unix octal permission syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then why bother with a File::Permissions
enum?
# The binary representation of this enum is defined to be same representation | ||
# as the permission bits of a unix `st_mode` field. `File::Permissions` | ||
# can also be compared to it's underlying bitset, for example | ||
# `File::Permissions::All == 0o777` will always be `true`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the opposite? Does 0o777 == File::Permissions::All
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, maybe always
isn't necessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah. Perhaps this equality was a bad idea. Or should I just reopen struct Int
? It's a bit ugly how these equalities are defined in two places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome! And much cleaner than before. Plus the inspect of permissions is really nice. Thank you!
@asterite yeah, while debugging I found that I really wanted to be able to see the permissions better so I just implemented it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of always evaluating and copying to populate a generic struct with lots of data —which misses information btw, such as the creation time for the most obvious one— we could keep the simple wrapper over C structs.
If all I need is accessing the mtime for thousands of files, I'd like to avoid copying data for each and every file, or just checking over and over if it's a file, symlink or something else (pointless).
src/file/info.cr
Outdated
|
||
# :nodoc: | ||
def initialize(@size, @permissions, @type, @flags, @modification_time, @owner, @group) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It lost query methods, such as symlink?
, file?
, ... which is an unwelcomed change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just type.file?
now. The functionality is still there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could think about forwarding these methods to type
similar to the weekday names in Time
. Allthough this would effectively be aliases. But the API would be simpler that way. You shouldn't have to write type
in between to ask a File::Info
if it describes a symlink.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? It's really just some extra typing and it means you have a clear separation between permissions, types, and flags on files. Which I think is really nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your argument. But I think that's too bulky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use delegate
here?
@ysbaddaden Information has been intentionally left out to make it portable. I'll benchmark old stat vs new stat but i'm not sure that a bit of bit twiddling is going to make much difference when put next to the price of a syscall, and making it wrap |
Benchmarks say it doesn't matter much. I'd definitely say it's not worth optimizing for a while. old:
new:
|
Also keep in mind that benchmark is a best-case for stat: one file only in a loop. When you add in a typical usage where the stat may not be in cache all the time you're going to see far less than the 2-4% overhead seen here. |
it "can take File::Permissions" do | ||
path = "#{__DIR__}/data/chmod.txt" | ||
begin | ||
File.write(path, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use File.touch(path)
here I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's copied from the spec two above it with minor tweaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ok, probably not worth to change it here then
I do wonder whether we should rename Also I'm probably going to just rename |
I actually learned in this PR that I believe Ruby started with "works only in unix" and eventually had to keep all those cryptic, unix-specific names. Go for example doesn't do this, it provides a clear interface to the computer without using ancient strange names. I think we should do the same, even if it means a bigger barrier to those coming from Ruby. |
@asterite its called stat and lstat on go too... |
Maybe because go isn't that flexible with method names. |
@asterite I do agree it should be |
148b1f6
to
7c874b8
Compare
Updated this PR to take into account the above suggestions. |
Looks like I accidentally made this PR depend on #5619... I'll fix it tomorrow. |
7c874b8
to
f8e9bd5
Compare
Pushed again to fix this, this can be re-reviewed now. |
f8e9bd5
to
d9f73d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry I'll feel grumpy, but I still have goosebumps, and no explanation for many (unwelcomed) changes:
Why change stat
naming to info
? it's still an abbreviation and doesn't tell any better.
Why decode & copy everything (check each file type, decode/copy permissions, generate Time instances) instead of using a wrapper for on-demand generation? Sure File::Stat
would be a mere delegation struct for the underlying POSIX vs Windows implementations, mostly to hold documentation, but the optimizer will inline calls, or we can just flag them with @[AlwaysInline]
.
Having to type .info.type.directory?
over .stat.directory?
can feel minor, but that's still an unwelcomed hick; forward calls from the struct to the enum, please.
Also:
- ctime & atime are still missing (!)
src/file/info.cr
Outdated
end | ||
|
||
struct Int | ||
def ==(other : File::Permissions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure reopening Int
is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the alternative is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not support Int==(File::Permissions)
nor File::Info#permissions==(Int)
. You must wrap the octal value. I believe it's better than reopening Int for a corner case —checking exact permissions equality doesn't seem like a usual case.
I've added some shortcuts for |
I don't like delegating only a few type predicates. It should be either all (my preference) or none to provide the least surprise. |
I prefer having the most common ones than having all / nothing. IMHO this is good to go as it is 👍 |
@ysbaddaden After looking at crystal-lang/shards#203 I'm wondering if making |
Maybe |
I don't think we should expose inode and device numbers (nicely). I'd much rather just expose samefile. |
Given the following files:
With the stdlib at master we currently have: a = File.new("./a.txt")
a_copy = File.new("./a.txt") How to check if a1 and a1_copy refer to the same file? a == a_copy # => false
a.info == a_copy.info # => true Ok, but how that works with b? b = File.new("./b.txt")
a.info == b.info # => true How can I know that a and b are different files? File.info?(a.path, follow_symlinks: false) == File.info?(a_copy.path, follow_symlinks: false) # => true
File.info?(a.path, follow_symlinks: false) == File.info?(b.path, follow_symlinks: false) # => false Ok, but what about if I use only the path of the a_diff = File.new("a.txt")
a.path # => "./a.txt"
a_diff.path # => "a.txt" You can use File.expand_path(a.path) # => /full/path/to/a.txt
File.expand_path(a_diff.path) # => /full/path/to/a.txt So, what about?
That would make simpler all of the above scenarios with |
Because |
They are the same file. The whole premise for that question is wrong on posix. The confusing part here is actually storing You can unlink a path from the filesystem while it is open in a program just fine. And any open programs will keep the inode around in the filesystem, and the file will continue to use up disk space until the program is closed. Then Path equality and file equality are entirely different things, and are pretty much entirely unrelated and unreliable until the moment in time you call |
Ok, so @RX14 you say to only
I'm fine with that also. Although the premise for the question might seems wrong, but I think it depends on the use case. If someone wants to deal with lower level of file system some api are not the best. But is not the more common use case and it is still doable. |
Mixing symlinks with hard links to detect file identify is IMHO introducing confusing behavior. Furthermore, detecting hard links is quite specific, and not widely used. We don't need a abstraction for this, hence why I'm proposing to keep exposing inodes on POSIX systems and be done with it. So:
Simple, does the job, no confusing behavior, and allows finer grained detection. All it needs is to expose a value that we already have. |
@bcardiff having a @ysbaddaden the reason why I want to only expose equality only is because your examples are wrong! You must compare device and inode to be sure they're the same file. If even you get this wrong, then surely thats a fantastic reason to expose this nicely instead of just expose the platform-specific inode (which won't even exist on windows). Let's add |
PR sent for this. I don't think we need to expose inode for file comparison in order to stay posix independent for this api. Users are free to go to info and extract more information, but for comparison File.same? is good API. |
My wrong, you're both right. Let's go with |
I've aimed to make
File::Info
easier to use, more portable, and nicer thanFile::Stat
.