-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure the DB file is seekable, use a StringIO wrapper if it's not #38
Conversation
Looks like an ok approach. @jeremy how do you feel about this? adds two Can we get away with a slightly less paranoid implementation? (seek 1 (if that fails) not seekable, if tell gives me anything that is not 1 then it is not seekable) Tests are failing so something is off. @headius FYI |
That was actually more or less what I started with, but I figured one extra tell wouldn't hurt. AIUI tell is supposed to be a fast operation anyway. And since we're reading the first line anyway, I could actually test a real backwards seek after read for basically free. (Now, if I was actually paranoid, I'd
😬 Not much to go on in the test logs, but let me do some local testing and try to see what's going wrong. |
In local testing, I'm getting a failure in |
.webmanifest was merged in #36, so you probably just need a rebase. |
199e17b
to
5ffcd8b
Compare
Agreed; reasonable.
…On Sun, Aug 22, 2021 at 19:28 Sam ***@***.***> wrote:
Looks like an ok approach. @jeremy <https://github.com/jeremy> how do you
feel about this? adds two tell calls and one seek, seems reasonable in
the big scheme.
Can we get away with a slightly less paranoid implementation? (seek 1 (if
that fails) not seekable, if tell gives me anything that is not 1 then it
is not seekable)
Tests are failing so something is off.
@headius <https://github.com/headius> FYI
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#38 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAABR3TFJBDNUQFXLO54FLT6GW4PANCNFSM5CQKX3DA>
.
|
It's been over a year. Could someone please review this PR? *nudge, nudge* We'd like to stop using a private fork of this gem but we still need this fix merged in order to do so. 🙄 |
assert_equal "application/vnd.groove-tool-message", MiniMime.lookup_by_filename("a.GTM").content_type | ||
assert_equal "application/zip", MiniMime.lookup_by_extension("ZiP").content_type | ||
assert_equal "application/vnd.groove-tool-message", MiniMime.lookup_by_filename("a.GTM")&.content_type | ||
assert_equal "application/zip", MiniMime.lookup_by_extension("ZiP")&.content_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm I don't understand these changes ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making sure that if the lookup fails and returns nil
we get a clean failure from the assert_equals
instead of a NoMethodError from calling .content_type
on nil
. I think I ran into that while testing my code.
@@ -102,6 +102,16 @@ def initialize(path, sort_order) | |||
@file_length = File.size(@path) | |||
@rows = @file_length / @row_length | |||
|
|||
# Make sure the DB file is seekable, use a StringIO wrapper if it's not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole dance of checking makes me uneasy... I would be comfortable with a "reaction" to an error JRuby raises when you do a seek but it gives us nothing. I prefer not to own this heuristic.
Perhaps a simple:
MiniMime.preload_dbs = true
Then
if MiniMime.preload_dbs
@file = StringIO.new(File.read(@path))
end
With some readme data. Very uneasy with the hack of ... seek around file and check that the computer did what you asked it... as a general solution. @headius - it should be exceptioning out if you tell it to seek and it does not seek, this is hiding bugs in JRuby and making them "normal".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be exceptioning out if you tell it to seek and it does not seek
I completely agree. However, if we want this code to be compatible with current and past JRuby behavior, we'd still need this "hack". At least the way I wrote it, it should continue working even if JRuby behavior is later changed to raise a reasonable exception (i.e. any IOError or SystemCallError) in this situation.
I suppose a comment explaining why this "dance" is needed might be worth including.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not do anything special for seek
. It will either use JVM functions to do this (SeekableChannel.position) or a native lseek
call.
It does appear to error for an IO.pipe
, for example, but I'm unsure if it is the right error:
$ jruby -e 'IO.pipe[0].seek(0)'
Errno::EPIPE: Broken pipe - No message available
seek at org/jruby/RubyIO.java:1817
<main> at -e:1
Also for stdin, with a similar error:
$ jruby -e '$stdin.seek(0)'
Errno::ESPIPE: Illegal seek - <STDIN>
seek at org/jruby/RubyIO.java:1817
<main> at -e:1
I am a bit confused which resources are not seekable but not raising an error. Just classloader resources? Some other file type for the DB? I think we can extend this error raising to those other resources types if someone can clarify which ones are failing to raise an error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a JRuby patch that will raise EPIPE for all streams that do not appear to be seekable, including classloader resources:
[] ~/work/jruby93 $ rvm ruby-3.1 do ruby -e 'IO.pipe[0].seek(0)'
-e:1:in `seek': Illegal seek (Errno::ESPIPE)
from -e:1:in `<main>'
[] ~/work/jruby93 $ rvm ruby-3.1 do ruby -e '$stdin.seek(0)'
-e:1:in `seek': Illegal seek @ rb_io_seek - <STDIN> (Errno::ESPIPE)
from -e:1:in `<main>'
[] ~/work/jruby93 $ jruby -e 'p File.open("uri:classloader:jruby/kernel/kernel.rb").seek(1)'
Errno::EPIPE: Broken pipe - uri:classloader:jruby/kernel/kernel.rb
seek at org/jruby/RubyIO.java:1817
<main> at -e:1
This is a pretty aggressive patch, but might be necessary to make this work properly. By the time we do this seek call, all we have in hand is a java.nio.Channel, and we use the presence of the SeekableChannel supertype to know if seeking is possible. We opted for quiet failure at some point in the past (no idea why) if a channel was not seekable but also not selectable.
I have pushed this as jruby/jruby#7415 but we'll need to evaluate how risky it is before merging and releasing.
cc @enebo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused which resources are not seekable but not raising an error. Just classloader resources?
Classloader resources are what I've encountered this issue with, since that's what the DB files will end up being when a JRuby application using MiniMime is bundled into a .jar.
We opted for quiet failure at some point in the past (no idea why) if a channel was not seekable but also not selectable.
The silent failure was apparently originally implemented in jruby/jruby@a4cc5ee as a workaround for jruby/jruby#3399. I don't personally think it was a good workaround, but if this behavior is changed, we should probably make sure there's no regression there. (What probably should be done, IMO, would be to fix Rack to properly handle non-seekable resources.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented on my PR as well, but to keep this up-to-date...
I agree that classloader resources, accessible only as InputStream objects that don't support seeking, should raise an error when you attempt to seek on them. If we can be sure this won't affect rack, it is an easy change (and we could narrow my PR to just the resource types in question). Until we can be sure of that, however, we are potentially trading one "broken" behavior for another "broken" behavior.
The simplest fix would be to make all unseekable resources in JRuby raise an error, but there are many resource types in JRuby that would never be encountered in CRuby such as IO resources from in-memory resources that can only be read forward. It is an arguable grey area what is appropriate to do for those.
This relates to the discussion in discourse/mini_mime#38 where the proposed patch must work around JRuby's silent seek failure for some types of resources (at least classloader resources). The patch here will now raise EPIPE for all NIO channels that are not seekable through some means. It is rather aggressive given the old logic that only raised if a channel was not seekable AND not selectable (for reasons long lost).
When forking, file descriptors are inherited and their state shared. In the context of MiniMime this means that the offset of the file opened by RandomAccessDb is shared across processes, so the `seek + read` combo is subject to inter-process race conditions. Of course that file is lazily opened, so assuming most applications don't query MiniMime before fork, it's not a big problem. However when reforking post boot (e.g. https://github.com/Shopify/pitchfork) this becomes an issue. Additionally, even if the file descriptor isn't shared across processes, the file position is still process global requiring a Mutex. By using `pread` instead of `seek + read` we can both make the library fork safe and get rid of the need to synchronize accesses. This also happens to fix an outstanding JRuby issue. Fix: discourse#37 Fix: discourse#38
When forking, file descriptors are inherited and their state shared. In the context of MiniMime this means that the offset of the file opened by RandomAccessDb is shared across processes, so the `seek + read` combo is subject to inter-process race conditions. Of course that file is lazily opened, so assuming most applications don't query MiniMime before fork, it's not a big problem. However when reforking post boot (e.g. https://github.com/Shopify/pitchfork) this becomes an issue. Additionally, even if the file descriptor isn't shared across processes, the file position is still process global requiring a Mutex. By using `pread` instead of `seek + read` we can both make the library fork safe and get rid of the need to synchronize accesses. This also happens to fix an outstanding JRuby issue. Fix: discourse#37 Fix: discourse#38
When forking, file descriptors are inherited and their state shared. In the context of MiniMime this means that the offset of the file opened by RandomAccessDb is shared across processes, so the `seek + read` combo is subject to inter-process race conditions. Of course that file is lazily opened, so assuming most applications don't query MiniMime before fork, it's not a big problem. However when reforking post boot (e.g. https://github.com/Shopify/pitchfork) this becomes an issue. Additionally, even if the file descriptor isn't shared across processes, the file position is still process global requiring a Mutex. By using `pread` instead of `seek + read` we can both make the library fork safe and get rid of the need to synchronize accesses. This also happens to fix an outstanding JRuby issue. Fix: #37 Fix: #38
This (hopefully) fixes #37
What has been tested:
is_seekable
check always fails.What has not (yet) been tested:
uri:classloader
resource in a bundled .jar, withseek
failing silently?seek
might raise an exception?While I'm 99% confident that the answer to both questions is yes, I'd still like to do a bit more testing. But I wanted to get this PR out for review before the weekend.
(Also, I'm not sure if there are any potential newline and/or UTF-8 conversion issues on some platforms that might require opening the DB files explicitly in binary mode. If there were, I suspect the existing code would already be buggy on such platforms, since it's already seeking around in a file opened in text mode, but I'm still slightly worried and would like to see this tested better.)