Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seperate Socket and FileDescriptor IO #4707

Merged
merged 3 commits into from
Aug 30, 2017
Merged

Conversation

RX14
Copy link
Contributor

@RX14 RX14 commented Jul 12, 2017

On some platforms - notably windows - socket descriptors are different from file
descriptors so it makes little sense for them to be shared under a common hierarchy.

This is done in 3 stages:

  • Firstly I refactored out the common logic for dealing with nonblocking IO using read/write like functions from IO::FileDescriptor into IO::Syscall. It takes care of checking for EAGAIN and managing the lists of waiting fibers.
  • Then, I removed the edge_triggerable option from IO::FileDescriptor, it doesn't seem to be used anywhere and it seems to me like it would cause hangs and other weird behaviour if used.
  • Finally, I changed Socket not to inherit IO::FileDescriptor. This involved quite a lot of copy-pasting code, which seems like a step backwards, but it means we can refactor File and Socket to use Crystal::System separately. Once these refactors are done both File and Socket class hierarchies will likely end up much cleaner.

src/socket.cr Outdated
# see IO::FileDescriptor#unbuffered_write
if (writers = @writers) && !writers.empty?
add_write_event
write_syscall_helper(slice, "Error sending message") do |slice|
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very sure this behaviour is correct, as write_syscall_helper retries send. This looks correct for stream-type sockets but not for datagram-type sockets, but send/recv are used for both. However, datagram-type sockets still implement IO so i'm not sure what to do. Socket in general feels like a messy part of the stdlib to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It musn't retry. Send with either succeed to send the whole messages, or fail and raise an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise that it's way beyond the scope of this PR but I really think we should have seperate types for datagram and stream-type sockets. The send syscall is basically overloaded to do 2 different things: copy data to the kernel's output buffer for stream sockets and send a message for datagram sockets. I think it's a mistake to confuse these by putting it all under one type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this by simply not touching the "message" parts of Socket, and the diff is a lot cleaner as a result.

@ysbaddaden
Copy link
Contributor

I hardly see a benefit in this PR for the time being (e.g. code duplication), and how it will simplify integrating WinSock (which are BSD sockets, thus use file descriptors, unlike windows files).

Calls to read and write (generic fd calls) are also kept in Socket, not the abstraction... but they're impossible on windows: there are no generic fds, only sockets so it has special calls (AFAIK).

@RX14
Copy link
Contributor Author

RX14 commented Jul 12, 2017

@ysbaddaden The point is so that Socket no longer has any dependency on read and write, because winsock uses only send and recv. It also allows you to refactor File and Socket IO seperately. This PR is a result of me refactoring IO::FileDescriptor to use Crystal::System which maintaining Socket < IO::FileDescriptor. The resulting PR is really really messy on the Socket side so I thought i'd PR this first. You can see that branch here if you want to see my rationale.

@sempervictus
Copy link

There's a Ruby library called rex-socket which deals with some of the concerns here. It's effectively a set of abstractions we use for dealing with remote networking requirements and unconventional activities (a core part of Metasploit). The approach uses Ruby modules to define common functionality and apply in class composition via mixins (sometimes via extend because we pull from Ruby's stdlib c primitives). This really helps reduce redundancy as only the differentiated parts are written in the class itself. Over time creep will occur, but beats copy pasta everywhere and provides consistent interfaces and coding convention for a critical and touchy part of any language.
Gram, nix, netlink, etc sockets are very different from stream socks like TCP. While users may want convenience around them to make them feel familiar, that sort of thing belongs outside a standard library (IMHO).
Windows handles are definitely an oddity, so if multi platform support is the intent, its probably best to disambiguate the difference for developers. Nobody wants a repeat of Ruby on Windows :).

@RX14 RX14 force-pushed the feature/separate-socket-io branch 2 times, most recently from 3bc1658 to 6429494 Compare July 13, 2017 19:51
@RX14
Copy link
Contributor Author

RX14 commented Jul 13, 2017

Looks to me like Travis now doesn't have enough memory to reliably compile all_spec. There have been quite a few failed builds due to fork: Cannot allocate memory recently (even on master).

@akzhan
Copy link
Contributor

akzhan commented Jul 13, 2017

What prevents to run specs separately through find | xargs bin/crystal? With -P4 for CircleCI.

@RX14
Copy link
Contributor Author

RX14 commented Jul 13, 2017

@akzhan the spec output will be almost too ugly to read and it'll be a lot less efficient in terms of compilation time. Probably offset by having multiple cores available though.

time make spec reports 9 minutes and 23 seconds on my quad core i5 desktop. find spec/compiler spec/std -type f -name '*_spec.cr' | time parallel env CRYSTAL_CACHE_DIR=/tmp/crystal_cache_{%} bin/crystal spec {} reports 28 minutes 53 seconds. Clearly this isn't a good idea, the extra compilation work doesn't even nearly benefit from using all of 4 cores.

@funny-falcon
Copy link
Contributor

may be set --threads 1 option for compiler?

@funny-falcon
Copy link
Contributor

looks like --threads 1 helped. See #4675

@RX14 RX14 mentioned this pull request Jul 29, 2017
24 tasks
@bcardiff
Copy link
Member

Regarding the edge_triggerable. It was added by a0820c7 to fix something in darwin but it's usage seems to got lost at fe3ccc5 maybe the timeouts work as an alternative solution for the issue in darwin #706 .

I think is good to remove the inheritance between Socket and FileDescriptor, but I would feel more comfortable knowing what is the implementation needed for windows in order to judge if the abstraction is appropriate.

@RX14
Copy link
Contributor Author

RX14 commented Jul 30, 2017

Memory issues on travis also need to be resolved before this can be merged and I have no idea how to proceed on that.

@bcardiff maybe I should work further into the future and attempt to port windows on top of this PR + abstraction IO to Crystal::System. The danger however is that this PR gets rejected and all that work is wasted. Thoughts?

@sdogruyol
Copy link
Member

Seems like CircleCI is happy. LGTM?

@mverzilli
Copy link

@RX14 can you resync with master now that Ary split the CI runnables? Let's move this forward. I wouldn't be too shy about finding the perfect abstraction: on the way to getting Windows support in master we're likely to break things loudly many times :D.

@RX14
Copy link
Contributor Author

RX14 commented Aug 20, 2017

Yup, if this turns out to be wrong I'm happy to do the work to put them all under the same hierarchy again. I'll rebase this and recheck my work in a bit.

@RX14 RX14 force-pushed the feature/separate-socket-io branch from 6429494 to b93251a Compare August 20, 2017 15:48
@RX14
Copy link
Contributor Author

RX14 commented Aug 20, 2017

This isn't ready to merge just yet.

I also hit this while refactoring, is it a bug?

    def add(timeout : LibC::Timeval? = nil)
      if timeout
        typeof(timeout) # => LibC::Timeval
        LibEvent2.event_add(@event, pointerof(timeout))
      else
        LibEvent2.event_add(@event, nil)
      end
    end

# argument 'timeout' of 'LibEvent2#event_add' must be Pointer(LibC::Timeval), not Pointer(Nil)

# doesn't make sense

    def add(timeout : LibC::Timeval? = nil)
      if timeout_copy = timeout
        LibEvent2.event_add(@event, pointerof(timeout_copy))
      else
        LibEvent2.event_add(@event, nil)
      end
    end

# even making a copy doesn't work

@asterite
Copy link
Member

@RX14 The problem is that timeout's type is a nilable type, an pointerof of a nilable type is not what you want. You can assign timeout to a different variable inside if timeout and then take a pointer to that, maybe that'll work.

@RX14
Copy link
Contributor Author

RX14 commented Aug 20, 2017

@asterite that's exactly what my second example does with timeout_copy.

@asterite
Copy link
Member

asterite commented Aug 20, 2017

I mean:

if timeout
  timeout_copy = timeout
end

In if timeout_copy = timeout, timeout_copy is still nilable (you are assigning exactly timeout to it, before the type is filtered)

@RX14
Copy link
Contributor Author

RX14 commented Aug 20, 2017

@asterite Well if you use if foo = @foo where @foo is nillable, foo is never nillable.

@asterite
Copy link
Member

@RX14 I know, but the compiler doesn't :-P

@asterite
Copy link
Member

@RX14 Actually, no:

if foo = @foo
end

# but here foo can be nilable

The only place where foo is not nilable is inside the if.

@RX14
Copy link
Contributor Author

RX14 commented Aug 20, 2017

Aargh but it feels wrong! I never think of if foo = <expr> as having foo scoped outside the body of the if. It's better than defining def add(timeout : Nil = nil) at least.

IO::Syscall is an abstraction over non-blocking IO provided by syscalls. It
takes care of checking for EAGAIN and managing the lists of waiting fibers.
On some platforms - notably windows - socket descriptors are different from file
descriptors so it makes no sense for them to be shared under a common hierarchy.
@RX14 RX14 force-pushed the feature/separate-socket-io branch from b93251a to 98eba71 Compare August 20, 2017 17:14
@RX14
Copy link
Contributor Author

RX14 commented Aug 20, 2017

I've refactored this a bit to store a Time::Span in IO::Syscall not a LibC::Timeval. This is much cleaner but has the downside of converting each time we call libevent. I think it's worth it though.

@sdogruyol
Copy link
Member

Let's approve this 👍

@RX14
Copy link
Contributor Author

RX14 commented Aug 29, 2017

Still no second review? :(

@akzhan
Copy link
Contributor

akzhan commented Aug 29, 2017

I'm unsure that conversions on every libevent call is OK. Any benchmarks?

add_read_event
read_syscall_helper(slice, "Error reading file") do
# `to_i32` is acceptable because `Slice#size` is a Int32
LibC.read(@fd, slice, slice.size).to_i32

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this explicit conversion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because read_syscall_helper is explicitly annotated to return i32.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but read_syscall_helper didn't exist before this refactor, so why does it need to be explicitly annotated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because IO#read should always return typeof(Bytes.new.size). That wasn't the case before.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it :). Thanks for bearing with me!

Copy link
Contributor Author

@RX14 RX14 Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be fantastic if we could force return types from an abstract declaration. Returning anything other than i32 from read doesn't make sense, and a return value from write is similarly useless. But that's (literally) another issue.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we were discussing exactly that with @matiasgarciaisaia in relation to #4864).

@RX14
Copy link
Contributor Author

RX14 commented Aug 30, 2017

@akzhan Actually, every call to libevent will have been immediately preceded by a context switch (EAGAIN). Adding 8 or so cycles to that I doubt will affect performance.

@RX14 RX14 merged commit c26fbc9 into master Aug 30, 2017
@RX14 RX14 added this to the Next milestone Aug 30, 2017
@RX14
Copy link
Contributor Author

RX14 commented Aug 30, 2017

Reminder for the changelog that this is a breaking change.

@mverzilli
Copy link

Just created a breaking-change label to keep track of those in particular.

@faustinoaq
Copy link
Contributor

@mverzilli Do you think breaking-change should be kind:breaking-change ? I mean just keeping label pattern 😅

@mverzilli
Copy link

It's an ortogonal dimension in my head. Let's keep it like that for now, I don't see any harm.

@asterite asterite deleted the feature/separate-socket-io branch April 27, 2018 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants