TLS support + peers protocol #59
Thank you! Haven't done a full review yet, but looks good at first glance. Because of the dependency on the |
Thanks @jhecking. Dropping Ruby 2.2 seems reasonable. I tried to align the TLS parameter names with the Python client. Let me know which ones should be prioritized for support in a first release. |
The TCP/TLS socket connection implementation looks good to me in principle. That said, these changes alone won't be enough to actually connect to a TLS enabled cluster.
The problem is that the Ruby client still uses the older "services" info protocol for node discovery. For TLS support, the newer "peers" protocol is required. The two protocols are similar and both serve the same purpose of allowing the client to discover all the nodes in the cluster. But only the "peers" protocol contains the necessary information the client needs to establish a secure connection, e.g. the port number on which the server accepts TLS connections.
The "peers" protocol also includes the server's "tlsname", which the client generally uses to validate the certificate presented by the server, rather than using the server's hostname/IP address.
To get an idea of what it would take to implement the "peers" protocol, you can take a look at the changes to the Java client, that were done to support TLS as well as the "peers" protocol: aerospike/aerospike-client-java@5b5b06b. The Ruby client was largely modelled after the Java client so the overall structure is quite similar.
lib/aerospike/socket/ssl.rb
Outdated
|
||
def set_cert(ctx, options) | ||
if options[:cert_file] | ||
ctx.cert = OpenSSL::X509::Certificate.new(File.open(options[:cert_file])) |
Should we use SSLContext#add_certificate instead of the cert= and key= accessor methods?
Good catch. Will remove this deprecated setter in favor of add_certificate
lib/aerospike/socket/ssl.rb
Outdated
|
||
def set_cipher_suite(ctx, options) | ||
if options[:cipher_suite] | ||
# TODO(wallin) |
I think we can just use SSLContext#ciphers=:
ctx.ciphers = options[:cipher_suite]
lib/aerospike/socket/ssl.rb
Outdated
|
||
def set_protocols(ctx, options) | ||
if options[:protocols] | ||
# TODO(wallin) |
I don't think we can support a list of protocol versions. The best we can do with Ruby's OpenSSL library is to set a minimum and/or maximum protocol version using SSLContext#min_version and SSLContext#max_version.
Yes, makes sense
Also, we should default min_version to OpenSSL::SSL::TLS1_2_VERSION. (By default, the server only supports TLS v1.2.)
lib/aerospike/socket/ssl.rb
Outdated
attr_reader :context | ||
|
||
def initialize(host, port, timeout, ssl_options) | ||
@host, @port, @timeout = host, port, timeout |
Should this initialization happen in the Connection class itself? Since it's the same between the TCP and SSL socket implementations and the accessor methods are defined in Connection class as well?
👍 I'll refactor this
lib/aerospike/socket/ssl.rb
Outdated
def verify_certificate!(socket) | ||
return unless context.verify_mode == OpenSSL::SSL::VERIFY_PEER | ||
return if OpenSSL::SSL.verify_certificate_identity( | ||
socket.peer_cert, host |
Should not use the server's host name here to verify the certificate presented by the server, but instead use a separate "tlsname". I'll comment more on this in the review summary.
Thanks @jhecking. Will have a look at the "peers" protocol, and make the appropriate changes. |
lib/aerospike/socket/ssl.rb
Outdated
private | ||
|
||
def create_context(options) | ||
OpenSSL::SSL::SSLContext.new().tap do |ctx| |
Another team member suggested that it might be best if we just let the user pass in a pre-configured SSLContext instance, instead of creating one ourselves and exposing a subset of its config options through our own settings. I like that idea, as it gives the user full flexibility to configure the context as needed without us having to expose every single config option through our own settings. On the other hand, it does require a bit more work on the user's part for a standard setup and means we can no longer set sensible defaults, like restricting the connection to TLS v1.2+.
So, how about we allow the user to pass in a pre-configured context optionally, and create one if not? E.g. just add another :context option to the ssl_options hash. If not provided, we continue to create a new context as you have already implemented. But in that case it would be ok to not expose more advanced options such as setting the desired cipher suites, etc.
What do you think?
@jhecking that's a great idea, to have both options. By allowing people to pass a custom context we'd potentially avoid a bunch of future requests asking for less common options. And as a lazy Ruby developer I like the simple option of just quickly enabling SSL with sane defaults.
In your experience, which options are the most commonly used by AS users? Anything besides the ones in this PR?
In your experience, which options are the most commonly used by AS users? Anything besides the ones in this PR?
I think we should keep it to the minimum needed to get a working connection to a standard server deployment, with sensible defaults for everything else. I would support:
:enable
:ca_file, :ca_path
:cert_file
:key_file, :key_file_pass_phrase
You can remove the :cipher_suite and :protocols options and just default SSLContext#min_version= to OpenSSL::SSL::TLS1_2_VERSION.
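A minimal sketch of how the reduced option set above, together with the optional :context key proposed earlier, might be turned into an SSLContext. The method name build_ssl_context is hypothetical, and switching on VERIFY_PEER whenever a CA file or path is given is an assumption of this sketch, not something settled in this thread.

```ruby
require "openssl"

# Hypothetical helper: builds an SSLContext from the small set of supported keys.
def build_ssl_context(ssl_options)
  # A user-supplied, pre-configured context is used as-is.
  return ssl_options[:context] if ssl_options[:context]

  OpenSSL::SSL::SSLContext.new.tap do |ctx|
    # Sensible default discussed above: do not negotiate anything below TLS v1.2.
    ctx.min_version = OpenSSL::SSL::TLS1_2_VERSION

    ctx.ca_file = ssl_options[:ca_file] if ssl_options[:ca_file]
    ctx.ca_path = ssl_options[:ca_path] if ssl_options[:ca_path]
    # Assumption of this sketch: verify the server once a CA source is configured.
    if ssl_options[:ca_file] || ssl_options[:ca_path]
      ctx.verify_mode = OpenSSL::SSL::VERIFY_PEER
    end

    if ssl_options[:cert_file] && ssl_options[:key_file]
      cert = OpenSSL::X509::Certificate.new(File.read(ssl_options[:cert_file]))
      key = OpenSSL::PKey.read(
        File.read(ssl_options[:key_file]),
        ssl_options[:key_file_pass_phrase]
      )
      # add_certificate replaces the older cert= / key= setters.
      ctx.add_certificate(cert, key)
    end
  end
end
```

With this shape, a standard setup only needs file paths, while anything more advanced (cipher suites, protocol ranges) goes through a context the caller configures.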
@jhecking I've made some progress on the peers protocol. Could you provide me with a couple of examples of more or less complex responses from the peers commands? I tried looking for test cases for examples but couldn't find any. Some examples with combinations of multiple hosts, TLS and no TLS, IPv6 and IPv4 would be great so I can create a complete parser. |
That's good to hear! Here are a couple of example responses from a 3 node cluster. For reference, this is what the server side network services config looks like:
Nodes 2 and 3 use ports 3100/3144 and 3200/3244 respectively. Here is the full response for the 4 peers-* commands:
Note that the server will omit the default port 3000 in the responses. Here is the same response with TLS disabled on the cluster:
And now with TLS re-enabled but without alternate-access-address:
Unfortunately, I don't have IPv6 setup for testing on my dev environment and I'm having a bit of a problem trying to set it up. So I can't provide an example for IPv6 at the moment. But I believe it should look the same, only with the IPv4 addresses replaced with IPv6 addresses. Only I am not sure if the IPv6 addresses are enclosed in an extra set of square brackets or not. |
Thanks @jhecking. I'll try and have something ready by next weekend. As for IPv6, it seems like you need to enclose it in brackets, judging by this comment: |
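A small sketch of the address handling discussed above, assuming IPv6 literals are enclosed in square brackets and that a missing port means the default port 3000. The helper name parse_peer_address is hypothetical; it only splits a single address string and is not the full peers response parser.

```ruby
DEFAULT_PORT = 3000

# Hypothetical helper: splits "host", "host:port", "[ipv6]" or "[ipv6]:port"
# into a [host, port] pair, falling back to the default port when omitted.
def parse_peer_address(str, default_port = DEFAULT_PORT)
  if str.start_with?("[")
    # Bracketed IPv6 literal: everything up to the closing bracket is the host.
    closing = str.index("]")
    raise ArgumentError, "unterminated '[' in #{str.inspect}" if closing.nil?

    host = str[1...closing]
    rest = str[(closing + 1)..-1]
    port = rest.start_with?(":") ? Integer(rest[1..-1]) : default_port
  else
    host, port_str = str.split(":", 2)
    port = port_str ? Integer(port_str) : default_port
  end
  [host, port]
end
```

Treating the brackets as the only way to carry a port on an IPv6 address avoids guessing which colon separates the port.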
Codecov Report
@@ Coverage Diff @@
## master #59 +/- ##
==========================================
+ Coverage 93.91% 94.77% +0.86%
==========================================
Files 98 140 +42
Lines 7375 8600 +1225
==========================================
+ Hits 6926 8151 +1225
Misses 449 449
Continue to review full report at Codecov.
|
@jhecking I've modified the code to handle the peers protocol and refactored some of the code into smaller service modules. I'm going to add more unit tests, but I believe the PR is ready for another review on a conceptual level. |
Sebastian, that's great! I'm traveling and have limited internet connectivity over the next few days. I will provide feedback by Thursday evening SGT. Thanks again for the work you are putting into this and esp. for doing some (much needed) code cleanup and refactoring!
|
Thanks @jhecking. That will hopefully give me time to finish up the last pieces. Safe travels! |
@wallin, are you still planning to do a lot of changes or does it make sense for me to start reviewing the code so far now? |
@wallin, I see there are still a few test failures and a large number of tests that seem to be skipped due to some issues with detecting whether certain features are supported by the server. Anything in particular I can help you with? |
@jhecking The major changes, peers protocol + TLS, are ready for review. Some notes
Looking forward to your feedback! |
@jhecking Will look into why they are failing, but unfortunately I can't continue until tomorrow night (PST). In the meantime, if you have time, please feel free to point out any errors you may find. |
@jhecking quick update. I believe I fixed the issue related to feature detection. Now remaining is the batch and udf failures |
Cool, thanks! I will take a look at that and also test the TLS connection and cluster-change scenarios.
Cheers,
Jan
|
Phew, this took longer than expected to review and I haven't even looked at the specs (much) yet. :-)
Overall, I am very happy with the changes you implemented! Thanks again for taking the time to do some refactoring and code clean-up along the way!
One thing I am not totally convinced of yet, is what value there is in pulling out a bunch of business logic into small (tiny in some cases!) service modules. Especially if most of what the service does is to manipulate the state of some other object.
The Node::Verify::PartitionGeneration service is a good example. Other than "parsing" the partition-generation info value, the service mainly reads and updates a Node's partition_generation and partition_changed values. This couples the service very tightly to the Node implementation. E.g. the service needs to be aware that these values are Atomic, which could change at some point.
If the main goal is to move business logic out of the Node (and Cluster, etc.) classes, so that these classes become smaller, maybe a better approach would be to create separate classes for concepts like "partition generation" instead. A hypothetical PartitionGeneration class could encapsulate the actual partition generation value as well as the concept of whether the partition generation has changed (currently represented in Node#partition_changed). It could have two methods: update, to set the new value and the changed flag as needed, and confirm (better name?), to reset the changed flag.
This might make testing easier in some cases as well. Some of your service module specs rely a lot on test doubles and stubbing, which can lead to brittle tests as the code base evolves.
But since you are the one doing most of the work, I'll leave it to you to consider this alternative approach.
I haven't had much time to do actual testing of the changes today. Will continue working on that tomorrow.
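The hypothetical PartitionGeneration class described in this review summary could look roughly like the sketch below. The initial value of -1, the changed? predicate, and the use of plain instance variables instead of Atomic values are assumptions made for illustration only.

```ruby
# Hypothetical domain class: owns the generation value and its "changed" flag,
# so that Node no longer exposes either as mutable state.
class PartitionGeneration
  attr_reader :value

  def initialize(value = -1)
    @value = value
    @changed = false
  end

  # True if an update has been seen since the last confirm.
  def changed?
    @changed
  end

  # Records a new generation; flags a change only when the value differs.
  def update(new_value)
    return if new_value == @value

    @value = new_value
    @changed = true
  end

  # Resets the changed flag once the partition map has been refreshed.
  def confirm
    @changed = false
  end
end
```

A class like this can be tested without test doubles, since it has no dependency on Node or Cluster.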
lib/aerospike/cluster.rb
Outdated
end | ||
|
||
def tls_enabled? | ||
(ssl_options || {}).key?(:enable) |
This would return true if ssl_options includes a key :enable with a false value. But I would expect tls_enabled? to return false if I set ssl_options = { enable: false }.
Good catch!
lib/aerospike/cluster.rb
Outdated
def tend | ||
nodes = self.nodes | ||
cluster_config_changed = false |
I think for clarity we should still initialize cluster_config_changed to false in case we don't go into any of the branches where it is being set to true.
lib/aerospike/cluster.rb
Outdated
node.reference_count.value = 0 | ||
node.responded.value = false | ||
node.reset_reference_count! | ||
node.partition_changed.value = false |
Maybe add another Node#reset_partition_changed! method? Or combine it with Node#reset_reference_count! - though I'm struggling to come up with a good name for such a combined method...
lib/aerospike/cluster.rb
Outdated
end | ||
end | ||
# refresh all known nodes | ||
nodes.each { |node| Node::Refresh::Info.(node, peers) } |
Just a stylistic note: I would like to stick to the convention to use { ... } if the primary purpose of a block is to return a value and to use do ... end if the primary purpose of a block is its side-effects. I know it's just a preference and that the existing code does not always follow this convention, but I would like for new code to follow this "Weirich convention".
I was not aware of this "Weirich convention", but it definitely makes sense. Thanks for the link. I've mostly tried to apply the "line count" convention.
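For illustration, the block-style convention mentioned above can be shown with a tiny example. The node names and the log array are made up for this sketch.

```ruby
node_names = ["a1", "b2", "c3"]

# Braces: the block's primary purpose is the value it returns.
upcased = node_names.map { |name| name.upcase }

# do ... end: the block's primary purpose is its side effect.
log = []
upcased.each do |name|
  log << "refreshing node #{name}"
end
```

Under this convention, the nodes.each call in the diff above would use do ... end, since refreshing a node is done for its side effects.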
lib/aerospike/connection/create.rb
Outdated
module Create | ||
class << self | ||
def call(host, port, timeout: 30, tls_name: nil, ssl_options: {}) | ||
if !ssl_options.nil? && ssl_options[:enable] == true |
I think we should default ssl_options[:enable] to true and only fall back to a non-encrypted TCP connection if it is set to false explicitly. The primary use-case for ssl_options[:enable] is if someone has configured a TLS connection but wants to temporarily disable TLS, e.g. for debugging, without changing any of the TLS settings themselves.
I understand the "default on" TLS option, but could you clarify the use case with an example? I don't see the issue with just flipping the enable option while still keeping the rest of the configuration.
I think I just didn't express myself clearly. Having TLS "default on" and providing an easy switch to temp. disable TLS is exactly what I meant. I.e. it should be possible to set up TLS without explicitly setting enable to true:
ssl_options = {
  cert: ...,
  key: ...,
  ca_file: ...
}
policy = Aerospike::ClientPolicy.new(ssl_options: ssl_options)
client = Aerospike::Client.new(policy: policy)
Then, if I need to temp. disable TLS, I can just set enable to false without affecting the rest of the code:
ssl_options[:enable] = false
got it, I like that. So if I understand correctly TLS should be enabled as long as ssl_options is not nil, and then explicitly disabled with the enable: false option?
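The "default on" rule summarized above can be sketched as a single predicate. Here it is written as a standalone method taking ssl_options as an argument, which is an assumption of this sketch; in the PR it is an instance method on Cluster.

```ruby
# TLS is on whenever ssl_options is given, unless :enable is explicitly false.
def tls_enabled?(ssl_options)
  !ssl_options.nil? && ssl_options[:enable] != false
end
```

Unlike a key?(:enable) check, this returns false for { enable: false } and true for a hash that configures TLS without mentioning :enable at all.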
return unless should_refresh?(node, peers) | ||
|
||
conn = node.tend_connection | ||
node.cluster.update_partitions(conn, node) |
If we are already pulling out most of the partitions refresh logic into its own service module, then I think we can move some more logic from Cluster#update_partitions here. Maybe the partition tokenizer (old|new) should be initialized here and then passed into Cluster#update_partitions instead.
lib/aerospike/node/refresh/peers.rb
Outdated
module Peers | ||
class << self | ||
def call(node, peers) | ||
return if node.failures.value > 0 || !node.active? |
Should move this to a should_refresh? method and use node.failed?, analogous to Refresh::Partitions.
lib/aerospike/node/refresh/peers.rb
Outdated
::Aerospike.logger.warn("Peer node #{peer.node_name} is different than actual node #{nv.name} for host #{host}"); | ||
# Must look for new node name in the unlikely event that node names do not agree. | ||
# Node already exists. Do not even try to connect to hosts. | ||
break if Cluster::FindNode.(node.cluster, peers, nv.name) |
Need to set node_validated to true if we find the node?
missed this. Thanks
|
||
generation = gen_string.to_i | ||
|
||
if node.partition_generation.value != generation |
Maybe this whole block should be moved into a new Node#update_partition_generation(generation) method? Or create a separate PartitionGeneration class altogether - more on this later in the review summary...
Yes, I like this
lib/aerospike/node_validator.rb
Outdated
Resolv.getaddresses(host.name) | ||
end | ||
|
||
@aliases = [].tap do |aliases| |
If you are already rewriting this for stylistic reasons, why not go all the way? :-)
@aliases = addresses.map { |addr| Host.new(addr, host.port, host.tls_name) }
of course! :)
This looks really good. Thanks for your great work. In addition to @jhecking 's concerns, there are a few more issues to consider:
|
Hi @khaf, thanks for your feedback! I agree that the first 2 points you raise are important. But I do not think there is a particular issue here for the set of changes done by @wallin.
I am less concerned about the third point regarding code organization. If we truly consider this client community supported, then adhering to the particular code organization of another Aerospike language SDK should be of less importance than making it easy for other Ruby developers to contribute to the client. |
Thanks @jhecking! I know it's a (too) big PR with a lot of changes. I'll address your feedback in the upcoming days
The idea with the service modules is to extract non-trivial operations on models (such as eg. a Node), essentially moving towards separation of concerns and SRP, which also makes operations easier to test. Personally, I've found this a good way of creating a maintainable code base that is also easy to navigate. But I think we're on the same page on this philosophy generally speaking. That said, I will admit that I'm not fully content with how it's currently organised. Being new to AS in general, I was basically figuring out how everything works while implementing this, so I don't feel that I have a fully holistic understanding of all the concepts. The service modules are implemented very much like the corresponding Java methods, which is not always the right abstraction, like you say. I'm not a huge fan of how the Java implementation is organised either, but I didn't want to deviate too much from it in this first implementation. Eg. I think your idea with the
My focus with this PR was first and foremost to get the functionality in place while also setting a direction for organising the code, without changing too much at the same time. You can expect more PRs from me after this one; however, I promise they will be smaller and more isolated :)
I don't like this either. In upcoming PRs I want to improve the ergonomics of this by creating a mixin for declaring atomic attributes. eg.
I see your point but IMHO the purpose of unit tests is to test functionality and behavior in isolation, which is why I think as much as possible should be stubbed and mocked. The idea is that tests should break as the code base evolves. As the abstractions become more clear, I imagine the tests will also improve. |
I agree with @jhecking on 3. I appreciate the importance of providing consistent SDKs, but mostly from an API perspective. Eg. IMHO the Java internal implementation is not exemplary, so I don't think that structure is necessarily worth preserving for the sake of consistency |
Yes! We agree on the goals. I'm just not sure whether the service modules -- in the current form anyway -- are necessarily the best way to achieve those goals. In many cases, the current service modules are not much more than a single method of the Node or Cluster module moved into a separate file. They are often closely coupled to the model since they directly manipulate its internal state. You don't really achieve separation of concerns this way because the model and the service(s) share responsibility for a single concern. E.g. the Node model and the Node::Verify::PartitionGeneration service are both responsible for keeping track of the partition generation and whether it has changed: Node keeps the state but the PartitionGeneration service updates it. It would be better if both the state as well as the methods that manipulate it would be extracted to a separate domain model altogether. Node and this new domain model should communicate only via passing messages but should be unaware of each other's internal state. But for now, let's focus on completing the implementation of the peers protocol and TLS support. I'm looking forward to further PRs from you to address these design issues. :-) |
That's not what I would focus on. I actually doubt that all of these values need to be Atomic in the first place. Most of them are only accessed from the single tend thread. Likewise, I think we can probably do away with the mutex lock used by Cluster. Its main purpose is to prevent get/put/... commands from accessing the partition table while it's being updated by the tend thread. But there are better strategies to do that using copy-on-write. You'll note that the Java client hardly uses any atomics nor locks for performance reasons. |
I believe you mean the benchmark in the |
Gemfile
Outdated
gem 'msgpack-jruby', :require => 'msgpack', :platforms => :jruby | ||
gem 'msgpack', '~> 1.0', :platforms => [:mri, :rbx] | ||
gem 'bcrypt' | ||
gem 'openssl', platforms: :mri |
I'm ok to remove Ruby 2.2 from the CI support matrix, but I don't want to break compatibility with older Ruby releases unnecessarily. So can we change this to:
install_if -> { Gem::Version.new(RUBY_VERSION) >= Gem::Version.new('2.3.0') } do
gem 'openssl', platforms: :mri
end
This will allow the aerospike gem to still be installed on Ruby v2.2 and older. Even TLS encryption might still work since the standard library includes a version of the OpenSSL library anyway. That version probably does not support the latest TLS versions, new ciphers, etc. So upgrading to a later Ruby version is still highly recommended but at least the client is still usable on older Ruby versions.
On a similar note. What's your stance on supporting JRuby? I think it can be done without too much effort, but we can address that in a separate PR if you think it's a priority
lib/aerospike/socket/ssl.rb
Outdated
tcp_sock = TCP.connect(host, port, timeout) | ||
|
||
ctx = OpenSSL::SSL::SSLContext.new | ||
ctx.set_params(ssl_options) if ssl_options && !ssl_options.empty? |
This will fail if ssl_options includes any keys for which there are no corresponding assignment methods on SSLContext. E.g. if ssl_options includes a key :enable, this will raise an error because set_params will attempt to call ctx.enable=.
I think we should revert back to what you had before and support only a few keys in ssl_options that we use to call ctx.add_certificate. For anything more complicated than that we allow the user to pass in a fully configured SSLContext instance.
Another option could be to let ssl_options represent only options that can be passed to initialize the SSLContext, and skip the enable flag altogether. If you temporarily want to disable SSL, you can just pass nil instead of the options. So:
- if ssl_options is a Hash, we create a new context and call set_options
- if ssl_options already is an SSLContext, we just use that.
In a sense I think this is cleaner than maintaining a set of our own options, because we can just refer to the official documentation for SSLContext. What do you think?
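A sketch of the Hash-or-SSLContext alternative proposed above. The method name context_from is hypothetical, and the sketch uses SSLContext#set_params (as in the diff under review) for the Hash case. It also shows the failure mode raised earlier in this thread: a key such as :enable has no matching setter on SSLContext, so set_params raises NoMethodError.

```ruby
require "openssl"

# Hypothetical helper: nil disables TLS, an SSLContext is used as-is,
# and a Hash is forwarded to SSLContext#set_params.
def context_from(ssl_options)
  case ssl_options
  when nil
    nil
  when OpenSSL::SSL::SSLContext
    ssl_options
  when Hash
    OpenSSL::SSL::SSLContext.new.tap do |ctx|
      # Every key must correspond to a setter on SSLContext.
      ctx.set_params(ssl_options)
    end
  else
    raise ArgumentError, "unsupported ssl_options: #{ssl_options.class}"
  end
end

# A client-specific key like :enable is rejected by set_params.
begin
  context_from(enable: true)
  enable_key_rejected = false
rescue NoMethodError
  enable_key_rejected = true
end
```

The trade-off is that the Hash form accepts exactly what SSLContext documents, with no client-specific keys mixed in.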
lib/aerospike/socket/ssl.rb
Outdated
ssl_sock = new(tcp_sock, ctx) | ||
ssl_sock.hostname = tls_name | ||
ssl_sock.connect | ||
ssl_sock.post_connection_check(host) |
This needs to check using tls_name instead of host.
indeed. good catch
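To illustrate why the check has to use the TLS name rather than the host address, the sketch below builds a throwaway self-signed certificate and runs OpenSSL::SSL.verify_certificate_identity against both values. The helper self_signed_cert and the names example-tls-name and 10.0.2.15 are made up for this example; a real server certificate would come from the cluster.

```ruby
require "openssl"

# Hypothetical helper: creates a short-lived self-signed certificate whose
# subject common name is the given TLS name.
def self_signed_cert(common_name)
  key = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = 1
  cert.subject = OpenSSL::X509::Name.parse("/CN=#{common_name}")
  cert.issuer = cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now
  cert.not_after = Time.now + 3600
  cert.sign(key, OpenSSL::Digest.new("SHA256"))
  cert
end

cert = self_signed_cert("example-tls-name")

# The certificate identifies the node by its TLS name, not by its address.
matches_tls_name = OpenSSL::SSL.verify_certificate_identity(cert, "example-tls-name")
matches_address  = OpenSSL::SSL.verify_certificate_identity(cert, "10.0.2.15")
```

Here matches_tls_name is true and matches_address is false, which is why passing host to the post-connection check would reject an otherwise valid server certificate.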
lib/aerospike/node/refresh/peers.rb
Outdated
|
||
peer.hosts.each do |host| | ||
begin | ||
nv = NodeValidator.new(node.cluster, node.host, node.cluster.connection_timeout, node.cluster.ssl_options) |
You are missing a cluster_name param here.
My point exactly :) I just wanted to know if the current/old implementation was acceptable
Point taken. I'll leave it as is, and we can extend and improve it in future PRs
We can leave this as is for now. The reason I ask is that I see many other DB drivers offering configuration for this
Got it! Will fix this |
I'm ok either way: We can leave this for a separate PR or update to the new algorithm right away. The new implementation is definitely better!
Noted. I'll check with our other client devs what their views are on enabling TCP keep-alive. |
Re. TCP keep-alive: I forgot that the server closes idle connections after 60 seconds, so TCP keep-alive is not so important. Most other Aerospike clients also close idle connections, e.g. the Java client has a |
@jhecking sounds good. I'll leave everything as is so we can get this PR merged, and then go ahead and create issues for all of the updates/improvements we've talked about. I'll just spend some time debugging why the "friend" update isn't working properly, and hopefully come up with a fix soon. |
@jhecking update on the issue I'm seeing with RefreshFriends. In my local setup, every node reports two IPs (eg. 192.68.50.4 and 10.0.2.15). When iterating through these the NodeValidator connects to the first IP non-blocking and shortly thereafter to the second IP, causing Any ideas on how to best handle this case? |
I have pushed a couple more fixes to the
The last issue, is what you were running into, I think. Based on your IP addresses, it sounds like you are running the server in one or more Vagrant boxes, correct? I have the same setup. From my (macOS) host I am not able to connect to the 10.0.2.15 address that the server in the vagrant guest publishes. That would manifest itself in the After some struggles to get a second, valid IP address added to my vagrant guest, I'm now able to verify that the cluster lookup using the old services protocol (aka "friends") works as expected, even if the server publishes two separate IP addresses per node. In a production setup, you would probably want to avoid publishing two separate IP addresses for clients, and specify the address to publish for clients using the |
peers.peers.each do |peer|
next if ::Aerospike::Cluster::FindNode.(cluster, peers, peer.node_name)

node_validated = false
Should this be peers_validated? (And on lines 34 and 41 as well.)
Turns out I missed this part when transcribing the Java peer refresh code:
https://github.com/aerospike/aerospike-client-java/blob/7352c15c04015f4bd4f6985e138fa931179fb5d7/client/src/com/aerospike/client/cluster/Node.java#L363-L365
@@ -0,0 +1,51 @@
# frozen_string_literal: true

RSpec.describe Aerospike::Peers::Parse do
Can you please move the specs under spec/lib/aerospike to just spec/aerospike instead?
I think we are getting close to the point where this PR is ready to be merged! Some outstanding items:
Anything else I'm missing? |
Great! Thanks for adding the license texts and making the "friend fixes"
Hmm, these four files still have very low test coverage: node/refresh/friends.rb, node/refresh/peers.rb, socket/ssl.rb, and socket/base.rb. I'm a bit worried especially about Node::Refresh::Friends and Socket::SSL because these two features do not get used in the tests at all; at least Node::Refresh::Peers and Socket::Base get covered somewhat in the course of the execution of the rest of the test suite. But I'm not going to treat this as a blocker. Other than these four files, the test coverage for your changes is very good! Thanks for that! I'd just ask that you remove the specs that are currently empty except for a
Sure, I can help with that. However, were you still going to add wrappers for the Node::Refresh::* service methods to Node? That's the last pending change that I am aware of at the moment. |
I added a few specs for Socket::SSL class here: f086c45. Not great, but better than nothing. ;-) Thinking about how we can setup the CI env. so that we can actually test TLS connections end-to-end. |
Done!
Thanks for taking that off my chest! However, I took the liberty of rewriting them in a more RSpec way. Sorry, being very anal about this ;)
I added some specs for
I share your concern. At some point, I'd very much like to refactor that module into smaller methods so we can do meaningful unit tests, but for now, it's best to just keep it as similar to the Java implementation as possible. Would there be any way of testing this on an integration level? Like, disable |
end
end

describe '::create_context' do
Why are there two separate describe blocks for ::create_context?
spec/aerospike/socket/ssl_spec.rb
let(:pkey) { resource('ssl', 'test.key.pem') }
let(:ssl_options) { { cert_file: cert, pkey_file: pkey } }

before { allow_any_instance_of(OpenSSL::SSL::SSLContext).to receive(:add_certificate) }
Change before { ... } to before do ... end? 'Cause I'm anal about this... ;-)
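For readers wondering what the spec above is stubbing, the following is a rough sketch of building an `OpenSSL::SSL::SSLContext` from a `cert_file`/`pkey_file` pair. It is not the PR's `create_context` implementation: `build_ssl_context` is a hypothetical helper, and the `ca_file` option and the `VERIFY_PEER` default are assumptions. The OpenSSL calls themselves (`SSLContext#add_certificate`, `OpenSSL::PKey.read`, `OpenSSL::X509::Certificate.new`) are part of Ruby's openssl library.

```ruby
require 'openssl'

# Builds an SSLContext from file-based options. The cert_file and pkey_file
# names mirror the spec above; ca_file and the verify mode are assumed.
def build_ssl_context(cert_file: nil, pkey_file: nil, ca_file: nil)
  ctx = OpenSSL::SSL::SSLContext.new
  # Verify the server certificate against the trusted CAs.
  ctx.verify_mode = OpenSSL::SSL::VERIFY_PEER
  ctx.ca_file = ca_file if ca_file

  # A client certificate is only attached when both files are given.
  if cert_file && pkey_file
    cert = OpenSSL::X509::Certificate.new(File.read(cert_file))
    pkey = OpenSSL::PKey.read(File.read(pkey_file))
    ctx.add_certificate(cert, pkey)
  end

  ctx
end
```

Because `add_certificate` checks that the key matches the certificate, stubbing it (as the spec does) lets a test run with fixture files that do not form a valid pair.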
Thanks!
No problem. We all got our pet peeves. ;-)
Yay!
I don't think we can disable peers support on the server. But we could make it a client side setting - for testing only. Just pull out this one line in
Or maybe better yet, stub that method to always return |
Great idea. I did that and it seems to work, so at least now the code path is exercised. Let me know how you think we should proceed with merging.
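The stubbing idea can be illustrated in plain Ruby. Everything below is hypothetical: the thread does not show which method was stubbed, so `FakeNode`, `supports_peers?`, and `refresh_strategy` are stand-in names that only demonstrate the technique of overriding a feature check on a single instance to force the legacy "friends" path.

```ruby
# Hypothetical stand-in for a cluster node; the real client's class and
# method names are not shown in this thread and may differ.
class FakeNode
  # Pretend the server advertises peers support.
  def supports_peers?
    true
  end

  # Chooses the discovery strategy based on the feature check.
  def refresh_strategy
    supports_peers? ? :peers : :friends
  end
end

node = FakeNode.new
# Override the feature check on this one instance only, so the legacy
# "friends" branch runs even though the node would normally use peers.
node.define_singleton_method(:supports_peers?) { false }
puts node.refresh_strategy
```

In an RSpec suite the same effect would typically be achieved with `allow(node).to receive(:supports_peers?).and_return(false)`, again assuming that method name.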
Yay! Finally codecov is happy as well. :-)
I've taken all your changes, rebased them on master and then grouped them into just 3 commits for cleanup, peers protocol support and TLS support: https://github.com/aerospike/aerospike-client-ruby/compare/tls-support. You can run a diff against your own branch and the changes should be minimal. (I decided to remove support for Ruby 2.2.) Let me know what you think. If it looks good to you, then you can hard reset your branch to the HEAD of my own branch and force push back to GitHub. And unless you think there is anything else left to do, I would then merge your PR back to master. |
* Move cluster/node/node_validator
* Add helper methods to Node to access reference_count, responded, active state
* Fix URL in license boilerplate
* Remove space at end of line
* NodeValidator: Improve resolution of aliases
* Specs: Always require spec_helper
* Codecov: Ignore specs
* Improve debug logs
* Misc. minor stylistic changes

* Requires Aerospike Server v3.10 or later. Replaces use of 'services' info command used on older servers for cluster discovery.
* Refactor tend logic: Remove from Cluster/Node models and break up into several smaller service modules.
* Add specs covering both old and new protocol.

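As a rough idea of what parsing a peers response involves, here is a small sketch. It assumes the response has the shape `generation,default_port,[[node_name,tls_name,[endpoint,...]],...]`, where each endpoint is `host` or `host:port`; that shape is an assumption here, not something stated in this thread. It is not the `Aerospike::Peers::Parse` module added by this PR: `parse_peers`, `Peer`, and `PEER_PATTERN` are hypothetical names, and bracketed IPv6 endpoints are deliberately not handled.

```ruby
# Hypothetical value object for one discovered peer.
Peer = Struct.new(:node_name, :tls_name, :hosts)

# Matches one "[node_name,tls_name,[endpoints]]" entry. Endpoints containing
# brackets (e.g. IPv6 literals) are outside the scope of this sketch.
PEER_PATTERN = /\[([^,\[\]]+),([^,\[\]]*),\[([^\[\]]*)\]\]/

# Parses an assumed peers response into a generation number and a peer list.
def parse_peers(response)
  generation, default_port, list = response.split(',', 3)
  default_port = Integer(default_port)

  peers = list.to_s.scan(PEER_PATTERN).map do |node_name, tls_name, endpoints|
    hosts = endpoints.split(',').map do |endpoint|
      host, port = endpoint.split(':', 2)
      # Fall back to the response-wide default port when none is given.
      [host, port ? Integer(port) : default_port]
    end
    Peer.new(node_name, tls_name.empty? ? nil : tls_name, hosts)
  end

  { generation: Integer(generation), peers: peers }
end
```

The `tls_name` field is what makes this protocol useful for TLS: the client can validate the server certificate against it rather than against the host name or IP address.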
* Rename and split up Cluster::Connection into Socket::TCP and Socket::SSL
* Add new ssl-options to ClientPolicy to setup TLS connection
* Refactor create-connection logic into separate service module
* Extend peers protocol support to use peers-tls-std when using TLS
* Refactor parsing logic for hosts lists to support optional tls-name
* Add new InvalidCredentials error class
* Remove Ruby 2.2 from CI test matrix
* Add external OpenSSL gem dependency
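The hosts-list change with an optional tls-name might look roughly like the sketch below. It assumes a comma-separated list whose entries are `host`, `host:port`, or `host:tls-name:port`; that format, the `parse_hosts` helper, the `Host` struct, and the default port of 3000 are assumptions for illustration rather than the parser merged in this PR. IPv6 literals are not handled.

```ruby
# Hypothetical value object for a seed host.
Host = Struct.new(:name, :tls_name, :port)

# Parses "host[:tls-name][:port]" entries separated by commas.
# A single extra field is treated as the port; two extra fields are
# treated as tls-name followed by port.
def parse_hosts(list, default_port = 3000)
  list.split(',').map do |entry|
    name, *rest = entry.strip.split(':')
    case rest.size
    when 0 then Host.new(name, nil, default_port)
    when 1 then Host.new(name, nil, Integer(rest[0]))
    when 2 then Host.new(name, rest[0], Integer(rest[1]))
    else raise ArgumentError, "invalid host entry: #{entry.inspect}"
    end
  end
end
```

For example, `parse_hosts('a.example:3000,b.example:tls-b:4333')` yields one host without a TLS name and one whose certificate would be checked against `tls-b`.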
Great work. Just rebased and everything looks nice and tidy. I'm not missing anything afaik. Thanks for being so responsive and thorough in your comments, it's been a pleasure implementing this together with you. |
Merged!
Likewise. Looking forward to your future contributions! |
Tagged v2.6.0 and released it to RubyGems.org. |
Implement support for TLS + use new peers protocol