
Address of remote node in CCS context is not available #44976

Closed
rtkjliviero opened this issue Jul 29, 2019 · 10 comments
Labels
:Distributed Coordination/Network Http and internode communication implementations

Comments

@rtkjliviero

Feature request

Expose (at least) seed node address information in remote/info.
I see that this feature was actually removed as part of this PR. It would be very useful to have it back, even with the caveat that it is best-effort.

This is particularly relevant when nodes do not use 9200 as their HTTP port - for example, in a multi-node testing situation on a single host.

@rtkjliviero rtkjliviero changed the title Address of remote node in CCS is not available Address of remote node in CCS context is not available Jul 29, 2019
@jimczi jimczi added the :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. label Jul 30, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@jimczi jimczi added :Distributed Coordination/Network Http and internode communication implementations and removed :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. labels Jul 30, 2019
@andrershov
Contributor

@rtkjliviero could you please elaborate on the usage scenario behind this:

This is particularly relevant when nodes do not use 9200 as their HTTP port - for example, in a multi-node testing situation on a single host.

@javanna
Member

javanna commented Aug 15, 2019

I think the user is asking whether we can expose the http address of the remote nodes. We removed that because it required a remote call to each of the nodes when calling remote info: the http port is not part of the info kept around about each remote node, while the transport port is.

Please correct me if I am wrong.

I would also be curious to hear what the usecase is for this.

@rtkjliviero
Author

@javanna is correct - I should have been more clear. Thanks!

My particular use case is:

  1. I have internal tooling which wraps some elasticsearch-py functionality.
  2. As part of testing that tooling, I am bringing up several Docker containers running Elasticsearch. These containers are used to set up a few clusters. Each is configured as a remote node to a single coordinator node, for <reasons>. For the sake of consistency, I would prefer that the local port exposed by a given Docker container and the http.port used by its corresponding Elasticsearch instance be the same.
  3. I then run a series of tests against my internal tooling, using the coordinator node and/or the remote clusters as targets.
  4. To do so, I have to either assume every cluster uses http.port 9200 (which is fine until one doesn't "in the wild" or in production), or maintain a mapping of which cluster uses which port (which is a bit fragile).

...but in general, there is no guarantee that a remote node is using port 9200 for http - it could be configured to use any arbitrary port. I'm using a single "coordinator" node connected to several remote clusters to facilitate CCS, but I'd like to be able to perform actions on them which are not covered by CCS functionality. To do so I'd like to have a way of discovering the correct port so that I can connect to the cluster directly when necessary.

@javanna
Member

javanna commented Aug 16, 2019

Thanks for the explanation @rtkjliviero ! Why would it be fragile to keep track of the different nodes and their http ports? I rather think that it would be a burden to have to maintain this redundant info (the http port of remote nodes), or make remote calls to be able to return it when it's not needed. It would be much cleaner to track which nodes run on which http port externally, as nodes are started.

@rtkjliviero
Author

I rather think that it would be a burden to have to maintain this redundant info (the http port of remote nodes)

That's kind of what I meant - I'll have to maintain a separate mapping which could change at any time (say I start using another hypothetical tool that reserves one of the non-9200 ports I have already chosen, and I have to change my config). Ideally I would just maintain a single docker-compose file and detect the correct http.port values at runtime by using the _remote/info endpoint (or its API equivalent, of course).

In my case I can certainly track the nodes and their ports externally, since I'm defining the test environment. But in a "live" environment, I might not have that information. In that case I'd never be able to reliably connect to a node that wasn't using http.port==9200.

@javanna
Member

javanna commented Aug 22, 2019

I see what you mean - it is somewhat of a burden - but it makes little sense for Elasticsearch to do this with the sole goal of simplifying application code, given that it can be done externally.

On the mapping possibly changing at anytime: the http port can only change by modifying the config file and restarting the node.

I also don't follow the point about a "live" environment: if you want to connect to your nodes and send requests to them, you need to know their http ports, no matter which cluster they belong to, and that is decided when the node is started. That is why, in my opinion, the record of all the nodes and their http addresses should be kept outside of Elasticsearch.
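The externally-kept record suggested above can be sketched as a plain lookup table, populated as each node is started (the cluster aliases and port numbers below are hypothetical, chosen only for illustration):

```python
# Hypothetical external record of HTTP ports, written down when the nodes
# are started (e.g. derived from the same docker-compose file that sets
# each node's http.port). Aliases and port numbers are made up.
HTTP_PORTS = {
    "cluster_one": 9201,
    "cluster_two": 9202,
}

def http_url(alias: str, host: str = "localhost") -> str:
    """Build the base URL for direct (non-CCS) requests to a remote cluster."""
    return f"http://{host}:{HTTP_PORTS[alias]}"

print(http_url("cluster_two"))  # -> http://localhost:9202
```

This keeps the port knowledge next to the code that starts the containers, rather than asking Elasticsearch to carry redundant info.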

@andrershov
Contributor

@rtkjliviero as far as I understand, you want to get remote node ports by invoking GET /_remote/info. That is possible because the seeds field in the response contains hostname:port entries. The PR you are referring to removed the DNS lookup (A record) of the seed nodes, so previously this API call returned ip:port entries in the seeds field.
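A minimal sketch of parsing the seeds field described above - note that these are transport ports, not HTTP ports, which is why this does not satisfy the original request. The response body here is a made-up example; only the shape of the seeds field (a list of "host:port" strings) follows what is described in this thread:

```python
# Hypothetical GET /_remote/info response body; only the "seeds" structure
# ("host:port" strings per remote cluster alias) follows the thread above.
remote_info = {
    "cluster_one": {"seeds": ["127.0.0.1:9301"], "connected": True},
    "cluster_two": {"seeds": ["127.0.0.1:9302"], "connected": True},
}

def seed_ports(info: dict) -> dict:
    """Extract the transport port of every seed, keyed by cluster alias."""
    return {
        alias: [int(seed.rsplit(":", 1)[1]) for seed in data["seeds"]]
        for alias, data in info.items()
    }

print(seed_ports(remote_info))  # -> {'cluster_one': [9301], 'cluster_two': [9302]}
```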

@andrershov
Contributor

@rtkjliviero I've just realized that you're asking about HTTP ports, not transport ports. The PR you've linked is about a different change.
The correct PR is probably #29568, where we dropped support for returning HTTP info of the remote clusters.
I agree with @javanna that this is not a change we want to make, given that it can be done externally.

@rtkjliviero
Author

Understood, that's fair. Thanks for taking the time to discuss this, @javanna and @andrershov!
(And thanks also for linking the correct PR)

I'll go ahead and close this issue.
