Kong node is not registered in kong.nodes Cassandra table #1004

Closed
afajem opened this issue Feb 18, 2016 · 14 comments

Comments

@afajem

afajem commented Feb 18, 2016

I have successfully created a two-node Cassandra cluster, and data is being replicated between its nodes. I have also created a Kong cluster containing two nodes. When I start each Kong node, it does communicate with Cassandra and writes to the database, and I have verified that the data written to Cassandra is replicated between the Cassandra nodes. Unfortunately, the Kong nodes do not communicate with each other; in essence, they can't auto-join the cluster. Below is the console output of the startup:

[INFO] Kong 0.6.1
[INFO] Using configuration: /etc/kong/kong.yml
[INFO] database...........cassandra keyspace=kong ssl=verify=false enabled=false replication_factor=1 timeout=5000 replication_strategy=NetworkTopologyStrategy contact_points=0.0.0.0:9042,172.16.32.162:9042 data_centers=DC1=2
[INFO] dnsmasq............address=127.0.0.1:8053 dnsmasq=true port=8053
[INFO] nginx .............admin_api_listen=0.0.0.0:8001 proxy_listen=0.0.0.0:80 proxy_listen_ssl=0.0.0.0:443
[INFO] serf ..............-profile=wan -rpc-addr=127.0.0.1:7373 -event-handler=member-join,member-leave,member-failed,member-update,member-reap,user:kong=/usr/local/kong/serf_event.sh -bind=172.16.32.163:7946 -node=mycompany.net_172.16.32.163:7946 -log-level=err
[INFO] Trying to auto-join Kong nodes, please wait..
[WARN] Cannot auto-join the cluster because no nodes were found
[OK] Started

The ports that the nodes use to communicate with each other have been opened, and I have verified that when both nodes are running I can reach port 7946 from the neighboring node, since each Kong node is bound to this port.

What I found most peculiar is that even when a single node is running, Kong didn't write to the kong.nodes table. I would assume that if a node is not written to the kong.nodes table, Serf has no way to determine which nodes are available. Is there a reason why this is not working as expected, or is my assumption about how this should work invalid?

@subnetmarco
Member

If Kong is unable to write to the nodes table, this should be reflected as an error in error.log. Can you please check and let me know if you see anything suspicious?

@afajem
Author

afajem commented Feb 19, 2016

Other than the following type of entry generated each time I start a Kong node, there isn't anything of note in the error.log file.

2016/02/18 21:47:28 [notice] 30035#0: signal process started

This entry is repeated in the log; only the number before # and the timestamp vary.

@subnetmarco
Member

A few questions:

  • What version of Cassandra are you using?
  • Is port 7946 open for both TCP and UDP traffic? It should be.

To manually invoke the cluster join, ssh into one Kong node and run the following command:

$ serf join 1.1.1.1:7946

where 1.1.1.1 is the IP address of the other node. What output do you see?

@afajem
Author

afajem commented Feb 19, 2016

I'm running Cassandra 2.2.5. Port 7946 is open for both protocol types.

I just manually invoked the cluster join using the suggested command above and it produced the following output:

[afajem@kong1 ~]$ serf join 172.16.32.163:7946
Successfully joined cluster by contacting 1 nodes.

After running the command, I then went ahead and checked the members of the cluster, using the cluster command on the Kong CLI and got this output:

[afajem@kong1 ~]$ kong cluster members
[INFO] Using configuration: /etc/kong/kong.yml
kong1_172.16.32.162:7946   172.16.32.162:7946  alive
kong2_172.16.32.163:7946  172.16.32.163:7946  alive
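
For anyone who wants to script this check, here is a minimal sketch that counts the members reported as alive. It assumes the three-column layout shown above (name, address, status); the function name count_alive_members is made up for illustration and is not part of Kong or Serf.

```shell
# Count members reported as "alive" in `kong cluster members` output (read from stdin).
# Assumption: each member line ends with its status, as in the listing above.
# Lines such as "[INFO] ..." are skipped because their last field is not "alive".
count_alive_members() {
  awk '$NF == "alive" { n++ } END { print n + 0 }'
}

# Hypothetical usage on a Kong node:
#   kong cluster members | count_alive_members
```

On a healthy two-node cluster this should print 2; a lower number suggests a node never joined or has since dropped out.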

One thing to note is that I'm starting Kong using sudo. Running the same cluster command above without sudo yields an error stating:

[INFO] Using configuration: /etc/kong/kong.yml
[ERR] serf is not running

No idea if this might have something to do with it.

@subnetmarco
Member

@afajem, could you try to replicate this bug with 0.7.0 RC1? There was a bug in the previous 0.6.x versions where Kong instructed Serf to use the /tmp folder for a log file, which it did not always have permission to write to. This may be the cause of the failure.

Version 0.7.x fixes this bug, and I am curious to see if it also fixes your problem.

@afajem
Author

afajem commented Feb 19, 2016

OK I'll give 0.7.0 RC1 a shot and check back with you shortly. Asking the obvious here, but I'm assuming Cassandra 2.2.5 should work as the install instructions state 2.2.x.

Also any idea when 0.7.0 will be released? Just a very rough estimate would do, as we plan to go live at the end of next month.

Thanks so much for your assistance thus far.

@subnetmarco
Member

In one or two weeks. Looking forward to seeing if 0.7.0 fixes it.

@afajem
Author

afajem commented Feb 19, 2016

Thanks for the release estimates.

I was able to upgrade to 0.7.0 RC1 without any issues. Unfortunately, the nodes are still not talking to one another, and the Cassandra kong.nodes table is still not getting populated. The logs are now being written to the nginx home (/usr/local/kong).

I double checked my port rules through iptables and port 7946 appears to be open:

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  any    any     anywhere             anywhere             udp dpt:7946
   57  3420 ACCEPT     tcp  --  any    any     anywhere             anywhere             tcp dpt:7946 flags:FIN,SYN,RST,ACK/SYN

I also updated cluster_listen to point explicitly at the node's IP address and port, but that didn't make a difference.
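
The iptables listing above can also be checked mechanically. The sketch below looks for an ACCEPT rule for a given protocol and destination port. It assumes the column layout shown above (target in the third column, protocol in the fourth); has_accept_rule is a hypothetical helper name. A matching rule only shows that the firewall permits the traffic, not that packets actually arrive.

```shell
# Succeed (exit 0) if stdin, in `iptables -L INPUT -v` style, contains an
# ACCEPT rule for the given protocol and destination port; fail (exit 1) otherwise.
# Assumption: target is column 3 and protocol is column 4, as in the listing above.
has_accept_rule() {
  proto=$1
  port=$2
  awk -v proto="$proto" -v port="$port" '
    $3 == "ACCEPT" && $4 == proto {
      # The destination port appears as a "dpt:<port>" field later on the line.
      for (i = 5; i <= NF; i++) if ($i == "dpt:" port) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Hypothetical usage (Serf needs both protocols on its bind port):
#   iptables -L INPUT -v | has_accept_rule tcp 7946 && echo "tcp ok"
#   iptables -L INPUT -v | has_accept_rule udp 7946 && echo "udp ok"
```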

@subnetmarco
Member

I think I may have an idea of what's happening here - what OS are you running? I want to give you a patched version and see if that fixes it.

@afajem
Author

afajem commented Feb 19, 2016

RHEL 7.2

Thanks!

@subnetmarco
Member

I will push 0.7.0 RC2 in a couple of hours.

@afajem
Author

afajem commented Feb 19, 2016

Perfect. I'll check it out later tonight.

Thanks so much again for the prompt response. Cheers.

@subnetmarco
Member

@afajem
Author

afajem commented Feb 21, 2016

🎉 🎊 Success! I upgraded to 0.7.0 RC2 and it worked the first time.
Kong startup logs:

[INFO] Kong 0.7.0rc2
[INFO] Using configuration: /etc/kong/kong.yml
[INFO] Setting working directory to /usr/local/kong
[INFO] database...........cassandra keyspace=kong ssl=verify=false enabled=false replication_factor=1 timeout=5000 replication_strategy=NetworkTopologyStrategy contact_points=172.16.32.162:9042,172.16.32.163:9042 data_centers=DC1=2
[INFO] dnsmasq............address=127.0.0.1:8053 dnsmasq=true port=8053
[INFO] serf ..............-profile=wan -rpc-addr=127.0.0.1:7373 -event-handler=member-join,member-leave,member-failed,member-update,member-reap,user:kong=/usr/local/kong/serf_event.sh -bind=172.16.32.163:7946 -node=kong2_172.16.32.163:7946 -log-level=err
[INFO] Trying to auto-join Kong nodes, please wait..
[INFO] Successfully auto-joined 172.16.32.162:7946
[INFO] nginx .............admin_api_listen=0.0.0.0:8001 proxy_listen=0.0.0.0:80 proxy_listen_ssl=0.0.0.0:443
[OK] Started

Cassandra kong.nodes table:

cqlsh> select * from  kong.nodes ;

 name                     | cluster_listening_address | created_at
--------------------------+---------------------------+--------------------------
 kong1_172.16.32.162:7946 |        172.16.32.162:7946 | 2016-02-21 01:41:02+0000
 kong2_172.16.32.163:7946 |        172.16.32.163:7946 | 2016-02-21 01:42:52+0000
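
To confirm registration from a script rather than by eye, the node names can be pulled out of the cqlsh output. This is only a sketch under the assumption that the output keeps the pipe-separated layout shown above; list_registered_nodes is an illustrative name, not an existing tool.

```shell
# Print the node names from `select * from kong.nodes` cqlsh output, one per line.
# Assumption: data rows have three pipe-separated columns and the second column
# is an address ending in ":<port>", as in the table above. The header row and
# the dashed separator row do not match and are skipped.
list_registered_nodes() {
  awk -F'|' 'NF == 3 && $2 ~ /:[0-9]+ *$/ { gsub(/^ +| +$/, "", $1); print $1 }'
}

# Hypothetical usage:
#   cqlsh -e 'select * from kong.nodes;' | list_registered_nodes
```

Comparing this list with the output of kong cluster members is one way to spot a node that is running but never registered itself.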

Thanks a million @thefosk for getting this issue resolved so quickly. It's really appreciated.

Cheers.

@afajem afajem closed this as completed Feb 21, 2016