alt_hosts and DB API 2.0 #168
Yes, this option is supported: the DB API connect uses the Client.from_url method, which reads alt_hosts from the URL query string. For example:

connect('clickhouse://{user}:{password}@host1/{database}?alt_hosts=host2:1234,host3,host4:5678')
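Because the alternate hosts travel inside the URL, it can help to build the DSN in code instead of by hand. The helper below is a minimal sketch; build_dsn is a hypothetical name and not part of clickhouse_driver. It assumes the URL layout shown above (credentials, primary host, database, then alt_hosts as one comma-separated list of host[:port] items) and percent-encodes the credentials so characters such as '@' or ':' in a password cannot break the URL.

```python
from typing import Sequence
from urllib.parse import quote, urlencode


def build_dsn(primary: str, alt_hosts: Sequence[str], user: str = "default",
              password: str = "", database: str = "default") -> str:
    """Hypothetical helper: compose a clickhouse:// URL with an alt_hosts list."""
    # Percent-encode credentials so '@', ':' or '/' cannot break the URL structure.
    auth = quote(user, safe="")
    if password:
        auth += ":" + quote(password, safe="")
    url = f"clickhouse://{auth}@{primary}/{quote(database, safe='')}"
    if alt_hosts:
        # alt_hosts is a single comma-separated value; keep ',' and ':' unescaped.
        url += "?" + urlencode({"alt_hosts": ",".join(alt_hosts)}, safe=",:")
    return url
```

The resulting string can then be passed to the DB API entry point, e.g. `from clickhouse_driver import connect` followed by `conn = connect(build_dsn("host1", ["host2:1234", "host3"]))`.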
Thanks. 👍
Maybe it's not so obvious. I'll add some examples.
I have just tested it. I thought that by using […] This is not the case here. If I understand correctly, this type of connection always tries host1 first, and only if that fails are the alternate hosts used. If host1 is down, every subsequent query on this connection is delayed by the time needed to fall back from host1 to a working alternate host. From what I can see, the fallback takes around a second. One second doesn't sound like a lot, but if you execute many queries, one second per query is a serious degradation of the service.

My idea is a bit different. I would like to provide a list of all hosts in the cluster (all master and replica nodes) and pick one of them at random as host1 (a simple form of load balancing). Once a working host has been found, it would be used for all subsequent queries. If it ever fails, the procedure for finding a new working host would start again. What do you think about this logic?
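The proposed logic (random first pick, stick to the host that works, search again only after a failure) can be sketched independently of the driver. This is a hypothetical illustration, not clickhouse_driver's API: StickyHostSelector is an invented name, and it assumes a caller-supplied connect(host) callable that raises OSError (which covers socket.gaierror) when a host is unreachable. In the real driver, connection failures surface as the driver's own exception types, so the except clauses would need adjusting.

```python
import logging
import random
from typing import Callable, Generic, Optional, Sequence, TypeVar

C = TypeVar("C")  # connection type returned by the hypothetical connect callable
R = TypeVar("R")  # result type of the work done on a connection

log = logging.getLogger("sticky_hosts")


class StickyHostSelector(Generic[C]):
    """Pick a random working host, then reuse it until it fails."""

    def __init__(self, hosts: Sequence[str], connect: Callable[[str], C],
                 rng: Optional[random.Random] = None) -> None:
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = list(hosts)
        self._connect = connect
        self._rng = rng if rng is not None else random.Random()
        self.current_host: Optional[str] = None
        self._conn: Optional[C] = None

    def _find_working_host(self) -> C:
        # Try every host once, in random order (a crude form of load balancing).
        candidates = self._hosts[:]
        self._rng.shuffle(candidates)
        last_error: Optional[OSError] = None
        for host in candidates:
            try:
                conn = self._connect(host)
            except OSError as exc:
                log.warning("Failed to connect to %s: %s", host, exc)
                last_error = exc
                continue
            # Remember the working host so later calls skip the search entirely.
            self.current_host, self._conn = host, conn
            return conn
        raise ConnectionError(f"no working host among {self._hosts}") from last_error

    def execute(self, work: Callable[[C], R]) -> R:
        """Run work(conn) on the remembered host; on failure, search again once."""
        conn = self._conn if self._conn is not None else self._find_working_host()
        try:
            return work(conn)
        except OSError as exc:
            log.warning("Host %s failed (%s); looking for another one",
                        self.current_host, exc)
            self.current_host, self._conn = None, None
            return work(self._find_working_host())
```

The design trades even load distribution for latency: only the first query after a failure pays the cost of probing hosts, and every other query goes straight to the remembered host.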
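As far as I understand, the problem is that a new cursor repeats the failed attempt on localhost1, while further queries on the same cursor go straight to the working host:

In [1]: from clickhouse_driver import connect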
In [2]: conn = connect('clickhouse://localhost1/default?alt_hosts=localhost')
In [3]: cur = conn.cursor()
In [4]: cur.execute('select 1')
Failed to connect to localhost1:9000
Traceback (most recent call last):
  File "/tmp/versions/3.6.5/envs/marilyn/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 256, in connect
    return self._init_connection(host, port)
  File "/tmp/versions/3.6.5/envs/marilyn/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 226, in _init_connection
    self.socket = self._create_socket(host, port)
  File "/tmp/versions/3.6.5/envs/marilyn/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 202, in _create_socket
    for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
  File "/tmp/versions/3.6.5/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
In [5]: cur.fetchall()
Out[5]: [(1,)]
In [6]: cur.execute('select 1')
In [7]: cur.fetchall()
Out[7]: [(1,)]
In [8]: cur = conn.cursor()
In [9]: cur.execute('select 1')
Failed to connect to localhost1:9000
Traceback (most recent call last):
  File "/tmp/versions/3.6.5/envs/marilyn/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 256, in connect
    return self._init_connection(host, port)
  File "/tmp/versions/3.6.5/envs/marilyn/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 226, in _init_connection
    self.socket = self._create_socket(host, port)
  File "/tmp/versions/3.6.5/envs/marilyn/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 202, in _create_socket
    for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
  File "/tmp/versions/3.6.5/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

But with the same Client there is no additional time after the first query:

In [1]: from clickhouse_driver import Client
In [2]: c = Client.from_url('clickhouse://localhost1/default?alt_hosts=localhost')
In [3]: c.execute('select 1')
Failed to connect to localhost1:9000
Traceback (most recent call last):
  File "/tmp/versions/3.6.5/envs/marilyn/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 256, in connect
    return self._init_connection(host, port)
  File "/tmp/versions/3.6.5/envs/marilyn/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 226, in _init_connection
    self.socket = self._create_socket(host, port)
  File "/tmp/versions/3.6.5/envs/marilyn/lib/python3.6/site-packages/clickhouse_driver/connection.py", line 202, in _create_socket
    for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
  File "/tmp/versions/3.6.5/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
Out[3]: [(1,)]
In [4]: c.execute('select 1')
Out[4]: [(1,)]

Or is the problem that we need to check all hosts at startup time? By the way, load balancing (round robin) is in my plans.
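Round-robin balancing with failover could look roughly like the following. This is a hypothetical sketch, not how clickhouse_driver implements (or plans to implement) it: RoundRobinHosts is an invented name, and query(host) stands in for whatever actually runs a statement against one host, assumed to raise OSError when that host is unreachable.

```python
import itertools
from typing import Callable, Iterator, List, Optional, Sequence, TypeVar

R = TypeVar("R")


class RoundRobinHosts:
    """Hypothetical sketch: send each query to the next host in turn, skipping failed ones."""

    def __init__(self, hosts: Sequence[str]) -> None:
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts: List[str] = list(hosts)
        self._cycle: Iterator[str] = itertools.cycle(self._hosts)

    def run(self, query: Callable[[str], R]) -> R:
        """Call query(host) on the next host; on OSError, try each remaining host once."""
        last_error: Optional[OSError] = None
        for _ in range(len(self._hosts)):
            host = next(self._cycle)
            try:
                return query(host)
            except OSError as exc:
                last_error = exc  # this host failed; move on to the next one
        raise ConnectionError(f"all {len(self._hosts)} hosts failed") from last_error
```

Unlike a "sticky" host, round robin spreads consecutive queries across the cluster, but a dead host keeps being retried every time the cycle reaches it; a real implementation would likely also mark failed hosts as unhealthy for a while.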
Your example shows that each select is first attempted on localhost1 and only then on localhost. It would be much better if the connection remembered the successful host and kept using it as long as there are no problems with it. It would also be nice if each reconnection to an alternate host happened automatically, with just a warning message in the log. It is just an idea.
Also, I have noticed that the asynch module (https://github.com/long2ice/asynch) supports connection pools. Maybe the logic could be to get another connection from the pool if one connection fails.
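The "take another connection from the pool" idea can be illustrated with a small synchronous pool. This is a hypothetical sketch and not the asynch pool API: FailoverPool is an invented name, the pooled connections are opaque objects, and work(conn) is assumed to raise OSError when a connection is broken.

```python
from collections import deque
from typing import Callable, Deque, Generic, Iterable, Optional, TypeVar

C = TypeVar("C")  # connection type
R = TypeVar("R")  # result type


class FailoverPool(Generic[C]):
    """Hypothetical sketch: drop a broken connection and retry with the next idle one."""

    def __init__(self, connections: Iterable[C]) -> None:
        self._idle: Deque[C] = deque(connections)

    def __len__(self) -> int:
        return len(self._idle)

    def run(self, work: Callable[[C], R]) -> R:
        last_error: Optional[OSError] = None
        while self._idle:
            conn = self._idle.popleft()
            try:
                result = work(conn)
            except OSError as exc:
                last_error = exc  # broken: do not return it to the pool
                continue
            self._idle.append(conn)  # healthy: back to the end of the queue
            return result
        raise ConnectionError("no healthy connections left in the pool") from last_error
```

A production pool would also open replacement connections (ideally to other hosts) instead of letting the pool shrink, and would need locking or an async design for concurrent use.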
Fixed in master.
Is the alt_hosts option supported with DB API 2.0?
I saw an example in the documentation at https://clickhouse-driver.readthedocs.io/en/latest/features.html.
Can I have the same functionality with DB API 2.0?