Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose acquire function to user for custom connection routing #772

Closed
yusufozturk opened this issue Sep 30, 2022 · 16 comments · Fixed by #855
Closed

Expose acquire function to user for custom connection routing #772

yusufozturk opened this issue Sep 30, 2022 · 16 comments · Fixed by #855
Assignees
Milestone

Comments

@yusufozturk
Copy link

As I see in the current V2 code, every time before there is an operation, this library runs "ch.acquire" to get an active clickhouse-server. I believe we can gain some performance in latency if we move this process to outside of the queries.

So basically a dedicated goroutine, first adds all servers into a pool, and then monitors the every endpoints and if there is an issue with a server, so it can remove from the pool.

In this case, ch.acquire only gets one of the active servers from the pool and executes the operation to that machine. We are doing a similar operation in our load balanced API gateway and it handles thousands of requests per seconds.

@gingerwizard
Copy link
Collaborator

Are you proposing changing how the pool works or allowing a specific connection to be used? Selecting a connection from the idle pool should be cheap - you can always increase the size of this in heavy workloads. Connections are only created if the idle pool is empty.

How would you prefer? fixed pool size?

@yusufozturk
Copy link
Author

I'm proposing changing how the pool works actually. But it's good idea to give people to use a specific group of Clickhouse machines by implementing some kind of workgroup or priority based model maybe?

Since Clickhouse Cloud is serverless solution at the moment, maybe it's not a big thing for CH customers. But for on-prem users, they might want to dedicate some of the Clickhouse servers just for read purposes. Our another scenario might be having 2 different regions of group of Clickhouse servers. And trying first region of servers and if there is no available server in that region, redirecting the request to other region.

I'm just brainstorming. Maybe these are too much work for this driver. But load balancing might work as an external plugin (like ch-go) and this plugin would only supply server for the request. Users can implement their own load balancing or contribute to other one directly. Still just brain storming.. :)

Thanks.

@gingerwizard
Copy link
Collaborator

There was a request to expose how connections are selected from the pool. Maybe this would solve your requirement? i.e. allow the acquire function to be custom?

@yuzhichang
Copy link

yuzhichang commented Nov 1, 2022

@gingerwizard I want allow the acquire function to be custom.
My project clickhouse_sinker make a connection pool which is shard-aware. If writing fail with retryable error(such as ZooKeeper session expire), the user retry another replica inside the same shard. And there's per-replica connections limit.
Retrying inside the same shard helps to leverage ReplicatedMergeTree deduplication feature.

@gingerwizard gingerwizard changed the title Better load balancing for server connections Expose acquire function to user for custom connection routing Nov 1, 2022
@gingerwizard
Copy link
Collaborator

Agree we need to do this @yuzhichang

@gingerwizard
Copy link
Collaborator

We want to do this. However, currently, acquire returns

func (ch *clickhouse) acquire(ctx context.Context) (conn *connect, err error) {

a connect struct has access to the internals of the client and manages the pools. I'm not keen on exposing these.

We could possibly expose the dial function. This currently does the actual connection i.e.

func (ch *clickhouse) dial(ctx context.Context) (conn *connect, err error) {
	connID := int(atomic.AddInt64(&ch.connID, 1))
	for i := range ch.opt.Addr {
		var num int
		switch ch.opt.ConnOpenStrategy {
		case ConnOpenInOrder:
			num = i
		case ConnOpenRoundRobin:
			num = (int(connID) + i) % len(ch.opt.Addr)
		}
		if conn, err = dial(ctx, ch.opt.Addr[num], connID, ch.opt); err == nil {
			return conn, nil
		}
	}
	if err == nil {
		err = ErrAcquireConnNoAddress
	}
	return nil, err
}

We could allow the user to override this. I would change the signature so it received a function connect(string, id, options) which the user would have to ensure they call - as this would return a connect and handle the low level logic of connecting. The user would be able to control the address used, however.

The client would continue to use the connection pool unless dial was specified - in which case the custom function would always be called.

@yusufozturk @yuzhichang does this solve your requirements? Feedback please.

@yuzhichang
Copy link

yuzhichang commented Nov 20, 2022

@gingerwizard It looks complicated for me to switch connection shard-aware based on your callback-flavor proposal. I'd prefer code both shard info and shard-aware retry strategy into dsn or options.

@gingerwizard
Copy link
Collaborator

I’m trying to avoid coding in strategies which are app specific - and avoid the need to constantly add new approaches.
We could possibly just allow a function where the user returns an addresss And options. You’d just need to implement. No call back just return two parameters.

@gingerwizard
Copy link
Collaborator

The challenge I see with this is whilst you’re controlling the server you’re connecting to, there would be no pool. Connections wouldn’t be reused. Maybe we could require the user return a lower level type representing an actual connection

@gingerwizard
Copy link
Collaborator

@yuzhichang we actually have this already https://github.com/ClickHouse/clickhouse-go/blob/main/examples/clickhouse_api/connect_settings.go#L40-L46 via DialContext - this allows you to control how the low-level connection is created.

The current proposal is to add a function that allowed the user to return the next address to use (specified in options e.g. dial - this would effectively control the next address used when a connection was needed - overriding the current dial function

However, this is incomplete. The missing piece is the above only lets you control the dial side when a new connection is needed. i.e. client would still manage the idle pool and determine when a new connection was opened. This is tricky to change since the connect object has buffers and readers associated with it that are created https://github.com/ClickHouse/clickhouse-go/blob/main/conn.go#L103-L120 - we can't really expose this as its fundamental to how the client works and is tightly coupled to other behaviour. Although the net.Conn can be controlled through DialContext this is again only called when a new connection is established - reuse of internal connections of the client uses the idle pool etc.

@yusufozturk
Copy link
Author

@gingerwizard I think being able to use a custom dial would be perfectly okay in my situation. Thanks!

@gingerwizard gingerwizard added this to the 2.5.0 milestone Dec 2, 2022
@jkaflik
Copy link
Contributor

jkaflik commented Dec 16, 2022

As mentioned by @gingerwizard we will not add any use case-specific requirements for now.
We can consider having a more sophisticated way to expose connection management in v3.

@yusufozturk
Copy link
Author

@jkaflik what do you mean by use case specific? During every query execution, there is a load balancing mechanism which is unnecessary cost and locking. This issue was opened to ask an ability to improve the process by allowing users to override load balancing functionality, not a use case request.

Is this not going to be released in v2.5? Suggested implementation was very good to allow these scenarios and not too much change in the code.

Thanks.

@jkaflik
Copy link
Contributor

jkaflik commented Dec 23, 2022

I've made a proposal in #855 for this change, but it has a major disadvantage: it requires exposing the internal Connect type, only so it can be returned within the custom function.

I'd say, it's fine as a temporary solution until we have a better architecture in v3. Then, definitely, this will become more modular. (I like the approach of ch-go and its chpool thin layer)

@jkaflik
Copy link
Contributor

jkaflik commented Dec 23, 2022

I came up with a bit better proposal. Still hacky. Details in a PR description: #855

@jkaflik
Copy link
Contributor

jkaflik commented Jan 10, 2023

@yusufozturk FYI #855 has been merged. It's a hacky solution, but hope it makes sense for your use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants