Caching/Invalidation strategy for APIs, Consumers and Plugins #15
From an implementation perspective this could be a cleaner and easier design:
```
INSERT INTO invalidations (id_to_invalidate, type_to_invalidate, created_at) VALUES (263af28e-72b7-402f-c0f1-506a70c420e6, 'plugins', now());
```
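For illustration only, here is a minimal sketch of how a node could consume such a table, assuming an OpenResty timer; `db`, `cache` and the column names are placeholders taken from the statement above, not from the actual Kong codebase:

```lua
-- Hypothetical polling loop: each node periodically reads the `invalidations`
-- table and drops the matching entries from its local cache.
local CHECK_INTERVAL = 5 -- seconds; trade-off between staleness and datastore load

local db, cache  -- placeholders for the node's DAO and local in-memory cache,
                 -- initialised elsewhere
local last_check = 0

local function check_invalidations(premature)
  if premature then return end -- the worker is shutting down

  local rows, err = db:query(
    "SELECT id_to_invalidate, type_to_invalidate FROM invalidations WHERE created_at > ?",
    { last_check })
  if not rows then
    ngx.log(ngx.ERR, "could not read invalidations: ", err)
    return
  end

  for _, row in ipairs(rows) do
    -- e.g. removes the cached entity "plugins:263af28e-72b7-402f-c0f1-506a70c420e6"
    cache:delete(row.type_to_invalidate .. ":" .. row.id_to_invalidate)
  end

  last_check = ngx.now() * 1000
end

-- run the check in the background in every nginx worker
ngx.timer.every(CHECK_INTERVAL, check_invalidations)
```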
The first step for building invalidations has been merged in #42. It implements a mechanism for time-based cache expiration. To fully implement invalidations, the time-based expiration should be removed in favor of an application-based invalidation.
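As a reference point, a minimal sketch of what time-based expiration looks like in OpenResty (this is not the #42 implementation; the dict name, key layout and TTL value are illustrative):

```lua
-- Time-based expiration sketch: cache a row in an nginx shared dict with a TTL.
-- Requires a `lua_shared_dict cache 10m;` directive in the nginx configuration.
local cache = ngx.shared.cache
local cjson = require "cjson"

local TTL = 30 -- seconds; after this the entity is re-fetched from the datastore

local function get_plugin(id, fetch_from_db)
  local key = "plugins:" .. id
  local cached = cache:get(key)
  if cached then
    return cjson.decode(cached)
  end

  local row = fetch_from_db(id)          -- hits Cassandra on a cache miss
  cache:set(key, cjson.encode(row), TTL) -- the value silently expires after TTL seconds
  return row
end
```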
Following up on this - do you see any problems with the proposed solution (the `invalidations` table above)?

Advantages:

Disadvantages:
Really against this. It looks very clumsy and uses technologies that are not built for this kind of job (Cassandra, and querying it every second... this feels like manually doing something clustered databases are supposed to deal with, and that is not a good sign). Sadly, I think our only real option is to use something actually built for that kind of job, and step away from Cassandra.
If you're leaning towards clustering Kong nodes, then you might as well ditch the database layer, rely on the local memory and local storage of each node, and implement a data syncing algorithm. Otherwise, the low-hanging fruit is to keep the database-provided clustering and do selective caching based on entity type. Another approach that might be simpler and more robust is a separate process that talks to the database (on a recurring interval) on behalf of Kong, and just updates in-memory objects that Kong can access (a shared memory space).
👍. Which is why I suggest using already existing solutions like service directories (@thefosk quoted Consul).
This would be the solution proposed above. The only problem is having Kong query Cassandra every few seconds. To effectively solve the problem we need a good implementation of a gossip protocol, so that, for example, when one machine receives the HTTP request to update an API, we can communicate this change to every other node. So we don't need to replace our database, we need a new feature on top of our database. If we decide to go through this route, then something like Serfdom that can live on the machine where Kong is running would be ideal, because from a user's perspective there wouldn't be anything else to set up, scale or maintain.
Correct, however you won't be querying with every request, you'll just be querying to update the Kong shared memory space, meaning you can make one big query to get all the data every time (in theory) and just update what's needed.
However, loading all the records from Cassandra into nginx's Lua memory zone could be an issue if too many records are in Cassandra. This approach also feels like reinventing the wheel (implementing a "data syncing algorithm", as @ahmadnassri said) for a complex problem that has already been solved (we don't have solid knowledge of distribution algorithms at Mashape, or am I overthinking this?).
Not only that, nginx's memory zone is a simple key=value store with no indexes or support for complex queries. We would need to drop any query other than simple lookups by key.
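To make that limitation concrete, a small sketch (the dict name and key layout are illustrative, not Kong's actual layout):

```lua
-- An nginx shared dict only answers "give me the value stored under this exact key".
-- It requires a `lua_shared_dict cache 10m;` directive in the nginx configuration.
local cache = ngx.shared.cache

-- Works: lookup by the exact key chosen when the entity was stored.
local api_id = "263af28e-72b7-402f-c0f1-506a70c420e6" -- illustrative id
local api = cache:get("apis:" .. api_id)

-- Does not work: there is no way to ask the dict for
-- "all plugins where api_id = X" or "the api whose public_dns = Y".
-- Such lookups have to be emulated by maintaining extra keys by hand,
-- e.g. a second entry acting as a manual secondary index:
cache:set("apis:by_public_dns:myapi.com", api_id)
```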
The idea is nice, though I'm not sure how feasible it is, as @thibaultcha said. Let's say that we have multiple Kong nodes connected through a gossip protocol, and each node has a local datastore (let's say Cassandra, since we already have the DAO for it): we could have detached Cassandra nodes on each Kong node in a clustered Kong environment, where Cassandra basically only stores the data and the gossip protocol takes care of the propagation (effectively replacing Cassandra's clustering and propagation). Kong would ship with an embedded nginx + Cassandra + Serfdom configuration, no external DBs. Each node would have its own Cassandra (or any other datastore). We would still be able to make complex queries since the underlying datastore would be Cassandra.
I don't understand the role of Cassandra in your latest comment @thefosk. |
Also, does Serfdom implement a storage solution? I think we are talking about 2 different approaches here:
@thibaultcha Cassandra would only be the storage medium, living standalone on the machine. Basically you could replace Cassandra with SQLite and the role would be the same. Serfdom and its gossip protocol implementation would tell each node the data to store locally into Cassandra, without having a centralized datastore. The reason why I said Cassandra is just for convenience: we already have the DAO for it.
I agree. Since we can't make too many changes all at once, I want to proceed step by step and see where this leads us. My idea is to:
This will not change the foundations much (where the datastore is located, for example), but it will achieve the invalidation goal on top of Serfdom's gossip protocol. Serfdom will be shipped in the distributions (like dnsmasq), so nothing will change in the installation procedure. Serfdom opens a couple of ports that the user will need to make sure are properly firewalled. Thoughts?
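A minimal sketch of how the "propagate via Serfdom" step could look on the node that receives the Admin API call; the function name is hypothetical, and `serf event <name> [payload]` is simply Serf's CLI command for broadcasting a custom user event to the cluster:

```lua
-- Hypothetical hook, run after the Admin API updates or deletes an entity:
-- broadcast an invalidation to the rest of the cluster through the local Serf agent.
local function broadcast_invalidation(entity_type, entity_id)
  local payload = entity_type .. ":" .. entity_id
  -- os.execute keeps the sketch short; a real implementation could talk to
  -- Serf's RPC interface instead of shelling out.
  local ok = os.execute("serf event kong-invalidate " .. payload)
  if ok ~= 0 and ok ~= true then -- LuaJIT returns a number, Lua 5.2+ a boolean
    ngx.log(ngx.ERR, "failed to dispatch serf event for ", payload)
  end
end

-- e.g. after updating a plugin:
-- broadcast_invalidation("plugins", "263af28e-72b7-402f-c0f1-506a70c420e6")
```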
I like the idea, a couple of thoughts:
This probably should be the priority, to make sure we have picked the right tool and not just gone with the hype.
Can you investigate this? @ahmadnassri
Not much. A couple of ports open, and that would be pretty much it.
It's okay. I am not too worried about this - we are using a pretty basic functionality among the ones that it provides, nothing crazy.
Yes - to date Serfdom is my best option. Happy to discuss other options. When I was looking for a solution to this problem, Serfdom was my pick because it is decentralized (no centralized servers to set up, like Apache Zookeeper) and very straightforward to use. It also supports every platform. (Consul itself is built on top of Serfdom for cluster awareness.)
Serfdom will trigger a script every time an event is received. The script can then do whatever we want, for example start an HTTP request (my first idea), TCP, etc. Invalidation at our scale could happen over HTTP in my opinion, because it's not going to be too intensive.
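A minimal sketch of what such a script could look like on the receiving node, assuming LuaSocket is available and that the node exposes a local HTTP endpoint for dropping a cached entity (that endpoint path and port are hypothetical):

```lua
#!/usr/bin/env lua
-- Sketch of a Serf event handler, registered with something like:
--   serf agent -event-handler="user:kong-invalidate=/usr/local/bin/kong-invalidate.lua"
-- Serf passes the event payload on stdin and metadata via environment variables.
local http = require("socket.http") -- assumes LuaSocket is installed

if os.getenv("SERF_USER_EVENT") ~= "kong-invalidate" then return end

local payload = io.read("*a") or ""                 -- e.g. "plugins:263af28e-..."
local entity_type, entity_id = payload:match("^(%w+):(%S+)")
if not entity_type then return end

-- Hypothetical local endpoint used to drop the cached entity on this node.
local url = string.format("http://127.0.0.1:8001/cache/%s/%s", entity_type, entity_id)
local ok, code = http.request{ url = url, method = "DELETE" }
if not ok or (tonumber(code) or 0) >= 300 then
  io.stderr:write("invalidation request failed: " .. tostring(code) .. "\n")
end
```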
This is best suited for a lawyer... my go-to source for understanding licenses is http://choosealicense.com/licenses/mpl-2.0/. It seems permissive, but I would still consult an expert.
I meant in terms of memory, hard disk and CPU usage :)
Negligible.
@thefosk ping Glaser for legal. Can Apache 2.0 wrap an MPL-2.0 dependency?
Just did. No red flags in the licenses. |
Since we reached an agreement on Serfdom for now and there are no licensing issues, I'm closing this in favor of #651, which covers the technical integration aspects.
@thefosk did you look at this? https://github.com/airbnb/synapse
That serves a different purpose @sinzone |
Already using Serf, why not go all the way and make Consul a prerequisite for Kong? It would allow us to...
@hutchic because Consul is a centralized dependency and we don't want to introduce any more dependencies to Kong besides the datastore. |
Sure, I mean Consul is no more centralized than Serf (https://www.consul.io/intro/vs/serf.html), but I can understand wanting to avoid adding more dependencies. Forever growing the number of queries to psql doesn't seem like it'll scale past a certain point, so we might need something distributed and in-memory some day regardless.
Kong fetches `Api`, `Application` and `Plugin` data from the datastore on every request. The performance of the system can be improved by implementing either a caching strategy or an invalidation strategy.

The caching strategy is easier to build, but inefficient, since data will be fetched again every `n` seconds, even the data that didn't change.

The invalidation strategy is more efficient, but harder to build, since every node needs to invalidate the data, and there needs to be a mechanism that keeps the invalidations in sync on every node. There are two ways of implementing the invalidation strategy:

- Every time an `Api`, `Application` or `Plugin` is updated/deleted, the system iterates over all the nodes in the cluster and issues a `PURGE` HTTP request to invalidate the data (a rough sketch of this option follows below). This means that each node needs to be aware of the other nodes in the cluster, so we need to introduce the concept of cluster and cluster nodes in the Lua code. This can have performance issues on large installations, because in a 100-node system that means 100 `PURGE` requests to be sent.
- Every time data is updated/deleted, a record is inserted into an `invalidations` table. Each node will be responsible for periodically checking the table and invalidating only the data that has been inserted into it, storing in the same table the number of nodes that have deleted the data. When the number of nodes that have deleted the data matches the number of nodes in the cluster, the data is supposed to have been invalidated on every node and can be removed from the table. This can lead to problems when the invalidation of the data happens at the same time as new nodes are being added or removed.

This issue is a work in progress and more options may be available.
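A rough sketch of what the first option could look like; the node addresses, the purge URL layout and the use of LuaSocket are all assumptions for illustration, not Kong's actual implementation:

```lua
-- Option 1 sketch: on every update/delete, send a PURGE request to every
-- node in the cluster so each one drops its local copy of the entity.
local http = require("socket.http") -- assumes LuaSocket is available

local function purge_cluster(nodes, entity_type, entity_id)
  for _, node in ipairs(nodes) do
    local url = string.format("http://%s/cache/%s/%s", node, entity_type, entity_id)
    local ok, code = http.request{ url = url, method = "PURGE" }
    if not ok or code >= 300 then
      -- a failed PURGE leaves that node serving stale data until it expires
      print("PURGE failed on " .. node .. " (" .. tostring(code) .. ")")
    end
  end
end

-- In a 100-node cluster this is 100 sequential HTTP requests per change,
-- which is exactly the scalability concern described above.
purge_cluster({ "10.0.0.1:8001", "10.0.0.2:8001" }, "plugins",
              "263af28e-72b7-402f-c0f1-506a70c420e6")
```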