Memory usage / documentation #616
You can reduce system load quite a bit by reducing Kafka's memory footprint. I haven't checked exact resource usage before and after, but at least it's not crashing constantly anymore 🙂
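For reference, here is a minimal sketch of how one might shrink Kafka's JVM heap in a `docker-compose` override. The Confluent Kafka images honor the standard `KAFKA_HEAP_OPTS` variable, but the service name and heap values below are assumptions, not settings confirmed in this thread:

```yaml
# docker-compose.override.yml — illustrative values, tune for your load
services:
  kafka:
    environment:
      # Cap the JVM heap; the defaults are sized for far larger deployments
      KAFKA_HEAP_OPTS: "-Xms256m -Xmx512m"
```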
I am having a look at the memory usage, as you need over 4 GB of memory to run this without crashes: after a few hours or days you hit out-of-memory errors, and the OOM killer kills either ClickHouse or Kafka.

For Kafka, limiting the number of shards using …

For ClickHouse, I looked at the settings and found this option: https://clickhouse.tech/docs/en/operations/server-configuration-parameters/settings/#max_server_memory_usage_to_ram_ratio

It allows you to specify a percentage of total memory as the maximum available to ClickHouse. Without it, ClickHouse does not limit the memory used by queries and defaults to a per-query maximum of 10 GB, which is far more than most on-premise Sentry installations need. I would suggest configuring max_server_memory_usage_to_ram_ratio. The easiest way to do this seems to be building a custom ClickHouse container, with one additional file in …:

    <yandex>
        <max_server_memory_usage_to_ram_ratio>0.3</max_server_memory_usage_to_ram_ratio>
    </yandex>

I made this change in my local config and will monitor ClickHouse's memory usage over the next day. If this works, I will submit a PR with the change.
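For anyone who would rather not rebuild the image: ClickHouse merges any `*.xml` file placed in `/etc/clickhouse-server/config.d/` into its server config, so a bind mount from `docker-compose` should achieve the same thing. A minimal sketch, assuming a service named `clickhouse` and a local file `clickhouse/memory.xml` (both names are illustrative):

```yaml
# docker-compose.override.yml — sketch; service and file names are assumptions
services:
  clickhouse:
    volumes:
      # ClickHouse merges every *.xml under config.d into the server config
      - ./clickhouse/memory.xml:/etc/clickhouse-server/config.d/memory.xml:ro
```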
Update after one day: memory usage is stable; ClickHouse no longer goes over 30% of total memory on the server. @BYK can you check with your ClickHouse admins to see if they recommend a specific setting here, like a minimum? I am not sure if we should set this to a fixed value or to a percentage of total memory. I will open a PR once you tell me what you/they prefer :)
Calling for a @JTCunning. Please report to the nearest text box.
Hello.

I'd probably set it by default, since it tracks more than just running queries, but we've never worked with this value being limited and can't comment on what happens to a production system when the limit is reached. In general, ClickHouse is not happy being bound to such a small amount (less than 10 GB) of RAM. I wouldn't be surprised if we're back here with others asking "how do I get it to return my query results while still keeping memory low?"
Ouch. I guess most on-premise installs are quite small (a few events per second at most), and the 2400 MB of memory mentioned in the docs fits within this. If an on-premise Sentry install requires 12 GB of memory (10 GB for ClickHouse, 2 GB for the other processes), then this should be written in bold in the README, imo.
I wouldn't go so far as to say it's required. I chose "not happy" since the overwhelming majority of focus within ClickHouse development is geared towards scalable production systems where ClickHouse is deployed to isolated machines with larger resource allowances. I can't personally comment on what will happen when ClickHouse is deployed with a smaller constraint and expected to perform linearly, because that's not my area of focus inside the Sentry organization, nor is it anyone's at the moment. I'm stating that if "Things Get Weird", we'd come back to this thread and apply a more scrutinized troubleshooting procedure to the example deployment, beyond "set this one setting and see what happens".
@renchap I'd say let's try this new setting out for a while on your setup, and then we can make it the default with a configuration option for larger deployments. Are you able to share your average and/or peak load for reference?
My load is very, very low: 20 events/minute maximum. I submitted a proposal in #651.
By default, this will configure ClickHouse with a memory limit of 30% of the host's memory. Related to getsentry#616
@renchap: Please do let us know if you end up hitting any high watermarks for memory that prevent Sentry from functioning properly. The newer memory-management techniques in ClickHouse are enticing for us, but they are not without their fair share of bear traps (ClickHouse/ClickHouse#12583, for example, being the reason we pin this repo to <20.4).

There are classes of queries Sentry can issue that have the potential to use an annoyingly large amount of memory for aggregation and sorting. Asking your team members to "tone it down on the amount of distinct tag keys and values" might not be the easiest ask, but if I could peer into my crystal ball and guess what would end up yielding …

There are some settings that will instruct ClickHouse to return inaccurate/incomplete results (whatever it collected up until the limit) instead of throwing an exception. I'd personally say it isn't worth proactively applying those, since it would be difficult to tell the difference between a successfully executed query and one that returned early.
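For the curious, the overflow settings alluded to above look roughly like this. `max_rows_to_read` and `read_overflow_mode` are real ClickHouse query settings (with `break`, reading stops at the limit and partial results are returned instead of an exception), but the file name, values, and profile layout here are illustrative, not a recommendation from this thread:

```xml
<!-- e.g. a file under users.d/ — illustrative values, not a recommendation -->
<yandex>
    <profiles>
        <default>
            <!-- Stop reading after this many rows instead of throwing -->
            <max_rows_to_read>100000000</max_rows_to_read>
            <read_overflow_mode>break</read_overflow_mode>
        </default>
    </profiles>
</yandex>
```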
Closes #616, supersedes #651. Adds an option to reduce the maximum memory usage of the ClickHouse server; sets it to 30% of all available RAM by default. Co-authored-by: Renaud Chaput <[email protected]>
We have a rather small setup with ~1 event/minute and usually no more than one or two users using the web interface at the same time. However, we recorded a combined memory usage well above 4 GB, with both Kafka and ClickHouse using more than 1 GB each.

Since the docs mention 2400 MB of memory as a minimum, I was wondering whether that is actually realistic, or if something is going wrong on our side? Are there ways to configure the setup to reduce memory usage? Otherwise, maybe the docs should be updated to show a more realistic minimum requirement (about 5 GB?).
Memory usage (screenshot): top graphs are Kafka and ClickHouse, followed by the worker and the web container.
Combined memory usage (screenshot).
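For anyone wanting to reproduce this kind of measurement, a one-liner like the following snapshots per-container memory usage (`docker stats` is standard Docker; the `--format` string is just one way to slice the output):

```sh
# Print a one-off table of container names and current memory usage
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}"
```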