multitenant: timeseries double counts some metrics #108929
Comments
Thanks for reporting, let me see what I can find. Will report back.
This is reproducible using a multitenant Procedure:
I believe the problem here exists with the query side. The data in tsdb looks correct to me. Let's focus on a metric that this occurs with (note: SQL level metrics appear to not be affected). We will choose The MetricsRecorder records node-level metrics (including our CPU metric) to TSDB segmented by tenant. For example:
We can see the metric exists in CRDB twice, one with a source field of Both metrics alone are recorded correctly. However, when you select "All" or "System" from the dropdown on the metrics page, the tsdb query appears to be summing the values together across tenants, which leads to our reported values being 2x what they actually are.

We added the behavior for multitenant timeseries queries in #98077, which was made available in the UI with #92694. There was likely a misunderstanding in the UI code as to how the query API works, or perhaps the query code itself is not intuitive.

We should find the right solution here - does it exist in the query layer? Or in how the UI uses the query endpoint? We will get this scheduled with the team.
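To make the 2x effect concrete, here is a minimal sketch of the summing behaviour described above. It is not the tsdb query code: the metric name, the source labels, and the helper functions are hypothetical placeholders chosen for illustration, and the 0.66 value is borrowed from the disk bandwidth figure reported in this issue.

```python
from collections import defaultdict

# Hypothetical stored datapoints: (metric name, source label, value).
# The metric name and source labels are placeholders, not real TSDB keys.
# A node-level metric is recorded once per tenant with the same value.
DATAPOINTS = [
    ("node.disk.write_gbps", "node1/system", 0.66),
    ("node.disk.write_gbps", "node1/tenant2", 0.66),
]


def sum_across_sources(points, metric):
    """Sum every series of `metric`, regardless of which tenant recorded it."""
    return sum(value for name, _source, value in points if name == metric)


def per_source(points, metric):
    """Return the value of `metric` for each source separately."""
    out = defaultdict(float)
    for name, source, value in points:
        if name == metric:
            out[source] += value
    return dict(out)


if __name__ == "__main__":
    # Each source alone is correct; the sum across sources is doubled.
    print(per_source(DATAPOINTS, "node.disk.write_gbps"))
    print(sum_across_sources(DATAPOINTS, "node.disk.write_gbps"))
```

Each series on its own carries the real value, so the doubling only shows up once a query adds the per-tenant copies of the same node-level metric together.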
The UI queries for metrics based on the selection of that dropdown; selecting "All" sets
So it seems the current server-side query behaviour is what I described previously: if the system tenant id is provided, or no tenant id is provided, we return the ts data aggregated across all tenants. I suppose this is because, in the initial implementation, the ts data being viewed didn't need to be as granular. Now that it does, we should update the code to reflect this. I should have a PR up soon to fix this.
Previously, ts queries would consider providing no tenant id and the system tenant id as the same and would return all the aggregated datapoints. This was likely due to the original implementation considering that the system tenant would always want to view all the aggregated data. This is not the case anymore since the system tenant has the ability to view all the data, system tenant specific data or other tenants data. Therefore this commit adjusts the server query code so that if a system tenant id is provided, it returns data for only the system tenant.

Fixes cockroachdb#108929

Release note (bug fix): adjust ts server queries to be able to return system tenant only metrics if tenant id is provided, this will fix an issue where some metrics graphs appear to double count.
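The behaviour change in that commit message can be sketched roughly as follows. This is an illustration only, not the actual server query code: the function names, the `SERIES` data, and the use of `None` for "no tenant id provided" are assumptions made for the example.

```python
from typing import Dict, Optional

# Assumed id of the system tenant, for illustration only.
SYSTEM_TENANT_ID = 1

# Hypothetical per-tenant values of one metric at a single timestamp.
SERIES: Dict[int, float] = {1: 10.0, 2: 10.0}


def query_before_fix(series: Dict[int, float], tenant_id: Optional[int]) -> float:
    """Old behaviour: no tenant id and the system tenant id are treated alike."""
    if tenant_id is None or tenant_id == SYSTEM_TENANT_ID:
        return sum(series.values())  # aggregated across all tenants
    return series.get(tenant_id, 0.0)


def query_after_fix(series: Dict[int, float], tenant_id: Optional[int]) -> float:
    """New behaviour: only a missing tenant id aggregates across tenants."""
    if tenant_id is None:
        return sum(series.values())
    # Any explicit tenant id, including the system tenant's, is scoped to it.
    return series.get(tenant_id, 0.0)


if __name__ == "__main__":
    print(query_before_fix(SERIES, SYSTEM_TENANT_ID))  # aggregated value
    print(query_after_fix(SERIES, SYSTEM_TENANT_ID))   # system tenant only
```

Under these assumptions, passing the system tenant id returns the aggregate before the change and only the system tenant's own value after it, which matches the "System" dropdown case discussed in this thread.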
109727: ts: update server queries to account for system tenant id r=Santamaura a=Santamaura

Previously, ts queries would consider providing no tenant id and the system tenant id as the same and would return all the aggregated datapoints. This was likely due to the original implementation considering that the system tenant would always want to view all the aggregated data. This is not the case anymore since the system tenant has the ability to view all the data, system tenant specific data or other tenants data. Therefore this commit adjusts the server query code so that if a system tenant id is provided, it returns data for only the system tenant.

Fixes #108929

Release note (bug fix): adjust ts server queries to be able to return system tenant only metrics if tenant id is provided, this will fix an issue where some metrics graphs appear to double count.

Some screenshots after the change:

All
<img width="1422" alt="Screenshot 2023-08-30 at 10 59 50 AM" src="https://github.com/cockroachdb/cockroach/assets/17861665/2ddfb7b8-1980-4b88-9b92-ec2cba5e48f0">

System
<img width="1422" alt="Screenshot 2023-08-30 at 11 00 25 AM" src="https://github.com/cockroachdb/cockroach/assets/17861665/5e7b18d7-4b9d-48dd-881c-4417bab104b1">

Tenant
<img width="1422" alt="Screenshot 2023-08-30 at 11 00 13 AM" src="https://github.com/cockroachdb/cockroach/assets/17861665/b02a6683-5277-4fa7-a212-2999db935fd4">

109793: storage: don't reread value during inline conditional writes r=itsbilal a=nvanbenschoten

This commit removes the call to maybeGetValue when performing an inline conditional write. In such cases, we will have already read the value in the call to mvccGetMetadata, so we don't need to do so again.

I'm not aware of any workloads that are sensitive to the performance of conditional inline writes and I also suspect that the positioning of the Pebble iterator was avoiding some work during the second seek, so this is mainly just intended to be a code simplification.

Epic: None

Release note: None

Co-authored-by: Alex Santamaura <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
109042: server,cli: separate storage-level and application-level metrics r=yuzefovich a=knz

Prerequisite to #102378.
Informs #108929.
Epic: CRDB-26691.

Context: we have various projects that want to know which metrics belong to the application layer (and thus are instantiated and collected anew in every tenant); and which belong to the storage/KV layer (and thus exist only once per node in the storage layer).

Prior to this patch, it was hard to distinguish between them. This patch enhances the situation as follows:

- It introduces the concept of "server layering" for metrics, a concept which we had already introduced earlier in the `TestServerInterface`:
  - "storage" designates the storage layer, and only contains metrics relevant to the storage/kv layer.
  - "application" designates the application layer, and only contains metrics relevant to the application layer.
  - "server" is a special (pseudo) layer which contains metrics defined process-wide, and are thus reported both in combined sql/kv nodes and sql-only servers.
- It also uses *separate metric registries* to collect the metrics at each layer during server initialization. This is the main component that we were missing for later projects.
- It also enhances the `ChartCatalog` API endpoint and introduces a new `cockroach gen metric-list` command to use it and auto-generate a list of metrics with their layer information.

Release note (cli change): A new `cockroach gen metric-list` is now available that can generate metadata that describes the various metrics collected by an (idle) server. Note that the list does not include dynamic metric names whose names are generated based on workload.

110624: sql: disable READ COMMITTED syntax by default r=chrisseto,nvanbenschoten a=rafiss

fixes #107980

Release note (sql change): The cluster setting sql.txn.read_committed_syntax.enabled was added. It defaults to false. When set to true, the following statements will configure transactions to run under READ COMMITTED isolation, rather than being automatically interpreted as SERIALIZABLE.

- BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
- SET TRANSACTION ISOLATION LEVEL READ COMMITTED
- SET default_transaction_isolation = 'read committed'
- SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED

This setting is added since READ COMMITTED transactions are a preview feature, so usage of it is opt-in for v23.2. In a future CockroachDB major version, this setting will change to default to true.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
In c2c, source cluster, I see write BW to disk over 1.3GBps, but the GCP console says 0.66GBps. Looks like IOPS have the same issue. If I just look at the system tenant I still see the double BW, but if I look at the app tenant I see the real 0.66GBps number.
Jira issue: CRDB-30702