
Display bug in console metrics reporter for the number of failed operations #444

Closed
weideng1 opened this issue Aug 9, 2022 · 1 comment · Fixed by #445

@weideng1
Collaborator

weideng1 commented Aug 9, 2022

After a long-running job (more than 16 hours) finished, we got the following output:

Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
A cloud secure connect bundle was provided: ignoring all explicit contact points.
A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
Operation directory: /mnt/data/logs/LOAD_20220808-171139-742199
      total | failed | rows/s | p50ms | p99ms | p999ms | batches
600,000,000 |      0 | 10,005 |  2.11 | 10.81 |  20.32 |    1.00
Operation LOAD_20220808-171139-742199 completed with 76 errors in 16 hours, 39 minutes and 29 seconds.
Rejected records can be found in the following file(s): load.bad
Errors are detailed in the following file(s): load-errors.log
A summary of the operation in CSV format can be found in summary.csv.

There is a discrepancy between the number of failed records reported in the table (0) and the 76 errors reported in the final summary.

The same discrepancy is visible in the following operations.log:

2022-08-08 17:11:39 INFO  Username and password provided but auth provider not specified, inferring PlainTextAuthProvider
2022-08-08 17:11:39 INFO  A cloud secure connect bundle was provided: ignoring all explicit contact points.
2022-08-08 17:11:39 INFO  A cloud secure connect bundle was provided and selected operation performs writes: changing default consistency level to LOCAL_QUORUM.
2022-08-08 17:11:39 INFO  Operation directory: /mnt/data/logs/LOAD_20220808-171139-742199
2022-08-09 09:51:11 WARN  Operation LOAD_20220808-171139-742199 completed with 76 errors in 16 hours, 39 minutes and 29 seconds.
2022-08-09 09:51:11 INFO  Records: total: 600,000,000, successful: 600,000,000, failed: 0
2022-08-09 09:51:11 INFO  Batches: total: 600,000,000, size: 1.00 mean, 1 min, 1 max
2022-08-09 09:51:11 INFO  Memory usage: used: 3,424 MB, free: 1,469 MB, allocated: 4,894 MB, available: 7,828 MB, total gc count: 23,256, total gc time: 379,839 ms
2022-08-09 09:51:11 INFO  Writes: total: 600,000,000, successful: 599,999,924, failed: 76, in-flight: 0
2022-08-09 09:51:11 INFO  Throughput: 10,005 writes/second
2022-08-09 09:51:11 INFO  Latencies: mean 2.11, 75p 2.29, 99p 10.81, 999p 20.32 milliseconds
2022-08-09 09:51:13 INFO  Final stats:
2022-08-09 09:51:13 INFO  Records: total: 600,000,000, successful: 600,000,000, failed: 0
2022-08-09 09:51:13 INFO  Batches: total: 600,000,000, size: 1.00 mean, 1 min, 1 max
2022-08-09 09:51:13 INFO  Memory usage: used: 3,430 MB, free: 1,463 MB, allocated: 4,894 MB, available: 7,828 MB, total gc count: 23,256, total gc time: 379,839 ms
2022-08-09 09:51:13 INFO  Writes: total: 600,000,000, successful: 599,999,924, failed: 76, in-flight: 0
2022-08-09 09:51:13 INFO  Throughput: 10,005 writes/second
2022-08-09 09:51:13 INFO  Latencies: mean 2.11, 75p 2.29, 99p 10.81, 999p 20.32 milliseconds
2022-08-09 09:51:13 INFO  Rejected records can be found in the following file(s): load.bad
2022-08-09 09:51:13 INFO  Errors are detailed in the following file(s): load-errors.log
2022-08-09 09:51:13 INFO  A summary of the operation in CSV format can be found in summary.csv.
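The two log lines can be reconciled with a quick check. Because every batch holds exactly one record (1.00 mean, 1 min, 1 max), each failed write should correspond to exactly one failed record, so the Records line should also report 76 failures (600,000,000 total writes minus 599,999,924 successful). The following is a minimal Java sketch, assuming the log line format shown in this issue; the class and method names (FailureCountCheck, failedCount, countsAgree) are hypothetical and not part of DSBulk.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical consistency check over two DSBulk summary log lines (not part of DSBulk).
public class FailureCountCheck {
    // Matches "failed: 76" or "failed: 600,000,000"; commas are thousands separators.
    private static final Pattern FAILED = Pattern.compile("failed: (\\d{1,3}(?:,\\d{3})*)");

    // Extracts the "failed:" count from a Records or Writes log line.
    public static long failedCount(String line) {
        Matcher m = FAILED.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no 'failed:' field in: " + line);
        }
        return Long.parseLong(m.group(1).replace(",", ""));
    }

    // Assumes a batch size of exactly 1: each failed write statement carries one record,
    // so the failed-records and failed-writes counts should be equal.
    public static boolean countsAgree(String recordsLine, String writesLine) {
        return failedCount(recordsLine) == failedCount(writesLine);
    }

    public static void main(String[] args) {
        String records = "Records: total: 600,000,000, successful: 600,000,000, failed: 0";
        String writes = "Writes: total: 600,000,000, successful: 599,999,924, failed: 76, in-flight: 0";
        System.out.println("records failed = " + failedCount(records));
        System.out.println("writes failed  = " + failedCount(writes));
        System.out.println("consistent     = " + countsAgree(records, writes));
    }
}
```

Run against the lines from this log, the sketch extracts 0 and 76 and reports the counts as inconsistent.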

┆Issue is synchronized with this Jira Task by Unito

adutra self-assigned this Aug 10, 2022
adutra added this to the 1.10.0 milestone Aug 10, 2022
@adutra
Contributor

adutra commented Aug 10, 2022

Indeed, when loading, the failed-records counter was not incremented when the failure was caused by a failed write statement execution. That counter is exposed as records/failed in the metrics registry and is consumed by two reporters: the console reporter and the records reporter. The writes reporter does not use it; it relies on a separate but similar counter instead. This explains why the writes reporter correctly reported 76 failures, while the console and records reporters both reported zero.
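The counter mismatch described in this comment can be illustrated with a simplified Java model. This is a minimal sketch, not DSBulk's actual code: the class FailedRecordsCounters, the map-based registry standing in for the real metrics registry, the onWriteResult callback, and the counter name writes/failed are all hypothetical. Only records/failed is named in the comment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the bug; NOT DSBulk's actual implementation.
public class FailedRecordsCounters {
    // Stand-in for a metrics registry: counter name -> value.
    private final Map<String, AtomicLong> registry = new ConcurrentHashMap<>();
    // When false, reproduces the bug: failed writes never reach records/failed.
    private final boolean fixed;

    public FailedRecordsCounters(boolean fixed) {
        this.fixed = fixed;
    }

    private AtomicLong counter(String name) {
        return registry.computeIfAbsent(name, k -> new AtomicLong());
    }

    public long get(String name) {
        return counter(name).get();
    }

    // Hypothetical callback, invoked once per executed write statement.
    // recordsInStatement is the number of records carried by the statement (batch size).
    public void onWriteResult(boolean success, int recordsInStatement) {
        counter("writes/total").incrementAndGet();
        if (success) {
            counter("writes/successful").incrementAndGet();
        } else {
            // Read by the writes reporter (counter name assumed for this sketch).
            counter("writes/failed").incrementAndGet();
            if (fixed) {
                // Read by the console and records reporters; the bug was skipping this step.
                counter("records/failed").addAndGet(recordsInStatement);
            }
        }
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            FailedRecordsCounters metrics = new FailedRecordsCounters(fixed);
            // 1,000 single-record writes; those with i % 250 == 0 fail (4 in total).
            for (int i = 0; i < 1000; i++) {
                metrics.onWriteResult(i % 250 != 0, 1);
            }
            System.out.printf("fixed=%b records/failed=%d writes/failed=%d%n",
                fixed, metrics.get("records/failed"), metrics.get("writes/failed"));
        }
    }
}
```

With fixed set to false, the 4 failed writes increment writes/failed but leave records/failed at zero, mirroring the 0 versus 76 mismatch in the report; with fixed set to true, both counters agree because each statement carries a single record.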
