
workload: log histogram write/encode failures, close output file #70484

Merged: 1 commit, Sep 21, 2021

Conversation

stevendanna
Collaborator

We are currently observing incomplete histograms being output during
nightly roachperf tpccbench runs.

I don't think the changes here are likely to address the cause, as I
would expect write failures to affect a broader range of roachperf
output. But, it is still good to log any failures we do encounter.

Further, we now sync and close the file explicitly.

Informs #70313

Release note: None

@stevendanna stevendanna requested a review from a team September 21, 2021 10:36
@cockroach-teamcity
Member

This change is Reviewable

@stevendanna stevendanna requested review from a team and erikgrinaker and removed request for a team September 21, 2021 10:43
@tbg
Member

tbg commented Sep 21, 2021

This seems to affect only tpccbench, right?
tpccbench is special in that it calls c.Reset which hard-resets the cluster VMs. I've observed somewhere that this also causes trailing null bytes in the logs: https://cockroachlabs.slack.com/archives/C01CDD4HRC5/p1630404151004600

I think the same thing might be happening here. Perhaps the solution is as easy as making sure that we sync the histograms. Are we doing that?

@tbg
Member

tbg commented Sep 21, 2021

I'm seeing that you're adding an explicit Sync() here - so maybe this does indeed fix the problem. Worth a try for sure.

@stevendanna
Collaborator Author

TFTR!

I'm seeing that you're adding an explicit Sync() here - so maybe this does indeed fix the problem. Worth a try for sure.

Definitely possible. I think I had overlooked this because we were able to Get() the file and then decode it. But those reads could all have been served from the cache even though the data hadn't been flushed to disk yet.

@stevendanna
Collaborator Author

bors r=erikgrinaker

@craig
Contributor

craig bot commented Sep 21, 2021

This PR was included in a batch that was canceled; it will be automatically retried.

@craig
Contributor

craig bot commented Sep 21, 2021

Build succeeded:
