Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert #83357 and achieve the same impact on serde-json by adding a check in Vec::spec_extend #83797

Closed
wants to merge 3 commits into from

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Apr 3, 2021

No description provided.

@rust-highfive
Copy link
Collaborator

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 3, 2021
@saethlin
Copy link
Member Author

saethlin commented Apr 3, 2021

r? @dtolnay
@rylev @klensy per #83357 (comment)

@rust-highfive rust-highfive assigned dtolnay and unassigned m-ou-se Apr 3, 2021
@saethlin saethlin marked this pull request as draft April 3, 2021 00:24
@jyn514
Copy link
Member

jyn514 commented Apr 3, 2021

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 3, 2021
@bors
Copy link
Contributor

bors commented Apr 3, 2021

⌛ Trying commit 0dc772e with merge 7e666f9e8f9482542bd6b4dd7e39126881298982...

@bors
Copy link
Contributor

bors commented Apr 3, 2021

☀️ Try build successful - checks-actions
Build commit: 7e666f9e8f9482542bd6b4dd7e39126881298982 (7e666f9e8f9482542bd6b4dd7e39126881298982)

@rust-timer
Copy link
Collaborator

Queued 7e666f9e8f9482542bd6b4dd7e39126881298982 with parent 9b6c9b6, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (7e666f9e8f9482542bd6b4dd7e39126881298982): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 3, 2021
@saethlin
Copy link
Member Author

saethlin commented Apr 3, 2021

I do not know why this is slower. I'm going to stand up the rustc-perf benchmarking locally and try to figure this out.

@the8472
Copy link
Member

the8472 commented Apr 3, 2021

Note that there have been some changes to the spec_extend code path (see #83726) since #83357 landed so it may optimize differently now compared to when the last perf run was made. Maybe measure the revert and the new check separately.

@bjorn3
Copy link
Member

bjorn3 commented Apr 4, 2021

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 4, 2021
@bors
Copy link
Contributor

bors commented Apr 4, 2021

⌛ Trying commit 85bd865 with merge 13e6b40edb3de60b505f197a5a470cdd98f8fa74...

@bors
Copy link
Contributor

bors commented Apr 4, 2021

☀️ Try build successful - checks-actions
Build commit: 13e6b40edb3de60b505f197a5a470cdd98f8fa74 (13e6b40edb3de60b505f197a5a470cdd98f8fa74)

@rust-timer
Copy link
Collaborator

Queued 13e6b40edb3de60b505f197a5a470cdd98f8fa74 with parent 0850c37, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (13e6b40edb3de60b505f197a5a470cdd98f8fa74): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 4, 2021
@saethlin
Copy link
Member Author

saethlin commented Apr 4, 2021

I'm just more confused now. There's evidence posted in #83357 that the inlining tweaks were a regression, but now undoing them is also a regression??

@the8472
Copy link
Member

the8472 commented Apr 4, 2021

Oh, the link in #83357 (comment) points to the max-rss report, not instructions (the default).

max-rss tends to be noisy, but the reverts seem to be biased towards an improvement, i.e. it recovers the max-rss regression but also undoes the compile time improvements.

@saethlin
Copy link
Member Author

saethlin commented Apr 4, 2021

Wow I was really looking in the wrong direction.

I have no idea why max-rss would be impacted by these changes. My instinct is that blaming a max-rss regression on #83357 is a mistake; it doesn't change when and with what arguments any allocator is invoked. If it changes the size of anything, it should reduce code size.

@the8472
Copy link
Member

the8472 commented Apr 4, 2021

Wow I was really looking in the wrong direction.

I have no idea why max-rss would be impacted by these changes. My instinct is that blaming a max-rss regression on #83357 is a mistake; it doesn't change when and with what arguments any allocator is invoked. If it changes the size of anything, it should reduce code size.

The PR was not part of a rollup, so the perf runs are a direct comparison with its parent commit. And we have three comparisons now and all point in the same direction so that makes it less likely to be noise.

Vec code is a significant fraction of all code, vec iterator code doubly so, due to monomorization. So things are really sensitive to changes there.

So, reverting fixes the memory footprint issue. But it also undoes the instruction count improvements on some of the parts that involve little to no heavy lifting on the LLVM side (e.g. -check runs), so they're probably not due to more IR being generated but due to something else in the compiler returning to less efficient code.

So perhaps run one of the non-opt benchmarks under perf and see what code in the compiler is affected and whether we could improve the performance without the max-rss increase.

One possibility might be that you focused only on extend while the reserve changes had a broader impact.

@the8472
Copy link
Member

the8472 commented Apr 7, 2021

@saethlin do the serde speedups mostly come from Extend or FromIterator improvements? If it's about the latter then I think we can get reserve out of some FromIterator code paths.

@saethlin
Copy link
Member Author

saethlin commented Apr 7, 2021

@the8472 The serde improvement(s) come from making std::io::Write::write_all into a Vec<u8> cheaper. I believe that forwards down to the SpecExtend code path I was poking at in this PR, or at least that's what benchmarking indicates.

@the8472
Copy link
Member

the8472 commented Apr 7, 2021

fn write_all(&mut self, buf: &[u8]) -> io::Result<()> {
self.extend_from_slice(buf);
Ok(())
}

calls

pub fn extend_from_slice(&mut self, other: &[T]) {
self.spec_extend(other.iter())
}

which should specialize to

fn spec_extend(&mut self, iterator: slice::Iter<'a, T>) {
let slice = iterator.as_slice();
unsafe { self.append_elements(slice) };
}

calls

#[inline]
unsafe fn append_elements(&mut self, other: *const [T]) {
let count = unsafe { (*other).len() };
self.reserve(count);
let len = self.len();
unsafe { ptr::copy_nonoverlapping(other as *const T, self.as_mut_ptr().add(len), count) };
self.len += count;
}

So there's a reserve, but it's not the one in spec_extend that's relevant if you want to optimize serde.
And we should collapse some of that call-chain.

I think that warrents a few separate PRs. One for serde improvements. One purely for the revert, assuming the libs team prefers to keep max-rss low at the expense of the instructions:u losses (that pretty much seems to be the trade-off here).
And then I have an idea how to avoid one or two reserve calls which might claw back some of the instruction count losses.

@nnethercote
Copy link
Contributor

FWIW, I made a bunch of improvements to these code paths last year to reduce code size. At first the changes gave obvious improvements, but by the end things were getting unpredictable and it was hard to improve some things without regressing other things.

@saethlin
Copy link
Member Author

I was never sure what the next steps were on here, and the codebase seems to have moved on a bit. I'm still here, but this PR isn't really.

@saethlin saethlin closed this May 22, 2021
@saethlin saethlin deleted the check-before-vec-reserve branch May 16, 2022 04:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.