Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: operator[] checks bounds in debug mode #16002

Conversation

GitHubTracey
Copy link
Contributor

operator[] restrict check bounds to debug mode

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

src

@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. labels Oct 6, 2017
@@ -44,7 +44,9 @@ class Vector {

// Access individual vector elements - checks bounds in debug mode.
T& operator[](size_t index) const {
#ifdef DEBUG
CHECK(index < length_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep the check - if it's worth doing in debug mode, it's almost always worth doing in release mode too.

I probably should have updated the comment in 1b7372f. Can you do that now?

@Trott Trott added the code-and-learn Issues related to the Code-and-Learn events and PRs submitted during the events. label Oct 7, 2017
@addaleax
Copy link
Member

addaleax commented Oct 7, 2017

Benchmarks:

$ ./node benchmark/compare.js --new ./node --old ./node-1b358f1fde0e --runs 10 --filter buffer-indexof.js --set iter=0.1 --set encoding=utf8 buffers | Rscript benchmark/compare.R
[00:03:58|% 100| 1/1 files | 20/20 runs | 30/30 configs]: Done
                                                                                                                                              improvement confidence      p.value
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="@"                                                                     -3.05 %            4.074023e-01
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="10x"                                                                    1.36 %            7.158332e-01
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="aaaaaaaaaaaaaaaaa"                                                     13.98 %        *** 6.245506e-14
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="Alice"                                                                 -1.62 %            6.286298e-01
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="among mad people"                                                      15.27 %        *** 4.234457e-15
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="found it very"                                                         13.04 %        *** 7.519944e-15
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="Gryphon"                                                                0.55 %            4.549318e-01
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="</i> to the Caterpillar"                                               12.76 %        *** 5.213227e-11
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="--l"                                                                    6.29 %        *** 9.629246e-08
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="neighbouring pool"                                                     12.84 %        *** 1.342501e-13
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="Ou est ma chatte?"                                                     10.85 %        *** 1.118890e-12
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="Panther"                                                                3.29 %         ** 1.133720e-03
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="Soo--oop"                                                              13.48 %        *** 7.043781e-17
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="SQ"                                                                     1.16 %            3.090357e-01
 buffers/buffer-indexof.js iter=0.1 type="buffer" encoding="utf8" search="venture to go near the house till she had brought herself down to"     12.52 %        *** 6.585226e-13
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="@"                                                                      0.36 %            9.254180e-01
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="10x"                                                                    0.90 %            7.551025e-01
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="aaaaaaaaaaaaaaaaa"                                                     13.83 %        *** 4.085834e-14
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="Alice"                                                                  0.71 %            8.635437e-01
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="among mad people"                                                      15.92 %        *** 1.473361e-15
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="found it very"                                                         13.29 %        *** 2.628669e-14
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="Gryphon"                                                                1.35 %            8.332586e-02
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="</i> to the Caterpillar"                                               13.32 %        *** 1.593228e-14
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="--l"                                                                    5.36 %        *** 3.708626e-06
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="neighbouring pool"                                                     12.99 %        *** 4.543519e-15
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="Ou est ma chatte?"                                                      9.79 %        *** 5.845900e-12
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="Panther"                                                                2.93 %         ** 4.257264e-03
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="Soo--oop"                                                              13.75 %        *** 4.978218e-15
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="SQ"                                                                     2.38 %          * 3.816994e-02
 buffers/buffer-indexof.js iter=0.1 type="string" encoding="utf8" search="venture to go near the house till she had brought herself down to"     11.67 %        *** 9.017317e-12

@addaleax
Copy link
Member

CI: https://ci.nodejs.org/job/node-test-commit/13134/

This should be ready.

@jasnell
Copy link
Member

jasnell commented Oct 15, 2017

@bnoordhuis ... is your objection here a hard objection to this landing? #16002 (comment)

@gireeshpunathil
Copy link
Member

the CHECK makes sense to me even in release mode irrespective of what performance numbers are. This is the minimum check required to make sure buggy access do not corrupt other parts of memory ( a wrong large index can cause negative numbers to length - index which can then become large offsets into the array I guess.

@bnoordhuis
Copy link
Member

@jasnell Not a hard objection but like @gireeshpunathil says, removing it from release builds is arguably a reduction in robustness.

I had a working hypothesis that the CHECK inhibits inlining. I haven't had time to investigate that fully but the patch below brings performance about halfway between what we have now and with the check removed:

diff --git a/src/string_search.cc b/src/string_search.cc
index 326fba7c4a..f939bdc0e9 100644
--- a/src/string_search.cc
+++ b/src/string_search.cc
@@ -7,5 +7,9 @@ int StringSearchBase::kBadCharShiftTable[kUC16AlphabetSize];
 int StringSearchBase::kGoodSuffixShiftTable[kBMMaxShift + 1];
 int StringSearchBase::kSuffixTable[kBMMaxShift + 1];
 
+void CheckIndex(size_t index, size_t length) {
+  CHECK(index < length);
+}
+
 }  // namespace stringsearch
 }  // namespace node
diff --git a/src/string_search.h b/src/string_search.h
index 73e90f5873..9ff8e34f75 100644
--- a/src/string_search.h
+++ b/src/string_search.h
@@ -13,6 +13,7 @@
 namespace node {
 namespace stringsearch {
 
+void CheckIndex(size_t index, size_t length);
 
 // Returns the maximum of the two parameters.
 template <typename T>
@@ -42,9 +43,9 @@ class Vector {
   // In the latter case, v[0] corresponds to the *end* of the memory range.
   size_t forward() const { return is_forward_; }
 
-  // Access individual vector elements - checks bounds in debug mode.
+  // Access individual vector elements.
   T& operator[](size_t index) const {
-    CHECK(index < length_);
+    CheckIndex(index, length_);  // Out-of-line to encourage method inlining.
     return start_[is_forward_ ? index : (length_ - index - 1)];
   }
 

@gireeshpunathil
Copy link
Member

@bnoordhuis - what if we further customize this check (fully inline and reduce to its simplest form) - such as:

assert(index < length_);

?
Is it non-portable / less consumable? Given the circumstances (perf) can we consider such a usage as a one-time exception to the standard coding practice?

@bnoordhuis
Copy link
Member

What CHECK() has going over assert() is that it prints a stack trace. It's tremendously helpful for bug reports.

I managed to get performance within 2% of this PR with the patch below:

diff --git a/src/string_search.h b/src/string_search.h
index 73e90f5873..65be49f1a4 100644
--- a/src/string_search.h
+++ b/src/string_search.h
@@ -42,9 +42,9 @@ class Vector {
   // In the latter case, v[0] corresponds to the *end* of the memory range.
   size_t forward() const { return is_forward_; }
 
-  // Access individual vector elements - checks bounds in debug mode.
+  // Access individual vector elements.
   T& operator[](size_t index) const {
-    CHECK(index < length_);
+    if (index >= length_) node::Abort();
     return start_[is_forward_ ? index : (length_ - index - 1)];
   }
 

@gireeshpunathil
Copy link
Member

@bnoordhuis - thanks, Preceded with a one liner on the lines of:
// use a low level assertion, to tradeoff between performance and robustness
I am +1 on this change. Ping @BridgeAR @TimothyGu @jasnell @addaleax PTAL

@TimothyGu
Copy link
Member

I'm fine with @bnoordhuis's suggestion.

@BridgeAR
Copy link
Member

@GitHubTracey would you be so kind and update the PR according to @bnoordhuis suggestion?

@GitHubTracey
Copy link
Contributor Author

GitHubTracey commented Oct 18, 2017 via email

@addaleax
Copy link
Member

I mean, I would still prefer the patch in its current form. Performing the bounds check for every single access seems excessive – also, in other places we do this the same way, e.g. in #15077.

I’d really prefer to work towards making Node’s test pass in debug mode (and I have a few V8 CLs and Node PRs merged/open), and then having a CI machine that exercises all the debug mode checks…

@gireeshpunathil
Copy link
Member

@addaleax - so what is your view on protecting the object from memory corruption?

@GitHubTracey
Copy link
Contributor Author

Hey All,
Signed in on my laptop. I'll try to remember to bring it with me tomorrow - if there is a consensus, I will go ahead with whatever changes are required. As Anna was the one to point me towards this item, I'd like to make sure to have her buy-in before I submit.
Thanks to everyone! I'm learning so much with each comment!

@addaleax
Copy link
Member

@gireeshpunathil Phrasing the question like that certainly is a bit suggestive :) Basically, my stance is:

  • I think per-element-access index checks are excessive. There are existing OOB CHECKs outside of these tight loops that protect against OOB accesses.
  • This definitely qualifies as hot code, and is used frequently throughout Node and userland, so performance is relevant – basically the entire reason the file exists is perf. I think even 2 % improvements are something we should take.
  • This file has pretty good test coverage, including all lines that would lead to these checks, and the test suite for this is extensive; therefore, it is reasonable to assume that an issue would at least be caught in debug mode, if these checks are turned on in debug mode. (And this is assuming the mentioned none of the other checks point to an issue!)
  • From a security standpoint, even if there was an issue in some rare edge case, it would be very hard to exploit since these accesses are read-only and sequential – the worst that would be realistically feasible would be causing a crash by running past the buffer end, but we get a crash with CHECKs as well.

So, this is why I would feel comfortable with landing this as it is, and why I would prefer that over the other options.

@gireeshpunathil
Copy link
Member

Anna, I did not mean any provocations, apologies if the words sounded so. Your points on the vitality and adequate test coverage of the code section sounds good to me, and I am sufficiently convinced on its removal being safe, while improving performance.

@addaleax
Copy link
Member

@gireeshpunathil I didn’t read anything as provocative here :)

@gireeshpunathil
Copy link
Member

So can we have consensus on this PR please! As I flipped views, I take the onus to take this forward to graceful completion.
Please raise concern if you strongly feel against the current change (removal of CHECK in release builds). (I was the only one I guess).

@addaleax
Copy link
Member

Yes, I think it’s okay to land this in the next few days if there are no explicit objections

@addaleax
Copy link
Member

Landed in 838eca2 🎉

@GitHubTracey Thanks for your second commit here!

@addaleax addaleax closed this Oct 22, 2017
addaleax pushed a commit that referenced this pull request Oct 22, 2017
PR-URL: #16002
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
@GitHubTracey
Copy link
Contributor Author

GitHubTracey commented Oct 22, 2017 via email

MylesBorins pushed a commit that referenced this pull request Oct 23, 2017
PR-URL: #16002
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
@vsemozhetbyt
Copy link
Contributor

@addaleax It seems there is an issue with this PR re #16340 + #16360

@addaleax
Copy link
Member

@vsemozhetbyt Hm – not sure, I don’t think anything can be done about that at this point?

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Oct 24, 2017

@addaleax I am not sure, but see #16360 (comment)

@GitHubTracey Could you add this local email to your GitHub account? Let us see if something will be updated after this.

1

@addaleax
Copy link
Member

@vsemozhetbyt Yeah, it’s my bad I didn’t ask for the name, but why would we want to require authors to add email addresses to github?

@vsemozhetbyt
Copy link
Contributor

If the addresses in local git setting and GitHub setting is not in sync, the commit is not associated with GitHub user and the user is not promoted to Conributor after landing.

@GitHubTracey
Copy link
Contributor Author

GitHubTracey commented Oct 24, 2017 via email

@addaleax
Copy link
Member

Yeah, right – if it’s about that, I think we should make clear that this only affects how Github displays things, there’s no technical need to provide any data to Github.

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Oct 25, 2017

Commit association is properly updated, but not the Contributor status, unfortunately. Let's hope this will be fixed with next contributions from @GitHubTracey!

addaleax pushed a commit to ayojs/ayo that referenced this pull request Oct 26, 2017
PR-URL: nodejs/node#16002
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
addaleax pushed a commit to ayojs/ayo that referenced this pull request Dec 7, 2017
PR-URL: nodejs/node#16002
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. code-and-learn Issues related to the Code-and-Learn events and PRs submitted during the events.
Projects
None yet
Development

Successfully merging this pull request may close these issues.