Skip to content
This repository has been archived by the owner on Oct 30, 2021. It is now read-only.

Expand 'mask' bitmask space from unsigned short to uint32_t (16=>32 bits) #82

Merged
merged 4 commits into from
Jan 30, 2017

Conversation

yhahn
Copy link
Member

@yhahn yhahn commented Jan 30, 2017

Problem

  • carmen currently supports a maximum of 20 query tokens (https://github.com/mapbox/carmen/blob/master/lib/constants.js#L7)
  • carmen-cache represents the bit mask for subqueries using unsigned short which has 16 bits -- so it can at most represent a 16-token query
  • What currently happens in some cases is not just bad/empty geocoding, but possibly horrendous resource usage because coalesceMulti relies on this mask check to prevent context cover lists from ballooning out of size (
    } else if (!(context_mask & parent.mask)) {
    )

This PR

cc @mapbox/geocoding-gang @springmeyer

assert.end();
});
});

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The testcase below works by

  • Adding 10k features with the same x/y coordinates to a gridcache entry
  • Layering two features on top of causing each to inherit the 10k features into their respective cover lists

Normally as each of those features incorporate parent features into their contexts, they would do a mask check making sure that they don't add two features that both correspond to the same term in a query. When a mask overflow occurs in master it results in an empty mask which can always be combined with any other feature. This leads to the child features inheriting all 10k parent features and leads to OOM.

@yhahn yhahn merged commit 3694038 into master Jan 30, 2017
@yhahn yhahn deleted the oww branch January 30, 2017 01:59
@apendleton
Copy link
Contributor

Whoa, nice find!

@springmeyer
Copy link
Contributor

springmeyer commented Jan 30, 2017

Nice work tracking this one down @yhahn.

Your finding has prompted me to think hard about how we can avoid overflows in the future. I'll drop a few notes here now.

Removing the static_cast in the code before your fix like:

-            subq.mask = static_cast<unsigned short>(jsStack->Get(Nan::New("mask").ToLocalChecked())->IntegerValue());
+            subq.mask = jsStack->Get(Nan::New("mask").ToLocalChecked())->IntegerValue();

Reveals a warning which helps show the potential for a problem:

../src/binding.cpp:1478:74: warning: implicit conversion loses integer precision: 'int64_t' (aka 'long long') to 'unsigned short' [-Wconversion]
            subq.mask = jsStack->Get(Nan::New("mask").ToLocalChecked())->IntegerValue();

We have several of these warnings currently happening in the code in other places - they look minor but I think we should resolve them so that if new warnings come up they are obvious. Per chat I also am keen to revisit our use of static_cast. I'm thinking we should migrate to numeric_cast in most places which would prevent both the warnings (when we don't care) and throw on actual overflows that are unexpected.

I've found -fsanitize=integer very good at finding overflows at runtime. But trying to run the new tests on OS X with -fsanitize=integer leads to the program getting killed rather than ever hitting an overflow error. Will need to dig more tomorrow about why this is...

# coalesceMulti mask overflow
# start coalesce (mask: 18)


./setup.sh: line 9: 19250 Killed: 9               DYLD_INSERT_LIBRARIES=${MASON_LLVM_RT_PRELOAD} node ./node_modules/.bin/tape test/coalesce.test.js

@springmeyer
Copy link
Contributor

I'm thinking we should migrate to numeric_cast in most places which would prevent both the warnings (when we don't care) and throw on actual overflows that are unexpected.

I've sketched out a next action at #94 to ensure we are more protected from bugs like this in the future.

I've found -fsanitize=integer very good at finding overflows at runtime. But trying to run the new tests on OS X with -fsanitize=integer leads to the program getting killed rather than ever hitting an overflow error. Will need to dig more tomorrow about why this is...

it is because this is not a supported feature to detect these types of problems yet in the sanitizers. So we are dependent currently on -Wconversion warning or adding explicit checks via numeric_cast. More details on this at #94

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants