Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requests with duplicate query params fail to be logged in analytics database in some cases #358

Closed
GUI opened this issue Nov 3, 2016 · 1 comment

Comments

@GUI
Copy link
Member

GUI commented Nov 3, 2016

While pulling some analytics yesterday, I stumbled upon some error logs that indicated some requests were not being logged into our ElasticSearch database. After digging into it a bit further, I confirmed that we were indeed failing to log certain requests. Specifically, if a request had duplicate query parameters named the same things (?foo=bar&foo=baz), then in certain cases (but not all cases), the request was never logged to our analytics database.

Overall, the missing requests appear to represent a small percentage of our overall traffic (around 0.1%), but it unfortunately impacted a couple specific APIs more than others (mainly when the API used duplicate query params to represent arrays, but still got decent amount of traffic only passing in a single value for the array type). So for these couple of impacted APIs, the impact on the metrics might be more percentage-wise.

The technical details of what happened are pretty similar to this issue from last year, but a slightly different variation on it: #213

As background, we store an object containing all the query parameters in our ElasticSearch database. This data is based upon deserializing the query parameter into an object, so ?foo=bar&hello=goodbye becomes something like { "foo": "bar", "hello": "goodbye" }. We store that object in its deserialized form within each ElasticSearch document (so all the query params become nested JSON key value pairs within the document).

The issue stems from the fact that duplicate URL parameters get parsed as an array, so ?foo=bar&foo=baz becomes { "foo": ["bar", "baz"] }. That alone is fine, but things begin to fail if a similar request comes in where that same parameter name is not treated as an array (eg, foo is used again as ?foo=bar becoming { "foo": "bar" }). In this case, ElasticSearch's lazy typing bites us, since the first item into the index declares the type of that field, and all future uses of that field must match the same type. So the problem is that when there's mixed typing for the same field, ElasticSearch will start rejecting whatever requests don't match the type first seen for the field. So if the field was first seen as an array, then subsequent calls treating it as an array will succeed, but a single string value will fail (and vice versa, depending on whatever type was seen first).

To fix this, I took a first pass as always flattening the query parameters so the values were always normalized to be strings (see NREL/api-umbrella@524d2da). However, I realized that rolling that fix out would be a bit tricky, since it would still break logging for any requests seen in this month's index where an array was seen first (since the normalized string wouldn't match the expected array type). We could fix this by reindexing our existing data to normalize all the types to strings, but reindexing live elasticsearch data isn't super fun.

So instead, to fix this, I've simply disabled the logging of this query parameter object altogether (see NREL/api-umbrella@a432b22). The storage of all the query parameters as an object is actually superfluous, since we've never used it for anything. We still log all the query parameters as part of the original request URL string, so we still have the data. This change to disable it simply means we're not storing all of the query parameters again broken apart as an object. Storing it as an object theoretically has some uses for more efficiently querying the data based on query values, but since we've never gotten around to using that (and there doesn't seem to be a huge driver for that), it seemed easier to disable this completely (but we can always repopulate it based on the raw data if we actually develop a use-case for it).

@GUI
Copy link
Member Author

GUI commented Nov 3, 2016

Fix deployed to production.

@GUI GUI closed this as completed Nov 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant