Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wrong name tag for HTTP metrics from redirects #1570

Merged
merged 2 commits into from
Sep 4, 2020
Merged

Conversation

imiric
Copy link
Contributor

@imiric imiric commented Jul 23, 2020

Closes #1474

The root cause was reusing the same 'name' tag for all requests processed by a transport. In the case of redirects and digest auth the transport would process more than one request and needs to update the 'name' tag in the same way as 'url' is. For the initial request we need the originally parsed one to be consistent with the credential masking feature (as this gets stripped in the request passed to RoundTrip()), and tags for subsequent requests will be dynamically generated.

I'm not sure if this confirms @mstoykov's gut feeling that it will break aggregation somehow, but I'd argue this is the correct approach so we'll have to deal with it. 😄

I didn't add more tests for this as the current ones are quite thorough.

@imiric imiric requested review from mstoykov and na-- July 23, 2020 08:56
@codecov-commenter
Copy link

codecov-commenter commented Jul 23, 2020

Codecov Report

Merging #1570 into master will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1570      +/-   ##
==========================================
- Coverage   73.91%   73.90%   -0.02%     
==========================================
  Files         165      165              
  Lines       14519    14521       +2     
==========================================
  Hits        10732    10732              
- Misses       3224     3225       +1     
- Partials      563      564       +1     
Impacted Files Coverage Δ
lib/netext/httpext/request.go 96.04% <100.00%> (-1.24%) ⬇️
lib/netext/httpext/transport.go 96.62% <100.00%> (+0.37%) ⬆️
lib/testutils/minirunner/minirunner.go 78.57% <0.00%> (-3.58%) ⬇️
lib/executors.go 94.11% <0.00%> (+1.96%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0737194...bfd7a29. Read the comment docs.

@na--
Copy link
Member

na-- commented Jul 23, 2020

After my initial dumb comment, I'll try to make one that's a bit more informed, albeit in the same direction... 😅

For the initial request we need the originally parsed one to be consistent with the credential masking feature (as this gets stripped in the request passed to RoundTrip()), and tags for subsequent requests will be dynamically generated.

Looking at this code: https://github.com/loadimpact/k6/blob/5ff4a76a4a08241ce061fcc449be585f7f707e6a/lib/netext/httpext/digest_transport.go#L43-L57

I'm thinking we shouldn't bother with numProcReqs and the special handling at all, since this logic is trying to mask something we don't actually do. The current HTTP digest logic actually makes an initial request without authentication, so we should emit it like it happens, not try to mask it.

We should definitely fix the current dumb digest implementation (#800), but we shouldn't try to pretend it doesn't do the stupid things it does...

@imiric
Copy link
Contributor Author

imiric commented Jul 23, 2020

@na-- Ah, even better! I thought this request came from stdlib and not from our code. Thanks, will fix.

@na--
Copy link
Member

na-- commented Jul 23, 2020

Unfortunately, stdlib still doesn't have digest authentication, so we've had to make do with external libraries. The current one is the second attempt IIRC, and it also sucks. As explained in that issue, we need some substantial refactoring, including a per-VU credential cache or something like that, to fix the issue we currently have of making 2 HTTP requests every time... 😞

imiric pushed a commit that referenced this pull request Jul 23, 2020
@imiric
Copy link
Contributor Author

imiric commented Jul 23, 2020

@na-- Unfortunately, I don't think we can get rid of some wonkiness here. :-/

Besides the digest credential masking, the original URL name is also needed to keep the metrics aggregatable on the same query params. So @mstoykov was right, and this will make subsequent requests not part of this group, but there's a point to be made that they should be aggregated differently since the original query string isn't passed to these later requests anyway. Though the name tag seems to be abused for several purposes, so we might want to rethink that at some point.

I got rid of the counter in 4f30941, but some state change is still needed to set this tag for only the first request. Wiping the string after the first pass seems slightly better, but still not great. Let me know.

Comment on lines 127 to 133
// If a name tag is already set in the script, use it for all requests
// processed by this transport. Otherwise, use the original URL name
// to preserve any escaped query string arguments, but only for the
// first request in the chain.
if t.origURLName != "" {
tags["name"] = t.origURLName
t.origURLName = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still seems strange... if a name tag was set by the user in the script, wouldn't it be in the t.tags map above? 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, nevermind, "spoke" too soon again - this comment is about the if _, ok := tags["name"]; !ok part of the check...

still, I continue to not understand why we need the original URL, when cleanURL should suffice 😕

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because otherwise this test fails.

I'm trying to see if I can clean it again in the transport the same way while preserving the escaped query args, but I'm not functioning on all cylinders today. :-/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because otherwise this test fails.

Ahhh... welll... 💩 😅 Yeah, I'll have to scratch my head about this some more, but the current version of this PR might be as good as it gets... 😞 Not sure why anyone added this (on the bottom of that section) funky way of specifying URLs, but now we have to live with it... 😭 It implies a name tag, but it doesn't specify it, instead using this weird k6-specific URL struct, which of course we can't propagate in an http.RoundTrippter 😭 And thus, I finally understand the purpose of all of the weird fixes in this PR 😅

Let's discuss this on Monday again, but I'm thinking that if someone uses something like this:

http.get(http.url`http://example.com/posts/${id}`)

we probably should consider it as if they used

http.get(`http://example.com/posts/${id}`, {tags: {name: 'http://example.com/posts/${}'}})

That is, if there are any redirects in the course of that HTTP request, all of their name tags should be 'http://example.com/posts/${}'. Or, said otherwise, #1474 is not a bug when the user has explicitly used the http.url tagged template.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think that should be the case in 62f9184. The only wonky thing is the obscure preq.URL.Name != preq.URL.CleanURL check, which works when a tagged template string is used.

This does break the masked credentials behavior for digest auth, i.e. the generated name tag in these cases won't be https://****:****@<host>... anymore, but https://<host>. I discussed this with @na-- the other day, so this might be acceptable, and if so we should mention it in the release notes.

imiric pushed a commit that referenced this pull request Jul 24, 2020
Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take another look (mostly at the tests ;) in an hour

But it seems like it's fine and is smaller change then it was before 🎉

Also relevant (but not useful for us) addition in golang 1.15 https://tip.golang.org/pkg/net/url/#URL.Redacted

@@ -181,7 +181,7 @@ func TestRequestAndBatch(t *testing.T) {
}

assert.Equal(t, 10, reqsCount)
assertRequestMetricsEmitted(t, bufSamples, "GET", sr("HTTPBIN_URL/get"), sr("HTTPBIN_URL/redirect/9"), 200, "")
assertRequestMetricsEmitted(t, bufSamples, "GET", sr("HTTPBIN_URL/get"), sr("HTTPBIN_URL/get"), 200, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check that we got 302 for all the others in this case? Or at least one-two

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also (not really related to this PR) ... assertRequestMetricsEmitted having so many fields is not great ( I needed to go look up what all the fields are).

Also not great is that there is logic for empty name being equal to url, which you used below, but not here ...

I do think that that logic should be dropped and we should always provide the name ... maybe at least for this PR

cc @na--

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question, sorry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think

 			assertRequestMetricsEmitted(t, bufSamples, "GET", url, "", 200, "") 

should mean

 			assertRequestMetricsEmitted(t, bufSamples, "GET", url, url, 200, "") 

I also do think that it will be nice if ... I dunno this is

 			assertRequestMetricsEmitted(t, bufSamples, expectmetric{
                             method: "GET", 
                             url: url, 
                             name:"",
                             status: 200,
                             whateverThisIs: "",
                        }) 

For this PR maybe just write the name for the other tests, instead of leaving it ""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, ok, yeah, url, "" is confusing, 👍 👍 for url, url,... A full refactoring of the test can wait for a future PR though 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the redirect checks in 70144e8, and while I agree that the name = url override is unnecessary, I'd rather leave that change for a follow-up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to check all redirects in e6b65c8.

Comment on lines 118 to 131
cleanURL := URL{u: unfReq.request.URL, URL: unfReq.request.URL.String()}.Clean()
if enabledTags.Has(stats.TagURL) {
u := URL{u: unfReq.request.URL, URL: unfReq.request.URL.String()}
tags["url"] = u.Clean()
tags["url"] = cleanURL
}

if _, ok := tags["name"]; !ok && enabledTags.Has(stats.TagName) {
tags["name"] = cleanURL
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would've really prefered it, if cleanURL is created only if it's needed :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this in 8872f4a, at the expense of duplication, though I'm not sure it will be more efficient overall, as most of the time both url and name tags will be provided. We can add additional checks, but I don't want to complicate this too much.

imiric pushed a commit that referenced this pull request Jul 27, 2020
imiric pushed a commit that referenced this pull request Jul 27, 2020
js/modules/k6/http/request_test.go Outdated Show resolved Hide resolved
js/modules/k6/http/request_test.go Outdated Show resolved Hide resolved
js/modules/k6/http/request_test.go Outdated Show resolved Hide resolved
js/modules/k6/http/request_test.go Outdated Show resolved Hide resolved
imiric pushed a commit that referenced this pull request Sep 4, 2020
imiric pushed a commit that referenced this pull request Sep 4, 2020
imiric pushed a commit that referenced this pull request Sep 4, 2020
imiric pushed a commit that referenced this pull request Sep 4, 2020
@imiric imiric requested review from mstoykov and na-- September 4, 2020 09:23
mstoykov
mstoykov previously approved these changes Sep 4, 2020
Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I haven't tested it ... but the test looks fine.
I vote tentatively for leaving this for v0.29.0 (and merging it early on this time ;) ) just because of the nature of the change and (IMO) low priority of the issue given this has been the case forever and nobody has complained

Comment on lines 119 to 131
cleanURL := URL{u: unfReq.request.URL, URL: unfReq.request.URL.String()}.Clean()
tags["url"] = cleanURL
}

if _, ok := tags["name"]; !ok && enabledTags.Has(stats.TagName) {
cleanURL := URL{u: unfReq.request.URL, URL: unfReq.request.URL.String()}.Clean()
tags["name"] = cleanURL
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest given the high amount of code that will execute those, I would prefer if we do actually do this in (pseudo code):

var urlEnabled = enabledTags.Has(url)...;
var nameToBeSet = <the whole if>;
if urlEnabled || nameToBeSet {
  cleanUrl = ...
  if urlEnabled { 
    tags["url"] cleanURL
  } 
  if name....

both are likely to be true always and I would argue it isn't the worst code out there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, makes sense, see 8ee1d1c.

@na--
Copy link
Member

na-- commented Sep 4, 2020

We've been delaying this for quite a while, so I vote for v0.28.0. If there's a bug in this, it's unlikely to be worse than the bug that currently exists... The fact that nobody has commented on it probably means that most people use url and not name, or they just haven't noticed the issue in the cloud tests...

na--
na-- previously approved these changes Sep 4, 2020
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if urlEnabled {
tags["url"] = cleanURL
}
if _, ok := tags["name"]; !ok && nameEnabled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure I am becoming a PITA, but can you move the _, ok := tags["name"]; !ok in the above if? or better yet in the nameEnabled ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure I am becoming a PITA

Only a bit :)

Is 413d0fe what you had in mind? I think I preferred the previous version, but out of curiosity, why is this better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could have had name set and url disabled and you would've stilled cleaned the url and then found out you didn't need it

imiric pushed a commit that referenced this pull request Sep 4, 2020
na--
na-- previously approved these changes Sep 4, 2020
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we (squash+)merge this now? 😅

mstoykov
mstoykov previously approved these changes Sep 4, 2020
Closes #1474

The root cause was reusing the same 'name' tag for all requests
processed by a transport. In the case of redirects and digest auth the
transport would process more than one request and needs to update the
'name' tag in the same way as 'url' is, but only for the initial
request, as subsequent ones will by dynamically generated.
@imiric imiric dismissed stale reviews from mstoykov and na-- via bfd7a29 September 4, 2020 12:59
@imiric
Copy link
Contributor Author

imiric commented Sep 4, 2020

Sorry guys, please approve one last time. I wanted to squash the fix separately from the .gitignore change, so had to do it manually.

Copy link
Contributor

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem @imiric ;)

@imiric imiric merged commit feb5056 into master Sep 4, 2020
@imiric imiric deleted the fix/1474-name-tag branch September 4, 2020 13:07
@na-- na-- added this to the v0.28.0 milestone Sep 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong name tag for HTTP metrics from redirects
4 participants