highlighting: add support for simulating highlighting timeouts #4364

slimsag · 2019-06-05T08:14:47Z

This is useful for confirming our behavior of highlighting timeouts, and can be
used in the future to automate testing of this behavior as part of an e2e test.

Primarily adding to reproduce https://github.com/sourcegraph/sourcegraph/issues/4267

Note: This matches other cases where e.g. github.com/sourcegraphtest/AlwaysCloningTest
is a special repository for testing gitserver cloning states end-to-end.

To use this you just need to add github.com/sourcegraph/AlwaysHighlightTimeoutTest to your instance and view a file in there or search results coming from that repository.

Helps #4267

This is useful for confirming our behavior of highlighting timeouts, and can be used in the future to automate testing of this behavior as part of an e2e test. Primarily adding to reproduce https://github.com/sourcegraph/sourcegraph/issues/4267 Note: This matches other cases where e.g. github.com/sourcegraphtest/AlwaysCloningTest is a special repository for testing gitserver cloning states end-to-end. Helps #4267

A regression (which must've occurred awfully long ago, like around 8mo ago) means that today when syntax highlighting takes too long (which can happen for multiple reasons: due to very large files, bugs in the language grammar, or bugs in syntect itself) we are incorrectly just displaying the message and action: > Syntax-highlighting this file took too long. [Try again] And despite the fact that the server returned an unhighlighted version of the file, we're just discarding it instead of presenting it to the user as this system was originally designed / intended to work. What this means in practice is that today Sourcegraph is very unusable on large files because as soon as you view them you get the above error and have to hit `Try again` and wait a very long time. Instead, we now once again properly show the above error and action (for users who are willing to wait longer for syntax highlighting on these files) AND the actual plaintext version of the file. How it looks: ![image](https://user-images.githubusercontent.com/3173176/58941602-faf43880-8730-11e9-880f-f0f571fe3c63.png) Fixes #4267 See also #4364 which adds something that will allow us to e2e test this in the future.

codecov · 2019-06-05T08:28:54Z

Codecov Report

Merging #4364 into master will decrease coverage by 0.01%.
The diff coverage is 0%.

Impacted Files	Coverage Δ
pkg/highlight/highlight.go	`62.18% <0%> (-1.07%)`	⬇️
cmd/frontend/internal/pkg/discussions/format.go	`0% <0%> (ø)`	⬆️
cmd/frontend/graphqlbackend/file.go	`0% <0%> (ø)`	⬆️
cmd/repo-updater/repos/syncer.go	`78.53% <0%> (-2.1%)`	⬇️

felixfbecker

What is the current timeout? I am worried that this would not test whether e.g. a timeout error is correctly thrown and caught, and not causing an error response in reality. If the current timeout is not too long we could just e2e test it with an actual gigantic file

slimsag · 2019-06-05T08:37:47Z

You can see the current timeout right above the line in this PR where we are simulating it: https://github.com/sourcegraph/sourcegraph/pull/4364/files#diff-5a881b36b33270d5031b9a3060269516R49

Simulating with a large file is not a good idea because it would be prone to changing as syntect becomes faster and also would not cover the case of syntect hitting a bad language grammar definition.

felixfbecker · 2019-06-05T08:40:15Z

Can't we use a file that is so enormous it will for sure hit a timeout? I think 3s is very reasonable for an e2e test.

felixfbecker · 2019-06-05T08:40:27Z

How does a simulated timeout test a bad language grammar?

slimsag · 2019-06-05T08:50:17Z

There are multiple failure cases to test here:

A super large file that syntect can't handle. This will not result in a contextual timeout and would instead result instead in syntect rejecting the request which ends up as an HTTP status code error and hits a different error path (but with the same end result).
syntect_server being unresponsive in some way, e.g. due to it being overwhelmed due to many concurrent requests or hitting a bad language grammar edge case that syntect can't handle. This is the cause of Add tests to ensure we do not again have webapp regression: "Syntax-highlighting this file took too long." is not falling back to plaintext rendering mode #4267 and the only way to test this is via (1) a simulated timeout or by (2) finding one of those bad edge cases and testing that (but it can be fixed at any time making our test moot) or by (3) overwhelming syntect_server with requests which would be flaky.

My solution tests failure case #2 above and is the only viable way to test it. Your solution would not.

Your solution would be a good way to test the error case #1 above BUT that would be better done as a test inside of the syntect_server codebase (i.e. that it responds with the proper code when a file is too large) AND we haven't yet run into any issues stemming from that error case.

slimsag · 2019-06-05T08:51:57Z

Can't we use a file that is so enormous it will for sure hit a timeout? I think 3s is very reasonable for an e2e test.

No, this would cause syntect_server to reject the request instead of take a long time processing the request and we would hit a different error path here.

How does a simulated timeout test a bad language grammar?

It does so by simulating the fact that syntect_server takes a long time to respond (which is what occurs in the case of a bad language grammar).

felixfbecker · 2019-06-05T08:53:05Z

Good points.

My solution tests failure case #2 above and is the only viable way to test it.

Why is a unit test not viable, if the bug was in frontend logic?

slimsag · 2019-06-05T09:05:45Z

A unit test I think would be great to have in addition, but more generally I think an e2e test is suitable here because:

This didn't stem from "buggy" code from the perspective that the code was unstable/undetermistic or not well written per-say, but rather it appears the issue came from someone just not understanding what we actually want the end-user behavior to be it seems. With a unit test alone, I think we would've had the same issue because the person introducing this regression would've just also updated that test to use this oh-so-lovely new behavior of neglecting timeouts and treating them an error instead of some state we want to communicate.

When I send my 2nd PR in a bit you'll see we've messed this up twice in two completely independent locations and have outright neglected what potential meaning the aborted field means and how we want to handle it -- despite the docs existing around it in our GraphQL API. It also has quite bad negative side effects when we do get it wrong, so this all together leads me to believe an e2e test is in order.

This is a bit similar to not having our repository cloning experience be e2e tested, for example, and updating code to just say "error: repository is cloning try again later" calling it a day.

keegancsmith · 2019-06-05T09:10:45Z

pkg/highlight/highlight.go

 	if !disableTimeout {
 		var cancel func()
 		ctx, cancel = context.WithTimeout(ctx, 3*time.Second)
 		defer cancel()
 	}
+	if simulateTimeout {
+		time.Sleep(4 * time.Second)
+	}


time.Sleep is sadness. Is there no other way to simulate this?

No, there isn't. I know time.Sleep is ugly from first glance but it really is the right choice here. I considered adjusting the context timeout on its own and while that would work for this particular issue it would not raise the other UX issues here like the issue this helped me uncover where we apparently also no longer have any loading indicator for files that load slowly :/

felixfbecker · 2019-06-05T09:13:54Z

With a unit test alone, I think we would've had the same issue because the person introducing this regression would've just also updated that test to use this oh-so-lovely new behavior of neglecting timeouts and treating them an error instead of some state we want to communicate.

I don't think that's true if there is a test that's literally called "it should still render the unhighlighted blob if highlighting timed out". It's impossible to update that test with the new behavior without contradicting the test description - or put differently, it's just as likely as with an e2e test.

I'm cool with having an e2e test here but we should at least have a unit test. Unit tests are the bottom of the testing pyramid, e2e tests the top, everything that can be unit tested should be unit tested while e2e tests are optional and for important user flows.

… out after 3s Prior to this change there was a major regression in which we would disable the syntax highlighting timeout outright. This is despite the fact that this should only ever be done when the user has communicated an intent to wait longer for search results. This regression meant that when a search turned up files that the syntax highlighter could not highlight quickly, we would just wait for them until it finished. In one customer environment I observed search results taking 30s+ to load due to this because it had the added negative consequence of us not canceling those poorly performing requests which led to syntect_server trying to perform tons of CPU-intensive requests which harmed other requests it was handling. After this change, we are properly bailing out once again after the syntax highlighting timeout. This means that in this bad case you are waiting 3s then getting plaintext results instead of waiting 30s+ for highlighting. Fixes #4268 Test plan: Manually tested using #4364, in the future we will e2e test it.

slimsag · 2019-06-05T09:20:25Z

@felixfbecker sounds good, I think having both is needed honestly.

* blob: correct handling of syntax highlighting timeouts A regression (which must've occurred awfully long ago, like around 8mo ago) means that today when syntax highlighting takes too long (which can happen for multiple reasons: due to very large files, bugs in the language grammar, or bugs in syntect itself) we are incorrectly just displaying the message and action: > Syntax-highlighting this file took too long. [Try again] And despite the fact that the server returned an unhighlighted version of the file, we're just discarding it instead of presenting it to the user as this system was originally designed / intended to work. What this means in practice is that today Sourcegraph is very unusable on large files because as soon as you view them you get the above error and have to hit `Try again` and wait a very long time. Instead, we now once again properly show the above error and action (for users who are willing to wait longer for syntax highlighting on these files) AND the actual plaintext version of the file. How it looks: ![image](https://user-images.githubusercontent.com/3173176/58941602-faf43880-8730-11e9-880f-f0f571fe3c63.png) Fixes #4267 See also #4364 which adds something that will allow us to e2e test this in the future. * Update web/src/repo/blob/BlobPage.tsx Co-Authored-By: Felix Becker <[email protected]>

… out after 3s Prior to this change there was a major regression in which we would disable the syntax highlighting timeout outright. This is despite the fact that this should only ever be done when the user has communicated an intent to wait longer for search results. This regression meant that when a search turned up files that the syntax highlighter could not highlight quickly, we would just wait for them until it finished. In one customer environment I observed search results taking 30s+ to load due to this because it had the added negative consequence of us not canceling those poorly performing requests which led to syntect_server trying to perform tons of CPU-intensive requests which harmed other requests it was handling. After this change, we are properly bailing out once again after the syntax highlighting timeout. This means that in this bad case you are waiting 3s then getting plaintext results instead of waiting 30s+ for highlighting. Fixes #4268 Test plan: Manually tested using #4364, in the future we will e2e test it.

… out after 3s (#4369) * search: fix regression where search result highlighting is not bailed out after 3s Prior to this change there was a major regression in which we would disable the syntax highlighting timeout outright. This is despite the fact that this should only ever be done when the user has communicated an intent to wait longer for search results. This regression meant that when a search turned up files that the syntax highlighter could not highlight quickly, we would just wait for them until it finished. In one customer environment I observed search results taking 30s+ to load due to this because it had the added negative consequence of us not canceling those poorly performing requests which led to syntect_server trying to perform tons of CPU-intensive requests which harmed other requests it was handling. After this change, we are properly bailing out once again after the syntax highlighting timeout. This means that in this bad case you are waiting 3s then getting plaintext results instead of waiting 30s+ for highlighting. Fixes #4268 Test plan: Manually tested using #4364 & added a regression test for one unit

slimsag · 2019-07-02T20:57:04Z

@beyang please review

… out after 3s (#4369) * search: fix regression where search result highlighting is not bailed out after 3s Prior to this change there was a major regression in which we would disable the syntax highlighting timeout outright. This is despite the fact that this should only ever be done when the user has communicated an intent to wait longer for search results. This regression meant that when a search turned up files that the syntax highlighter could not highlight quickly, we would just wait for them until it finished. In one customer environment I observed search results taking 30s+ to load due to this because it had the added negative consequence of us not canceling those poorly performing requests which led to syntect_server trying to perform tons of CPU-intensive requests which harmed other requests it was handling. After this change, we are properly bailing out once again after the syntax highlighting timeout. This means that in this bad case you are waiting 3s then getting plaintext results instead of waiting 30s+ for highlighting. Fixes #4268 Test plan: Manually tested using #4364 & added a regression test for one unit

* blob: correct handling of syntax highlighting timeouts A regression (which must've occurred awfully long ago, like around 8mo ago) means that today when syntax highlighting takes too long (which can happen for multiple reasons: due to very large files, bugs in the language grammar, or bugs in syntect itself) we are incorrectly just displaying the message and action: > Syntax-highlighting this file took too long. [Try again] And despite the fact that the server returned an unhighlighted version of the file, we're just discarding it instead of presenting it to the user as this system was originally designed / intended to work. What this means in practice is that today Sourcegraph is very unusable on large files because as soon as you view them you get the above error and have to hit `Try again` and wait a very long time. Instead, we now once again properly show the above error and action (for users who are willing to wait longer for syntax highlighting on these files) AND the actual plaintext version of the file. How it looks: ![image](https://user-images.githubusercontent.com/3173176/58941602-faf43880-8730-11e9-880f-f0f571fe3c63.png) Fixes #4267 See also #4364 which adds something that will allow us to e2e test this in the future. * Update web/src/repo/blob/BlobPage.tsx Co-Authored-By: Felix Becker <[email protected]>

This is useful for confirming our behavior of highlighting timeouts, and can be used in the future to automate testing of this behavior as part of an e2e test. Primarily adding to reproduce https://github.com/sourcegraph/sourcegraph/issues/4267 Note: This matches other cases where e.g. github.com/sourcegraphtest/AlwaysCloningTest is a special repository for testing gitserver cloning states end-to-end. Helps #4267

slimsag requested review from keegancsmith and attfarhan June 5, 2019 08:14

felixfbecker reviewed Jun 5, 2019

View reviewed changes

keegancsmith reviewed Jun 5, 2019

View reviewed changes

slimsag mentioned this pull request Jun 5, 2019

search: fix regression where search result highlighting is not bailed out after 3s #4369

Merged

slimsag requested a review from beyang June 6, 2019 00:51

beyang approved these changes Jul 3, 2019

View reviewed changes

slimsag merged commit d05cb26 into master Jul 3, 2019

slimsag deleted the sg/simulate-highlight-timeouts branch July 3, 2019 18:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

highlighting: add support for simulating highlighting timeouts #4364

highlighting: add support for simulating highlighting timeouts #4364

slimsag commented Jun 5, 2019

codecov bot commented Jun 5, 2019

felixfbecker left a comment

slimsag commented Jun 5, 2019

felixfbecker commented Jun 5, 2019

felixfbecker commented Jun 5, 2019

slimsag commented Jun 5, 2019

slimsag commented Jun 5, 2019

felixfbecker commented Jun 5, 2019

slimsag commented Jun 5, 2019

keegancsmith Jun 5, 2019

slimsag Jun 5, 2019

felixfbecker commented Jun 5, 2019

slimsag commented Jun 5, 2019

slimsag commented Jul 2, 2019

highlighting: add support for simulating highlighting timeouts #4364

highlighting: add support for simulating highlighting timeouts #4364

Conversation

slimsag commented Jun 5, 2019

codecov bot commented Jun 5, 2019

Codecov Report

felixfbecker left a comment

Choose a reason for hiding this comment

slimsag commented Jun 5, 2019

felixfbecker commented Jun 5, 2019

felixfbecker commented Jun 5, 2019

slimsag commented Jun 5, 2019

slimsag commented Jun 5, 2019

felixfbecker commented Jun 5, 2019

slimsag commented Jun 5, 2019

keegancsmith Jun 5, 2019

Choose a reason for hiding this comment

slimsag Jun 5, 2019

Choose a reason for hiding this comment

felixfbecker commented Jun 5, 2019

slimsag commented Jun 5, 2019

slimsag commented Jul 2, 2019