Skip to content
This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

highlighting: add support for simulating highlighting timeouts #4364

Merged
merged 1 commit into from
Jul 3, 2019

Conversation

slimsag
Copy link
Member

@slimsag slimsag commented Jun 5, 2019

This is useful for confirming our behavior of highlighting timeouts, and can be
used in the future to automate testing of this behavior as part of an e2e test.

Primarily adding to reproduce https://github.com/sourcegraph/sourcegraph/issues/4267

Note: This matches other cases where e.g. github.com/sourcegraphtest/AlwaysCloningTest
is a special repository for testing gitserver cloning states end-to-end.

To use this you just need to add github.com/sourcegraph/AlwaysHighlightTimeoutTest to your instance and view a file in there or search results coming from that repository.

Helps #4267

This is useful for confirming our behavior of highlighting timeouts, and can be
used in the future to automate testing of this behavior as part of an e2e test.

Primarily adding to reproduce https://github.com/sourcegraph/sourcegraph/issues/4267

Note: This matches other cases where e.g. github.com/sourcegraphtest/AlwaysCloningTest
is a special repository for testing gitserver cloning states end-to-end.

Helps #4267
slimsag added a commit that referenced this pull request Jun 5, 2019
A regression (which must've occurred awfully long ago, like around 8mo ago)
means that today when syntax highlighting takes too long (which can happen for
multiple reasons: due to very large files, bugs in the language grammar, or
bugs in syntect itself) we are incorrectly just displaying the message and action:

> Syntax-highlighting this file took too long. [Try again]

And despite the fact that the server returned an unhighlighted version of the
file, we're just discarding it instead of presenting it to the user as this
system was originally designed / intended to work.

What this means in practice is that today Sourcegraph is very unusable on large
files because as soon as you view them you get the above error and have to hit
`Try again` and wait a very long time.

Instead, we now once again properly show the above error and action (for users
who are willing to wait longer for syntax highlighting on these files) AND the
actual plaintext version of the file.

How it looks:

![image](https://user-images.githubusercontent.com/3173176/58941602-faf43880-8730-11e9-880f-f0f571fe3c63.png)

Fixes #4267

See also #4364 which adds something that will allow us to e2e test this in the future.
@codecov
Copy link

codecov bot commented Jun 5, 2019

Codecov Report

Merging #4364 into master will decrease coverage by 0.01%.
The diff coverage is 0%.

Impacted Files Coverage Δ
pkg/highlight/highlight.go 62.18% <0%> (-1.07%) ⬇️
cmd/frontend/internal/pkg/discussions/format.go 0% <0%> (ø) ⬆️
cmd/frontend/graphqlbackend/file.go 0% <0%> (ø) ⬆️
cmd/repo-updater/repos/syncer.go 78.53% <0%> (-2.1%) ⬇️

Copy link
Contributor

@felixfbecker felixfbecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the current timeout? I am worried that this would not test whether e.g. a timeout error is correctly thrown and caught, and not causing an error response in reality. If the current timeout is not too long we could just e2e test it with an actual gigantic file

@slimsag
Copy link
Member Author

slimsag commented Jun 5, 2019

You can see the current timeout right above the line in this PR where we are simulating it: https://github.com/sourcegraph/sourcegraph/pull/4364/files#diff-5a881b36b33270d5031b9a3060269516R49

Simulating with a large file is not a good idea because it would be prone to changing as syntect becomes faster and also would not cover the case of syntect hitting a bad language grammar definition.

@felixfbecker
Copy link
Contributor

Can't we use a file that is so enormous it will for sure hit a timeout? I think 3s is very reasonable for an e2e test.

@felixfbecker
Copy link
Contributor

How does a simulated timeout test a bad language grammar?

@slimsag
Copy link
Member Author

slimsag commented Jun 5, 2019

There are multiple failure cases to test here:

  1. A super large file that syntect can't handle. This will not result in a contextual timeout and would instead result instead in syntect rejecting the request which ends up as an HTTP status code error and hits a different error path (but with the same end result).
  2. syntect_server being unresponsive in some way, e.g. due to it being overwhelmed due to many concurrent requests or hitting a bad language grammar edge case that syntect can't handle. This is the cause of Add tests to ensure we do not again have webapp regression: "Syntax-highlighting this file took too long." is not falling back to plaintext rendering mode #4267 and the only way to test this is via (1) a simulated timeout or by (2) finding one of those bad edge cases and testing that (but it can be fixed at any time making our test moot) or by (3) overwhelming syntect_server with requests which would be flaky.

My solution tests failure case #2 above and is the only viable way to test it. Your solution would not.

Your solution would be a good way to test the error case #1 above BUT that would be better done as a test inside of the syntect_server codebase (i.e. that it responds with the proper code when a file is too large) AND we haven't yet run into any issues stemming from that error case.

@slimsag
Copy link
Member Author

slimsag commented Jun 5, 2019

Can't we use a file that is so enormous it will for sure hit a timeout? I think 3s is very reasonable for an e2e test.

No, this would cause syntect_server to reject the request instead of take a long time processing the request and we would hit a different error path here.

How does a simulated timeout test a bad language grammar?

It does so by simulating the fact that syntect_server takes a long time to respond (which is what occurs in the case of a bad language grammar).

@felixfbecker
Copy link
Contributor

Good points.

My solution tests failure case #2 above and is the only viable way to test it.

Why is a unit test not viable, if the bug was in frontend logic?

@slimsag
Copy link
Member Author

slimsag commented Jun 5, 2019

A unit test I think would be great to have in addition, but more generally I think an e2e test is suitable here because:

This didn't stem from "buggy" code from the perspective that the code was unstable/undetermistic or not well written per-say, but rather it appears the issue came from someone just not understanding what we actually want the end-user behavior to be it seems. With a unit test alone, I think we would've had the same issue because the person introducing this regression would've just also updated that test to use this oh-so-lovely new behavior of neglecting timeouts and treating them an error instead of some state we want to communicate.

When I send my 2nd PR in a bit you'll see we've messed this up twice in two completely independent locations and have outright neglected what potential meaning the aborted field means and how we want to handle it -- despite the docs existing around it in our GraphQL API. It also has quite bad negative side effects when we do get it wrong, so this all together leads me to believe an e2e test is in order.

This is a bit similar to not having our repository cloning experience be e2e tested, for example, and updating code to just say "error: repository is cloning try again later" calling it a day.

if !disableTimeout {
var cancel func()
ctx, cancel = context.WithTimeout(ctx, 3*time.Second)
defer cancel()
}
if simulateTimeout {
time.Sleep(4 * time.Second)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.Sleep is sadness. Is there no other way to simulate this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there isn't. I know time.Sleep is ugly from first glance but it really is the right choice here. I considered adjusting the context timeout on its own and while that would work for this particular issue it would not raise the other UX issues here like the issue this helped me uncover where we apparently also no longer have any loading indicator for files that load slowly :/

@felixfbecker
Copy link
Contributor

With a unit test alone, I think we would've had the same issue because the person introducing this regression would've just also updated that test to use this oh-so-lovely new behavior of neglecting timeouts and treating them an error instead of some state we want to communicate.

I don't think that's true if there is a test that's literally called "it should still render the unhighlighted blob if highlighting timed out". It's impossible to update that test with the new behavior without contradicting the test description - or put differently, it's just as likely as with an e2e test.

I'm cool with having an e2e test here but we should at least have a unit test. Unit tests are the bottom of the testing pyramid, e2e tests the top, everything that can be unit tested should be unit tested while e2e tests are optional and for important user flows.

slimsag added a commit that referenced this pull request Jun 5, 2019
… out after 3s

Prior to this change there was a major regression in which we would disable the
syntax highlighting timeout outright. This is despite the fact that this should
only ever be done when the user has communicated an intent to wait longer for
search results.

This regression meant that when a search turned up files that the syntax highlighter could
not highlight quickly, we would just wait for them until it finished. In one customer
environment I observed search results taking 30s+ to load due to this because it had the
added negative consequence of us not canceling those poorly performing requests which led to
syntect_server trying to perform tons of CPU-intensive requests which harmed other requests
it was handling.

After this change, we are properly bailing out once again after the syntax highlighting
timeout.

This means that in this bad case you are waiting 3s then getting plaintext results instead of
waiting 30s+ for highlighting.

Fixes #4268

Test plan: Manually tested using #4364, in the future we will e2e test it.
@slimsag
Copy link
Member Author

slimsag commented Jun 5, 2019

@felixfbecker sounds good, I think having both is needed honestly.

slimsag added a commit that referenced this pull request Jun 5, 2019
* blob: correct handling of syntax highlighting timeouts

A regression (which must've occurred awfully long ago, like around 8mo ago)
means that today when syntax highlighting takes too long (which can happen for
multiple reasons: due to very large files, bugs in the language grammar, or
bugs in syntect itself) we are incorrectly just displaying the message and action:

> Syntax-highlighting this file took too long. [Try again]

And despite the fact that the server returned an unhighlighted version of the
file, we're just discarding it instead of presenting it to the user as this
system was originally designed / intended to work.

What this means in practice is that today Sourcegraph is very unusable on large
files because as soon as you view them you get the above error and have to hit
`Try again` and wait a very long time.

Instead, we now once again properly show the above error and action (for users
who are willing to wait longer for syntax highlighting on these files) AND the
actual plaintext version of the file.

How it looks:

![image](https://user-images.githubusercontent.com/3173176/58941602-faf43880-8730-11e9-880f-f0f571fe3c63.png)

Fixes #4267

See also #4364 which adds something that will allow us to e2e test this in the future.

* Update web/src/repo/blob/BlobPage.tsx

Co-Authored-By: Felix Becker <[email protected]>
@slimsag slimsag requested a review from beyang June 6, 2019 00:51
slimsag added a commit that referenced this pull request Jun 6, 2019
… out after 3s

Prior to this change there was a major regression in which we would disable the
syntax highlighting timeout outright. This is despite the fact that this should
only ever be done when the user has communicated an intent to wait longer for
search results.

This regression meant that when a search turned up files that the syntax highlighter could
not highlight quickly, we would just wait for them until it finished. In one customer
environment I observed search results taking 30s+ to load due to this because it had the
added negative consequence of us not canceling those poorly performing requests which led to
syntect_server trying to perform tons of CPU-intensive requests which harmed other requests
it was handling.

After this change, we are properly bailing out once again after the syntax highlighting
timeout.

This means that in this bad case you are waiting 3s then getting plaintext results instead of
waiting 30s+ for highlighting.

Fixes #4268

Test plan: Manually tested using #4364, in the future we will e2e test it.
slimsag added a commit that referenced this pull request Jun 6, 2019
… out after 3s (#4369)

* search: fix regression where search result highlighting is not bailed out after 3s

Prior to this change there was a major regression in which we would disable the
syntax highlighting timeout outright. This is despite the fact that this should
only ever be done when the user has communicated an intent to wait longer for
search results.

This regression meant that when a search turned up files that the syntax highlighter could
not highlight quickly, we would just wait for them until it finished. In one customer
environment I observed search results taking 30s+ to load due to this because it had the
added negative consequence of us not canceling those poorly performing requests which led to
syntect_server trying to perform tons of CPU-intensive requests which harmed other requests
it was handling.

After this change, we are properly bailing out once again after the syntax highlighting
timeout.

This means that in this bad case you are waiting 3s then getting plaintext results instead of
waiting 30s+ for highlighting.

Fixes #4268

Test plan: Manually tested using #4364 & added a regression test for one unit
@slimsag
Copy link
Member Author

slimsag commented Jul 2, 2019

@beyang please review

slimsag added a commit that referenced this pull request Jul 2, 2019
… out after 3s (#4369)

* search: fix regression where search result highlighting is not bailed out after 3s

Prior to this change there was a major regression in which we would disable the
syntax highlighting timeout outright. This is despite the fact that this should
only ever be done when the user has communicated an intent to wait longer for
search results.

This regression meant that when a search turned up files that the syntax highlighter could
not highlight quickly, we would just wait for them until it finished. In one customer
environment I observed search results taking 30s+ to load due to this because it had the
added negative consequence of us not canceling those poorly performing requests which led to
syntect_server trying to perform tons of CPU-intensive requests which harmed other requests
it was handling.

After this change, we are properly bailing out once again after the syntax highlighting
timeout.

This means that in this bad case you are waiting 3s then getting plaintext results instead of
waiting 30s+ for highlighting.

Fixes #4268

Test plan: Manually tested using #4364 & added a regression test for one unit
slimsag added a commit that referenced this pull request Jul 2, 2019
* blob: correct handling of syntax highlighting timeouts

A regression (which must've occurred awfully long ago, like around 8mo ago)
means that today when syntax highlighting takes too long (which can happen for
multiple reasons: due to very large files, bugs in the language grammar, or
bugs in syntect itself) we are incorrectly just displaying the message and action:

> Syntax-highlighting this file took too long. [Try again]

And despite the fact that the server returned an unhighlighted version of the
file, we're just discarding it instead of presenting it to the user as this
system was originally designed / intended to work.

What this means in practice is that today Sourcegraph is very unusable on large
files because as soon as you view them you get the above error and have to hit
`Try again` and wait a very long time.

Instead, we now once again properly show the above error and action (for users
who are willing to wait longer for syntax highlighting on these files) AND the
actual plaintext version of the file.

How it looks:

![image](https://user-images.githubusercontent.com/3173176/58941602-faf43880-8730-11e9-880f-f0f571fe3c63.png)

Fixes #4267

See also #4364 which adds something that will allow us to e2e test this in the future.

* Update web/src/repo/blob/BlobPage.tsx

Co-Authored-By: Felix Becker <[email protected]>
@slimsag slimsag merged commit d05cb26 into master Jul 3, 2019
@slimsag slimsag deleted the sg/simulate-highlight-timeouts branch July 3, 2019 18:00
slimsag added a commit that referenced this pull request Jul 13, 2019
This is useful for confirming our behavior of highlighting timeouts, and can be
used in the future to automate testing of this behavior as part of an e2e test.

Primarily adding to reproduce https://github.com/sourcegraph/sourcegraph/issues/4267

Note: This matches other cases where e.g. github.com/sourcegraphtest/AlwaysCloningTest
is a special repository for testing gitserver cloning states end-to-end.

Helps #4267
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants