Submissions randomly go missing #194

Open
j-mao opened this issue Jan 8, 2020 · 12 comments
Labels: backend (Related to competition website backend.), bug (Something isn't working), low-priority

Comments

@j-mao
Member

j-mao commented Jan 8, 2020

For some reason, the source code for submissions 316, 318, 325, and 328 does not exist, but these submissions are still in the database and cause trouble for the servers. We should investigate why they disappeared.

j-mao added the bug, backend, and critical labels Jan 8, 2020
j-mao added a commit that referenced this issue Jan 8, 2020
Temporarily alleviates #194 by not hanging while waiting for the file to
appear (it had entered race condition avoidance which blocked everything)

Also need to do something about HTTP 400 on compilation_update because
we don't want to recycle jobs that "fail" because of a resubmit.
@n8kim1
Member

n8kim1 commented Jan 8, 2020

Is this not the race condition? (I'll keep digging.)

@n8kim1
Member

n8kim1 commented Jan 8, 2020

Thinking about it, it might be good to separate the bucket part and the pub/sub part of the compilation code into two try/catch blocks.
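The split suggested above could look roughly like this. It is a minimal Python sketch, assuming hypothetical `upload` and `publish` callables and hypothetical `BucketError`/`PubSubError` exceptions in place of the real bucket and pub/sub client calls. With separate try blocks, a failure reports which step broke, and a bucket failure never triggers a publish.

```python
class BucketError(Exception):
    """Hypothetical error raised by the bucket (storage) step."""


class PubSubError(Exception):
    """Hypothetical error raised by the pub/sub (publish) step."""


def compile_submission(submission_id, upload, publish):
    """Run the bucket step and the pub/sub step in separate try blocks.

    `upload` and `publish` are hypothetical stand-ins for the real client
    calls. Returns a status string naming the step that failed, if any.
    """
    try:
        upload(submission_id)
    except BucketError as exc:
        # The code never reached the bucket, so do not ask anyone to compile it.
        return f"bucket_failed: {exc}"

    try:
        publish(submission_id)
    except PubSubError as exc:
        # The code is stored but the compile server was never notified;
        # the job could be re-published later without re-uploading.
        return f"pubsub_failed: {exc}"

    return "ok"
```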

@j-mao
Member Author

j-mao commented Jan 8, 2020

I think our race condition had already been "fixed" on the infrastructure side when this happened. Even so, I think the backend should be agnostic to the race condition; it affects the compile server more, doesn't it?

I hypothesise that this has something to do with submitting twice in a row; it could be another race condition inside the backend/db. For example, 484 is missing and was submitted by the same team as 482-483.

@j-mao
Member Author

j-mao commented Jan 9, 2020

This is causing many problems with the compilation queue and needs to be fixed as soon as possible.

j-mao added a commit that referenced this issue Jan 9, 2020
Temporarily alleviates #194 by not allowing the queue to explode in size
while jobs are continually recycled into it.
j-mao added a commit that referenced this issue Jan 9, 2020
Revokes f31504d.

This sketchy giving up is needed while #194 is open. Once it is closed,
it should be removed.
@j-mao
Member Author

j-mao commented Jan 9, 2020

Cause appears to be HTTP 500 on /api/0/submission; the row is added to the database, but no signed upload url is given to the browser and so the code never reaches the bucket.
Further testing indicates that the job also never reaches the compile server.
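One way to keep a 500 from leaving an orphaned row is to create the row and obtain the signed URL in a single transaction, so that a failure in either step leaves nothing behind. This is a minimal sketch using Python's built-in `sqlite3` as a stand-in for the real database; the `submission` table and the `make_signed_url` callable are hypothetical.

```python
import sqlite3


def create_submission(conn, team_id, make_signed_url):
    """Insert a submission row and return (id, url), or leave no row at all.

    `make_signed_url` is a hypothetical stand-in for whatever produces the
    signed upload URL. If it raises, the INSERT is rolled back.
    """
    with conn:  # sqlite3: commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT INTO submission (team_id) VALUES (?)", (team_id,)
        )
        sub_id = cur.lastrowid
        url = make_signed_url(sub_id)  # may raise; the row is then rolled back
    return sub_id, url
```

This only covers the server-side half: if the browser receives the URL but never finishes the upload, the row still exists without code, so the compile server would still have to tolerate missing sources.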

@j-mao
Member Author

j-mao commented Jan 9, 2020

A sketchy but effective fix:

  • While the first submission is in progress, disable the submit button so that a second submission cannot start before the first one finishes.
  • If it returns a 500, send the request again. We could maybe even ask the backend to delete or reuse the previous request, but this is probably less important.
  • If it returns a 201, alert success, or do something that makes it physically impossible to submit again immediately. Also re-enable the submit button. This would also resolve #181 ("Disable the submit button right after it's clicked").
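The client-side steps above can be sketched as follows. This is a minimal Python sketch of the logic, not the actual frontend code; `SubmitButton`, the `post` callable (standing in for the request to /api/0/submission), and the `max_attempts` cap are all hypothetical.

```python
class SubmitButton:
    """Hypothetical stand-in for the frontend submit button."""

    def __init__(self):
        self.enabled = True


def submit(button, post, max_attempts=3):
    """Submit once, retrying on HTTP 500, with the button disabled throughout.

    `post` is a hypothetical callable that sends the submission request and
    returns its HTTP status code.
    """
    if not button.enabled:
        return "already submitting"
    button.enabled = False  # block a second submit while this one is running
    try:
        for _ in range(max_attempts):
            status = post()
            if status == 201:
                return "success"
            if status != 500:
                return f"failed ({status})"
            # On a 500, loop around and send the request again.
        return "failed (gave up after repeated 500s)"
    finally:
        button.enabled = True  # re-enable only once the request has settled
```

Retrying after a 500 may add another database row, since the failed request may already have inserted one; that is where asking the backend to delete or reuse the previous row would help.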

@n8kim1
Member

n8kim1 commented Jan 9, 2020

We should justify/display the reason: "can't submit while the previous submission is processing" (as opposed to compiling, etc.).

@n8kim1
Member

n8kim1 commented Jan 9, 2020

@zoemarschner @arvid220u let's make the sketchy fix for now; a more robust solution is better in the long term and probably isn't difficult to carry out.

@j-mao
Member Author

j-mao commented Jan 9, 2020

> Further testing indicates that the job also never reaches the compile server

Looking at server logs from last night, sometimes it actually still does reach the pub/sub and compile server, so there may be more than one issue.

We will likely need a more robust solution to deal with this one, because the sketchy patch doesn't save the compile server from being loaded with non-existent submissions.

@j-mao
Member Author

j-mao commented Jan 12, 2020

The compile server is now configured to retry jobs that are bad and to ignore jobs that are really bad, so this is no longer a big problem right now.
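A bounded retry policy along these lines keeps recycled jobs from growing the queue forever. This is a minimal sketch, assuming hypothetical `TransientError` ("bad", worth retrying) and `PermanentError` ("really bad", ignored) exception types and a hypothetical `max_retries` cap; how the real compile server classifies failures is not described in this thread.

```python
from collections import deque


class TransientError(Exception):
    """Hypothetical: a failure that is worth retrying."""


class PermanentError(Exception):
    """Hypothetical: a failure that retrying cannot fix (e.g. missing source)."""


def drain(queue, compile_job, max_retries=3):
    """Process jobs from `queue`, retrying transient failures a bounded number
    of times and dropping permanent ones. Returns (compiled, dropped) job ids.
    """
    compiled, dropped = [], []
    attempts = {}
    while queue:
        job = queue.popleft()
        try:
            compile_job(job)
            compiled.append(job)
        except TransientError:
            attempts[job] = attempts.get(job, 0) + 1
            if attempts[job] < max_retries:
                queue.append(job)  # recycle, but only a bounded number of times
            else:
                dropped.append(job)  # give up after max_retries attempts
        except PermanentError:
            dropped.append(job)  # ignore: retrying cannot help
    return compiled, dropped
```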

@n8kim1
Member

n8kim1 commented Jan 20, 2020

@j-mao I saw some more submissions that went missing (unfortunately I forget the IDs now); these were also a bunch in a row by the same team, with only the last one making it.

This may be an issue worth looking into more.

@n8kim1
Member

n8kim1 commented Jan 20, 2020

E.g. submissions 6250-6258 don't have data, with only 6259 having its code in the bucket; these were all submissions by the same team, all within a couple of seconds of each other.

Is this behavior intended? And even if not, is it acceptable?
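To check whether missing submissions line up with rapid resubmits, one could cross-reference the submission table against what is actually in the bucket. This is a minimal sketch, assuming a hypothetical export of `(submission_id, team_id, unix_time)` rows, a set of ids whose source exists in the bucket, and an arbitrary 5-second window.

```python
def missing_in_bursts(rows, stored_ids, window_s=5.0):
    """Return ids with no stored source that were followed by another
    submission from the same team within `window_s` seconds.

    rows: hypothetical list of (submission_id, team_id, unix_time) tuples.
    stored_ids: set of submission ids whose source exists in the bucket.
    """
    rows = sorted(rows, key=lambda r: r[2])  # order by submission time
    flagged = []
    for i, (sid, team, t) in enumerate(rows):
        if sid in stored_ids:
            continue  # source is present, nothing to explain
        for _, team2, t2 in rows[i + 1:]:
            if t2 - t > window_s:
                break  # later rows are outside the window
            if team2 == team:
                flagged.append(sid)  # same team resubmitted shortly after
                break
    return flagged
```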
