Handle EvidenceSubmissionWindowTasks with last_submitted_at edge cases #15245

Closed
2 tasks
hschallhorn opened this issue Sep 15, 2020 · 1 comment · Fixed by #15598
Assignees
Labels
Feature: generic-queue Priority: Medium Blocking issue w/workaround, or "second in" priority for new work. Product: caseflow-queue Stakeholder: BVA Functionality associated with the Board of Veterans' Appeals workflows/feature requests Team: Echo 🐬 Type: Tech-Improvement

Comments

@hschallhorn

hschallhorn commented Sep 15, 2020

Description

Occasionally we receive an alert that there are task timers that should have been completed but have not been processed. Sometimes this happens because the timestamp for when a task timer should be completed is calculated to be before the task is actually created. If that timestamp is more than 4 days in the past, our task timer job ignores the task timer and never attempts to complete it. See investigation here for details. TL;DR: when the task timer job looks for task timers to handle, it uses the "unexpired" scope, which filters out any timer that should have been completed more than 4 days ago.
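A minimal plain-Ruby sketch of that filtering behavior, for illustration only. The method name `eligible_for_job?`, the constant `FOUR_DAYS`, and the exact comparison are assumptions made here; the real "unexpired" scope is an ActiveRecord scope and may differ in detail.

```ruby
# Hypothetical model of the "unexpired" filter; not the actual Caseflow code.
FOUR_DAYS = 4 * 24 * 60 * 60 # window length in seconds

# A timer is picked up when it is already due (assumed) and its
# "should be completed" time is less than four days in the past.
def eligible_for_job?(due_at, now)
  due_at <= now && due_at > now - FOUR_DAYS
end

now = Time.utc(2020, 9, 15, 12, 0, 0)

due_an_hour_ago = now - 3600
due_just_over_four_days_ago = now - FOUR_DAYS - 3600

picked_up = eligible_for_job?(due_an_hour_ago, now)             # inside the window
ignored   = eligible_for_job?(due_just_over_four_days_ago, now) # outside the window
```

Under these assumptions, the second timer is never selected by any later job run either, since its due time only moves further from the window as time passes.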

To remedy this, upon creation, we check whether the task completion date is outside of the 4-day window and, if so, immediately reset the "should be completed" timestamp so the timer can be picked up by the next job.

The issue lies in a very tight edge case: a task is created with a last_submitted_at timestamp that is inside the 4-day window at creation, but by the time the next job runs, the timestamp has slipped outside of the window. When last investigating this, we saw 10 of these edge cases occur in less than a month, some missing the cutoff by only 4 minutes.
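The edge case can be reproduced with plain time arithmetic. This is a sketch with illustrative numbers; `inside_window?`, `WINDOW`, and `JOB_INTERVAL` are hypothetical names, and the worst case of a full hour between creation and the next job run is assumed.

```ruby
# Hypothetical timeline of the edge case; names and numbers are illustrative.
WINDOW = 4 * 24 * 60 * 60 # four days, in seconds
JOB_INTERVAL = 60 * 60    # assumed worst case: the job runs an hour after creation

# True while the due time is still less than four days in the past.
def inside_window?(due_at, now)
  due_at > now - WINDOW
end

created_at = Time.utc(2020, 9, 15, 12, 0, 0)
# Due 3 days, 23 hours, 4 minutes before creation: 56 minutes inside the window.
due_at = created_at - WINDOW + 56 * 60
next_job_run = created_at + JOB_INTERVAL

ok_at_creation = inside_window?(due_at, created_at)   # passes the creation check
ok_at_job_run  = inside_window?(due_at, next_job_run) # has slipped out by the job run

# How far past the cutoff the timer is when the job finally runs, in minutes.
missed_by_minutes = ((next_job_run - WINDOW) - due_at) / 60
```

With these numbers the timer is not reset at creation, yet it misses the cutoff by 4 minutes when the job runs.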

AC

  • Come up with the best solution to handle this edge case
  • Implement solution
@hschallhorn hschallhorn added Type: Tech-Improvement Product: caseflow-queue Feature: generic-queue Stakeholder: BVA Functionality associated with the Board of Veterans' Appeals workflows/feature requests Team: Echo 🐬 Priority: Medium Blocking issue w/workaround, or "second in" priority for new work. labels Sep 15, 2020
@ajspotts

ajspotts commented Sep 17, 2020

what is this chart?

1 | 
2 | |||||
3 | |||||||
5 | 
8 | 

Job runs every hour.

Why 1?

Why 2?

  • Scoped to small part of the code

Why 3?

  • Testing to reproduce edge case
  • Identifying the best way to handle this
  • Job and async are pretty convoluted, may take time to spin up

@hschallhorn hschallhorn self-assigned this Nov 10, 2020
va-bot pushed a commit that referenced this issue Nov 23, 2020
Resolves #15245 and hopefully squashes all those "incomplete and pending task timer" alerts

### Description
**The problem**: Timers can be created with a date in the past, wild!

**Current solution**: Our "complete the timers" job that runs every hour will look for timers that [should have been completed any time in the last four days](https://github.com/department-of-veterans-affairs/caseflow/blob/7c3d16319d46c412d1869b118199f428306eafa1/app/models/concerns/asyncable.rb#L104-L105)!

**"But what about timers that should have ended _over_ four days ago?" you might ask**: When we create a timer that should have been completed over 4 days ago, we [immediately reset the timer](https://github.com/department-of-veterans-affairs/caseflow/blob/2eb2d81dfe1efc25b1afe8ec009f7ed613595bb8/app/models/concerns/timeable_task.rb#L18) to be picked up by the job on its next run!

**"So what is the problem?" you may then ask**: On occasion, timers are created that should have been completed 3.99 days ago, so they are not caught in the initial check. But by the time the job runs (which could be up to an hour later), they are no longer inside that 4-day window and are not completed by the job.

This PR adds a 1 hour buffer to that initial check so we take into account tasks that will slip out of the 4 day window by the next time the job is run.
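A rough sketch of what a buffered creation-time check could look like, in plain Ruby. `reset_on_creation?`, `WINDOW`, and `BUFFER` are hypothetical names for illustration and are not taken from `timeable_task.rb`.

```ruby
# Hypothetical buffered check; not the actual Caseflow implementation.
WINDOW = 4 * 24 * 60 * 60 # four-day processing window, in seconds
BUFFER = 60 * 60          # one hour, matching the job's run interval

# True when the timer should be reset at creation: either it is already
# outside the window, or it will leave the window within the buffer.
def reset_on_creation?(due_at, now, buffer: BUFFER)
  due_at <= now - WINDOW + buffer
end

now = Time.utc(2020, 11, 23, 12, 0, 0)
nearly_expired = now - WINDOW + 56 * 60 # due 3 days, 23 hours, 4 minutes ago
recent = now - 3 * 24 * 60 * 60         # due exactly 3 days ago

caught_with_buffer    = reset_on_creation?(nearly_expired, now)
caught_without_buffer = reset_on_creation?(nearly_expired, now, buffer: 0)
recent_is_reset       = reset_on_creation?(recent, now)
```

Matching the buffer to the job interval is the key design choice in this sketch: any timer that could leave the window before the next run is reset up front, while timers comfortably inside the window are left alone.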

### Acceptance Criteria
- [ ] Task timers that should have expired 3.99ish days ago will be caught by our [initial "reset this timer" logic upon creation](https://github.com/department-of-veterans-affairs/caseflow/blob/2eb2d81dfe1efc25b1afe8ec009f7ed613595bb8/app/models/concerns/timeable_task.rb#L18).