
131 look into setting time limit on background validation task #159

Merged

Conversation

@jcadam14 (Contributor) commented Apr 12, 2024

Closes #131

  • Added fastapi-utilities, which includes an async @repeat_every decorator.
  • Added env vars to set the interval at which we check for expired submissions, and the time difference from now beyond which a submission is considered expired (a minimal config sketch follows this list).
  • Added a repo function that queries submissions in SUBMISSION_STARTED, SUBMISSION_UPLOADED, and VALIDATION_IN_PROGRESS, compares submission_time to the current time, and sets the state to VALIDATION_EXPIRED if the difference exceeds the threshold.
    • I used submission_time, which I think is going to be moved under submitter, since the times between STARTED, UPLOADED, and IN_PROGRESS are relatively tiny. We don't capture or update a timestamp on state transitions, so I figured this was 'good enough' for now.
  • Added the new state to the enum and an alembic script for the db update.
  • Added pytests.
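
A minimal sketch of those two settings, assuming a pydantic-settings style config like the repo's config.py; the import path, base class, and the expired_submission_diff_secs default are assumptions, while the field names come from the snippets reviewed below:

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # How often the periodic task wakes up to look for expired submissions (seconds).
    # 60 matches the EXPIRED_SUBMISSION_CHECK_SECS value shown later in .env.local.
    expired_submission_check_secs: int = 60
    # How old a non-terminal submission must be before it is marked
    # VALIDATION_EXPIRED (seconds). This default is purely illustrative.
    expired_submission_diff_secs: int = 300

settings = Settings()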

@jcadam14 jcadam14 self-assigned this Apr 12, 2024
@jcadam14 jcadam14 linked an issue Apr 12, 2024 that may be closed by this pull request
github-actions bot commented Apr 12, 2024

Coverage report

Files with coverage changes:
  • src/sbl_filing_api/config.py
  • src/sbl_filing_api/entities/models/model_enums.py
  • src/sbl_filing_api/entities/repos/submission_repo.py
  • src/sbl_filing_api/routers/filing.py
  • src/sbl_filing_api/services/submission_processor.py

This report was generated by python-coverage-comment-action

Comment on lines 48 to 50
@repeat_every(seconds=settings.expired_submission_check_secs, wait_first=True, logger=log)
async def check_expired_submissions() -> None:
    await submission_repo.check_expired_submissions()
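
For reference, a repeat_every-decorated coroutine generally has to be invoked once at application startup for the loop to begin running. A minimal sketch of that wiring; the import path and the on_event startup hook are assumptions, not a claim about how this repo registers it:

from fastapi import FastAPI
from fastapi_utilities import repeat_every  # assumed import path for the decorator

app = FastAPI()

@app.on_event("startup")
@repeat_every(seconds=settings.expired_submission_check_secs, wait_first=True, logger=log)
async def check_expired_submissions() -> None:
    # settings, log, and submission_repo are the same objects used in the snippet above.
    await submission_repo.check_expired_submissions()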
Collaborator:

I think for MVP, this is fine; but in the grand scheme of things, this should probably be part of a separate non-API service that handles collector-type tasks.
Food for thought: another approach we might be able to take is a task timeout wrapper, something like

async with asyncio.timeout_at(asyncio.get_running_loop().time() + timeout_in_seconds):
    ...
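
Fleshed out slightly, that wrapper might look like the sketch below, assuming Python 3.11+ (where asyncio.timeout_at is available) and a hypothetical validate_and_update_submission coroutine standing in for the real validation work:

import asyncio
import logging

log = logging.getLogger(__name__)

async def validate_and_update_submission(submission_id: str) -> None:
    ...  # stand-in for the real background validation work

async def run_validation_with_deadline(submission_id: str, timeout_in_seconds: float) -> None:
    try:
        # Cancel the validation coroutine if it runs past the deadline.
        async with asyncio.timeout_at(asyncio.get_running_loop().time() + timeout_in_seconds):
            await validate_and_update_submission(submission_id)
    except TimeoutError:
        # This is where the submission could be flipped to an expired state.
        log.warning("Validation for submission %s exceeded %s seconds", submission_id, timeout_in_seconds)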

Contributor Author (jcadam14):

Yeah, this is definitely a purely MVP implementation to handle the 'just in case' situations where an exception or scenario we didn't think about creeps up and causes the frontend to spin indefinitely because the latest submission never gets out of the in-progress state.

I went with a single task polling the db instead of a thread per submission that expires that particular one (assuming that's what you're alluding to). I've done both in the past and had fewer headaches with a single periodic updater. I'm good to discuss more before squashing and merging. But yes, this is a temporary solution to an issue that the more robust error handling recently implemented will probably take care of, but it gives the frontend warm fuzzies ;)

lchen-2101 previously approved these changes Apr 15, 2024
@lchen-2101 (Collaborator) left a comment

LGTM for MVP, should revisit post-MVP

@hkeeler (Member) left a comment

I was thinking this would be something more at the task level that would monitor how long the process has been running. I do like this approach, though. My only question is: how is this going to behave if we have multiple instances of the container running at the same time? They'll each be polling the database at the same frequency, so chances are they'll frequently try to set the same submission to VALIDATION_EXPIRED at the same time. Any potential issues with that?

I'm thinking back to a past life where we had many clients polling the same table, and sometimes updating it as well, and we'd get database deadlocks in certain scenarios until we restructured the code.

In the long run, if we want/need to keep a process like this around, we may want to run it as a whole separate process...but not now if we don't have to. 😄

stmt = select(SubmissionDAO).filter(SubmissionDAO.state.in_(check_states))
submissions = (await session.scalars(stmt)).all()
for s in submissions:
    if abs(s.submission_time.timestamp() - datetime.now().timestamp()) > settings.expired_submission_diff_secs:
Member:

Do we need the abs? Can we just flip what's subtracted from what?

Suggested change
-    if abs(s.submission_time.timestamp() - datetime.now().timestamp()) > settings.expired_submission_diff_secs:
+    if datetime.now().timestamp() - s.submission_time.timestamp() > settings.expired_submission_diff_secs:

Member:

Also, can we add a WARN-level log statement in here when this happens?

Contributor Author (jcadam14):

Yup to both.
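
With both changes applied, the loop might look like the following sketch (the log wording, s.id, and the SubmissionState.VALIDATION_EXPIRED assignment are assumptions about the surrounding code, not the final implementation):

stmt = select(SubmissionDAO).filter(SubmissionDAO.state.in_(check_states))
submissions = (await session.scalars(stmt)).all()
for s in submissions:
    # Submissions only age forward, so subtracting the older timestamp from the newer one removes the need for abs().
    if datetime.now().timestamp() - s.submission_time.timestamp() > settings.expired_submission_diff_secs:
        log.warning("Submission %s exceeded %s seconds; marking VALIDATION_EXPIRED", s.id, settings.expired_submission_diff_secs)
        s.state = SubmissionState.VALIDATION_EXPIRED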

Also yeah, definitely; I wasn't considering multiple pods. What I've done in the past is a single StatefulSet pod (you can singleton them) that handles this sort of thing. It would just run the scheduled task and call an endpoint on the service (which could then be handled by any of the API pods).

There wasn't any sort of timeout I could find to set on the background task itself. However, talking with ye olde chat, we could use a mixture of the background task and a wait_for task with a timeout, and gather them. The wait task's timeout handler could check, and then set, the submission state if it's not in a good validation state. The problem with that approach is that it would only handle situations where the submission actually entered the validation flow. If something failed outside of that (during the S3 upload, for example) that we for some reason don't handle gracefully prior to the background task creation, we'd miss setting the expired state. Just a consideration when tying it to the background task; a rough sketch of the gather/wait_for idea follows.
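
A rough sketch of that gather/wait_for idea; validate_submission, expire_submission, and the task wiring here are hypothetical stand-ins rather than this repo's actual API:

import asyncio

async def validate_submission(submission_id: str) -> None:
    ...  # stand-in for the real background validation task

async def expire_submission(submission_id: str) -> None:
    ...  # stand-in for setting VALIDATION_EXPIRED if the state isn't already a good one

async def watchdog(validation: asyncio.Task, submission_id: str, timeout_secs: float) -> None:
    try:
        # shield() so the timeout expiring doesn't cancel the validation task itself.
        await asyncio.wait_for(asyncio.shield(validation), timeout=timeout_secs)
    except asyncio.TimeoutError:
        await expire_submission(submission_id)

async def kick_off(submission_id: str, timeout_secs: float) -> None:
    validation = asyncio.create_task(validate_submission(submission_id))
    # Run the validation task and the watchdog together; the watchdog only acts if validation outlives the timeout.
    await asyncio.gather(validation, watchdog(validation, submission_id, timeout_secs))

As noted above, this only covers submissions that actually reach the validation flow, which is why the periodic DB sweep remains the fallback.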

@jcadam14 (Contributor Author) commented

Still putting in the try/catch around S3 and all that; just wanted to get the start of the rewrite pushed.

src/.env.local Outdated
FS_DOWNLOAD_CONFIG__PROTOCOL="file"
EXPIRED_SUBMISSION_CHECK_SECS=60
Collaborator:

this one isn't needed anymore, right?

Contributor Author (jcadam14):

Nope thank you, removing

pyproject.toml Outdated
@@ -23,6 +23,7 @@ async-lru = "^2.0.4"
fsspec = "^2024.2.0"
s3fs = "^2024.2.0"
httpx = "^0.26.0"
fastapi-utilities = "^0.2.0"
Collaborator:

sorry, one more, 😂

Collaborator:

and probably need to update the lock file as well

Contributor Author (jcadam14):

Curses! removed

@lchen-2101 (Collaborator) left a comment

LGTM

@lchen-2101 lchen-2101 merged commit 2b1884a into main Apr 18, 2024
3 checks passed
@lchen-2101 lchen-2101 deleted the 131-look-into-setting-time-limit-on-background-validation-task branch April 18, 2024 15:14