Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

378 update submission processing to use polars data validator #394

Merged

Conversation

jcadam14
Copy link
Contributor

@jcadam14 jcadam14 commented Sep 5, 2024

Closes #378

Changes the filing-api in the following ways:

  • When the content is uploaded, use python csv to get the row count since chunking generators in data-validator no longer provide final counts of rows in file
  • Use generator from validator to get findings dataframe, and collate into final dataframe
  • Updated to generate a file path that's either local or s3://, which the data-validator can use with fsspec to correctly use either way
  • Calls new df_to_download to write out s3, the data validator handles the writing
  • Updated build_validation_results to use the collated findings dataframe to build out the json for the frontend, ValidationResults are no longer used in the filing-api
  • Updated the pytests
  • Updated boto3 to stick to version 1.34 since fsspec and s3fs (which is used in the data-validator) don't currently support botocore 1.35. When the data-validator is separated, we can switch back to the latest boto3

@jcadam14 jcadam14 self-assigned this Sep 5, 2024
@jcadam14 jcadam14 linked an issue Sep 5, 2024 that may be closed by this pull request
Copy link

github-actions bot commented Sep 5, 2024

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  src/sbl_filing_api
  config.py
  src/sbl_filing_api/routers
  filing.py
  src/sbl_filing_api/services
  submission_processor.py 64
Project Total  

This report was generated by python-coverage-comment-action

@jcadam14 jcadam14 requested a review from lchen-2101 November 7, 2024 19:42
@jcadam14
Copy link
Contributor Author

jcadam14 commented Nov 7, 2024

This was tested on the devpub cluster as well as pytests and docker-compose. I had to merge this with the counters PR (#473) since the database and all that had been updated already with that work. So on devpub, it's a combo of these two. But once one of them is approved I'll do the work to merge them in whatever other PR is left open

Comment on lines 118 to 119
except RuntimeError as re:
log.exception("The file is malformed.", re)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be needed, log.exception contains exc_info=True, which extracts the exception from the stack.

pyproject.toml Outdated Show resolved Hide resolved
@jcadam14 jcadam14 requested a review from lchen-2101 November 8, 2024 15:27
lchen-2101
lchen-2101 previously approved these changes Nov 13, 2024
Copy link
Collaborator

@lchen-2101 lchen-2101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lchen-2101
Copy link
Collaborator

oop, conflict

@jcadam14 jcadam14 requested a review from lchen-2101 November 13, 2024 23:04
@lchen-2101
Copy link
Collaborator

should we update the poetry lock file with the polars update, and the fix?

Copy link
Collaborator

@lchen-2101 lchen-2101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jcadam14 jcadam14 merged commit 746ba20 into main Nov 14, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update submission processing to use polars data-validator
2 participants