Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix/unique-temp-filenames #225

Merged

Conversation

chrisjkhan
Copy link
Contributor

@chrisjkhan chrisjkhan commented Dec 28, 2022

This pull request resolves #226

Use content hash for temp video file naming to prevent collisions across

@codecov
Copy link

codecov bot commented Dec 28, 2022

Codecov Report

Merging #225 (4cdb0ab) into main (b00f378) will increase coverage by 0.28%.
The diff coverage is 95.91%.

@@            Coverage Diff             @@
##             main     #225      +/-   ##
==========================================
+ Coverage   72.40%   72.68%   +0.28%     
==========================================
  Files          64       64              
  Lines        3541     3581      +40     
==========================================
+ Hits         2564     2603      +39     
- Misses        977      978       +1     
Impacted Files Coverage Δ
...ckend/tests/pipeline/test_event_gather_pipeline.py 99.17% <88.88%> (-0.83%) ⬇️
cdp_backend/utils/file_utils.py 87.94% <92.30%> (+0.68%) ⬆️
cdp_backend/pipeline/event_gather_pipeline.py 84.74% <100.00%> (+0.03%) ⬆️
cdp_backend/tests/utils/test_file_utils.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Contributor

@dphoria dphoria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Chris! Just one question.

cdp_backend/pipeline/event_gather_pipeline.py Outdated Show resolved Hide resolved
Copy link
Member

@evamaxfield evamaxfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this all looks good to me with two caveats:

  1. the thing @dphoria already mentioned about pulling in the correct extension
  2. a bit more complicated, one of the things that is decently important is that the produced files related to a session are all uploaded with deterministic names. I think a prior version of this PR actually used the session_content_hash which I think is the correct thing to use instead of a uuid. The reasoning here is that when it comes time to pulling down audio files from the database (which is never done for the app side of things but is done in my research) we need a deterministic name. If all of that made sense, the question is: after these changes, do the files that are ultimately stored to the file storage system / Google Cloud Storage have deterministic naming?

cdp_backend/tests/utils/test_file_utils.py Show resolved Hide resolved
Copy link
Member

@evamaxfield evamaxfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the correct extension gets passed through. Then all looks good. Thanks @chrisjkhan for checking the determinism aspect ❤️

Will merge once I see the extension work go through and tests pass.

Copy link
Member

@evamaxfield evamaxfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! @dphoria can you also check this. Will merge soon!

Copy link
Contributor

@dphoria dphoria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it~ 🚀

@chrisjkhan
Copy link
Contributor Author

Looks good to me! @dphoria can you also check this. Will merge soon!

@evamaxfield actually, it's not working right now. I was wanting to reach out to see if you knew what might be wrong. The temporary video file exists in the context of the convert_video_and_handle_host function, after the resource_copy_task, but it does not exist in the context of the file_utils.clip_and_reformat_video function, with no other commands called in between. I'm starting to think it's something small, like a typo. I thought maybe it had to do with the Prefect architecture, but I don't think so now. It is essentially the same architectural form as before any changes were made for trimming, so it seems like it must be a minor issue.

File exists here:


But not here:
log.info(f"File exists: {video_filepath.exists()}")

@dphoria let me know if you have any ideas

@dphoria
Copy link
Contributor

dphoria commented Dec 31, 2022

🤣 I have failed as a reviewer. Anyway, good catch Chris. I'm preoccupied with some other stuff today; will try to debug when I can later today. Hopefully you'll figure it out soon. 😅

@evamaxfield
Copy link
Member

I will also try to take a look asap

@chrisjkhan
Copy link
Contributor Author

@evamaxfield
Copy link
Member

Thanks for making all the changes @chrisjkhan!

Merging and releasing new version. Upgrade should rollout in ~2 hours.

@evamaxfield evamaxfield merged commit aa8ff15 into CouncilDataProject:main Jan 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants