Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

level 4 limits #662

Merged
merged 2 commits into from
Oct 11, 2023
Merged

level 4 limits #662

merged 2 commits into from
Oct 11, 2023

Conversation

bloodearnest
Copy link
Member

  • Use correct privacy strings in tests
  • Limit level 4 file size

The only actual value that matters is `moderately_senstive`, which was
what valid project.yaml's will have. Everything else is effectively
`highly_senstive`. But hand written data in the tests was wrong, and
might be confusing.
This adds a global config, `config.MAX_LEVEL4_FILESIZE` (which defaults
to 16mb) which limits the size of files that will be copied to level
4 storage.

The goal here is to prevent accidental copying of files to level 4 that
should not be. Level 4 files should be aggregate data, and potentially
reviewable/releaseable by output checkers. Datasets including
pseudonymized patient level data should not be marked as
`moderately_senstitive`, and this is one thing to prevent them being
done so by mistake.

When triggered, it communciates the fact that files have not been
copied to users in two ways:

1) It includes the list of files not cpied in the job satus_message,
   which will be visible at jobs.opensafely.org

2) It writes file with the same name plus `.txt` file with a message, so
   it discoverable in level 4. It deletes this file if files are
   successfully copied.

Fixes #298
Copy link
Contributor

@evansd evansd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. In particular, I like the fact that the max size is passed on the JobDefinition so we have the option to customise on a per-job basis further down the road. And I also like the fact that we have a mapping from excluded files to arbitrary error messages so we can introduce new reasons for excluding files later.

@bloodearnest
Copy link
Member Author

This looks good to me. In particular, I like the fact that the max size is passed on the JobDefinition so we have the option to customise on a per-job basis further down the road.

Very intentional, I see that coming...

And I also like the fact that we have a mapping from excluded files to arbitrary error messages so we can introduce new reasons for excluding files later.

It's almost like I could see the future... 🤣

@bloodearnest bloodearnest merged commit fb86868 into main Oct 11, 2023
12 checks passed
@bloodearnest bloodearnest deleted the level-4-limits branch October 11, 2023 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants