[py-tx][backwards incompatible] Split Update Format and Index Format #1014
Summary
The flow of information through the system is:
API.fetch() => Delta => Store => Index => Metadata
NCMEC's API is like the tag-based ThreatExchange API: data is not stored as hash => record but as record => [hash, hash, hash], which is incompatible with many assumptions made along the way.
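To make the mismatch concrete, here is a minimal sketch using made-up record IDs and hashes (these are assumptions for illustration, not real NCMEC or ThreatExchange payloads, and `invert` is a hypothetical helper, not part of py-tx). It shows a record-keyed store and the inversion a hash-keyed consumer would need:

```python
from collections import defaultdict

# Hypothetical data: each record carries several hashes (record => [hash, ...]).
records = {
    "record-1": ["hash-a", "hash-b"],
    "record-2": ["hash-b", "hash-c"],
}


def invert(record_to_hashes):
    """Build hash => [record, ...] from record => [hash, ...]."""
    hash_to_records = defaultdict(list)
    for record_id, hashes in record_to_hashes.items():
        for h in hashes:
            hash_to_records[h].append(record_id)
    return dict(hash_to_records)


index_view = invert(records)
```

In this sketch, changing one record touches several hash entries, and one hash can belong to more than one record, so an update to a record cannot be applied as a single hash => record write.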
After lots of mulling, I think the solution is to:
As part of that, I need to change many of the interfaces above. There are more changes I want to make; this diff is a checkpoint.
The final version I want to end up with is:
As a side effect, the storage format for the CLI changed from JSON to pickle. I may be able to revert this in a few more PRs, but right now Delta is the only object that knows the correct format.
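As a rough illustration of why pickle is the easier interim format, assume the in-memory state uses tuple keys and set values (a hypothetical shape, not necessarily what Delta holds in this PR). pickle round-trips it unchanged, while `json.dumps` rejects it without a custom encoding step:

```python
import json
import pickle

# Hypothetical in-memory state: (signal type, hash) keys mapping to record IDs.
state = {("pdq", "hash-a"): {"record-1", "record-2"}}

# pickle preserves tuple keys and set values as-is.
restored = pickle.loads(pickle.dumps(state))

# json only accepts str/int/float/bool/None keys, so this raises TypeError.
try:
    json.dumps(state)
    json_ok = True
except TypeError:
    json_ok = False
```

Going back to JSON would need an explicit, agreed serialization for shapes like this, which fits with deferring that change to later PRs.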
Test Plan
Added a test
Also went through: