
Tech Meeting Notes 2020 09 24

Erik Moeller edited this page Sep 24, 2020 · 1 revision

UUIDs vs. IDs in SecureDrop

Background

We have both IDs and UUIDs in most or all tables. UUIDs hide sequence and volume information. In the API we only expose UUIDs.
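To illustrate the "sequence and volume" point, here is a minimal sketch using a hypothetical `sources` table in an in-memory SQLite database; it is not the actual SecureDrop schema. It relies on SQLite's documented `INTEGER PRIMARY KEY` behavior, where a fresh table with no deletions hands out 1, 2, 3, … so the newest ID reveals how many rows exist. A random (version 4) UUID reveals neither the count nor the order.

```python
import sqlite3
import uuid

# Hypothetical table with both an auto-assigned integer ID and a random UUID.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sources (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL UNIQUE)"
)

# Insert three rows; SQLite assigns id = 1, 2, 3 on a fresh table.
for _ in range(3):
    conn.execute("INSERT INTO sources (uuid) VALUES (?)", (str(uuid.uuid4()),))

newest_id, newest_uuid = conn.execute(
    "SELECT id, uuid FROM sources ORDER BY id DESC LIMIT 1"
).fetchone()

# The integer ID leaks how many sources exist and the order they arrived in.
print(newest_id)
# The version 4 UUID is random and carries no count or ordering information.
print(newest_uuid)
```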

IDs are our primary keys and are also used for relationships between tables.

We've talked about splitting submissions (files vs. messages). Any table reorganization may cause ID conflicts.

Open questions

  • Do we want to get rid of IDs?
  • When should developers use UUIDs vs. IDs?
  • Do we want to ensure that we only log one kind of ID (e.g., UUID)?

Do we want to get rid of IDs?

(Allie) Pro: We could ensure that UUIDs are unique across the message, file, and reply tables so that we need fewer association tables, e.g., one seen table that associates item_uuid with journalist_uuid (once UUIDs are primary keys). It may be a bad idea to make UUIDs unique across these conversation item tables; just an idea.
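A rough sketch of the single-seen-table idea, under explicit assumptions: cross-table UUID uniqueness is modeled here with a hypothetical shared `conversation_items` parent table. This is one possible approach and was not agreed in the meeting. The table and column names other than `item_uuid` and `journalist_uuid` are illustrative only.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical schema: every message, file, and reply UUID is registered in
# one parent table, which makes UUIDs unique across all three item kinds.
conn.executescript("""
CREATE TABLE journalists (uuid TEXT NOT NULL PRIMARY KEY);
CREATE TABLE conversation_items (
    uuid TEXT NOT NULL PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('message', 'file', 'reply'))
);
-- A single "seen" association table instead of one per item type.
CREATE TABLE seen (
    item_uuid TEXT NOT NULL REFERENCES conversation_items (uuid),
    journalist_uuid TEXT NOT NULL REFERENCES journalists (uuid),
    PRIMARY KEY (item_uuid, journalist_uuid)
);
""")

journalist = str(uuid.uuid4())
message = str(uuid.uuid4())
reply = str(uuid.uuid4())

conn.execute("INSERT INTO journalists VALUES (?)", (journalist,))
conn.executemany(
    "INSERT INTO conversation_items VALUES (?, ?)",
    [(message, "message"), (reply, "reply")],
)
# Seen records for different item kinds land in the same table.
conn.executemany(
    "INSERT INTO seen VALUES (?, ?)",
    [(message, journalist), (reply, journalist)],
)

seen_count = conn.execute(
    "SELECT COUNT(*) FROM seen WHERE journalist_uuid = ?", (journalist,)
).fetchone()[0]
print(seen_count)
```

The trade-off Allie flags shows up here: the shared parent table is what makes one association table possible, but it also couples the three item tables together.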

(Allie) Pro: A UUID does not expose information about the data (i.e., sequencing). The ID seems redundant, and there is a cleanliness benefit to having a single unique primary key.

(John) Agreed. We have a primary key (the ID) but treat the UUID as if it were the primary key, which is confusing. That said, changes to production data are risky, so I'm not sure it is worth it.

(Kushal) UUIDs are supposed to be unique, but there is no way to verify this at the SQLite level. I agree that it's a big change as far as the DB is concerned.
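Without a schema-level guarantee, existing data can at least be audited after the fact. Below is a minimal sketch of such a check. The `submissions` table and the short placeholder values standing in for real UUIDs are hypothetical, and `find_duplicate_uuids` is an illustrative helper, not existing SecureDrop code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table whose uuid column has no UNIQUE constraint.
conn.execute("CREATE TABLE submissions (id INTEGER PRIMARY KEY, uuid TEXT)")
# Placeholder values stand in for real UUIDs; "a1" is deliberately duplicated.
conn.executemany(
    "INSERT INTO submissions (uuid) VALUES (?)",
    [("a1",), ("b2",), ("a1",)],
)


def find_duplicate_uuids(conn, table):
    """Return (uuid, count) pairs for UUID values that appear more than once."""
    # The table name is interpolated, so only pass trusted, hard-coded names.
    query = f"SELECT uuid, COUNT(*) FROM {table} GROUP BY uuid HAVING COUNT(*) > 1"
    return conn.execute(query).fetchall()


print(find_duplicate_uuids(conn, "submissions"))
```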

(Erik) UUIDs are much less likely than IDs to be accidentally reassigned, e.g., during data imports. I have observed issues, so far only in development environments, where data was attributed to the wrong source.

(Mickael) UUIDs were added as we added the Journalist Interface API. That was done to prevent enumeration of sources when hitting API endpoint.

If we had better foreign key integrity and the ability to enforce non-NULL constraints, UUIDs would have properties similar to IDs.
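The two integrity gaps mentioned here correspond to documented SQLite behaviors, sketched below with hypothetical `sources` and `replies` tables. First, a non-`INTEGER` `PRIMARY KEY` column in an ordinary table accepts NULL unless `NOT NULL` is declared. Second, foreign keys are enforced only after `PRAGMA foreign_keys = ON` is set on each connection. The pragma only takes effect outside a transaction, hence the `commit()` before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")  # SQLite's usual default, made explicit
conn.executescript("""
CREATE TABLE sources (uuid TEXT PRIMARY KEY);
CREATE TABLE replies (
    uuid TEXT NOT NULL PRIMARY KEY,
    source_uuid TEXT NOT NULL REFERENCES sources (uuid)
);
""")

# Gap 1: without NOT NULL, this TEXT primary key column accepts NULL.
conn.execute("INSERT INTO sources (uuid) VALUES (NULL)")
null_pk_rows = conn.execute(
    "SELECT COUNT(*) FROM sources WHERE uuid IS NULL"
).fetchone()[0]

# Gap 2: with enforcement off, a dangling foreign key reference is accepted.
conn.execute("INSERT INTO replies VALUES ('r1', 'no-such-source')")
conn.commit()  # PRAGMA foreign_keys is a no-op inside an open transaction

# Once enabled, the same kind of dangling reference is rejected.
conn.execute("PRAGMA foreign_keys = ON")
try:
    conn.execute("INSERT INTO replies VALUES ('r2', 'no-such-source')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(null_pk_rows, fk_enforced)
```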

(Conor) We don't have a good handle on how large prod databases are. Ideally we could restructure in a way that's more intuitive to reason about.

(Erik) Database size may not be a major issue given that the bulk of the data is stored as files on disk (but still worth verifying).

(John) This could be part of a 2.0.0 release. Before then, maybe we could have a check script that attempts to run the migrations.

(Allie) Indeed. Foreign key support would also require a data migration which could be grouped with this change.
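One possible shape for the check script John suggests is sketched below, assuming the check runs against an in-memory copy made with Python's `sqlite3` `Connection.backup` so the live database is never written. `check_migration` and the `add_unique_uuid_index` migration are hypothetical names, not SecureDrop's actual migration tooling. The `PRAGMA foreign_key_check` step reflects Allie's point that foreign key support would need its own data migration.

```python
import sqlite3


def check_migration(live, migrate):
    """Run `migrate` against an in-memory copy of the `live` connection's database.

    Returns None if the migration succeeds and leaves no foreign key
    violations; otherwise returns a description of the problem. The live
    database is only read, never written.
    """
    copy = sqlite3.connect(":memory:")
    try:
        live.backup(copy)  # snapshot the live database into memory
        migrate(copy)
        copy.commit()
        # PRAGMA foreign_key_check lists rows whose references are dangling.
        violations = copy.execute("PRAGMA foreign_key_check").fetchall()
        if violations:
            return f"foreign key violations: {violations}"
        return None
    except sqlite3.Error as exc:
        return f"migration failed: {exc}"
    finally:
        copy.close()


def add_unique_uuid_index(conn):
    """Hypothetical migration: enforce UUID uniqueness with a unique index."""
    conn.execute("CREATE UNIQUE INDEX ix_sources_uuid ON sources (uuid)")


# Usage with a throwaway "live" database that contains a duplicate UUID.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, uuid TEXT)")
live.executemany("INSERT INTO sources (uuid) VALUES (?)", [("a1",), ("a1",)])
live.commit()

result = check_migration(live, add_unique_uuid_index)
print(result)
```

Running against a copy means a failed migration costs nothing. The obvious limitation is that a large production database has to fit in memory; a temporary file as the backup target would avoid that.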

When should developers use UUIDs vs. IDs? Do we want to ensure that we only log one kind of ID (e.g., UUID)?

The API should only ever see UUIDs, not IDs. It would also be better to log UUIDs than IDs, but the security argument there is weaker.
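A minimal sketch of one way to follow this guideline: give model objects a `repr` that shows only the UUID, and have the serializer emit only the UUID. The `Source` dataclass, `api_representation`, and the logger name are hypothetical stand-ins, not SecureDrop's real models or API code.

```python
import logging
import uuid
from dataclasses import dataclass


@dataclass
class Source:
    id: int    # internal primary key: never exposed via the API or logs
    uuid: str  # public identifier: safe to expose and to log

    def __repr__(self):
        # Overrides the dataclass-generated repr so log lines omit the ID.
        return f"Source(uuid={self.uuid!r})"


def api_representation(source):
    """Hypothetical serializer: only the UUID leaves the server."""
    return {"uuid": source.uuid}


logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("example")

source = Source(id=42, uuid=str(uuid.uuid4()))
log.info("Deleted %r", source)  # logs the UUID, not the integer ID
print(api_representation(source))
```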

(Mickael) Let's update the database diagram: https://raw.githubusercontent.com/freedomofpress/securedrop/develop/docs/diagrams/securedrop-database.png

(Allie) And also make one for the SecureDrop Client.

Follow-up issues:
