From 67d9248aee1dd102e6036187f7603b4b5ba92a83 Mon Sep 17 00:00:00 2001
From: Michael Walker
Date: Thu, 23 Jan 2020 11:21:33 +0000
Subject: [PATCH] fixup! RFC 116: store attachment data in content items

---
 ...-store-attachment-data-in-content-items.md | 64 ++++++++++++-------
 1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/rfc-116-store-attachment-data-in-content-items.md b/rfc-116-store-attachment-data-in-content-items.md
index 9dd199dd..5ec78bbb 100644
--- a/rfc-116-store-attachment-data-in-content-items.md
+++ b/rfc-116-store-attachment-data-in-content-items.md
@@ -3,42 +3,57 @@
 ## Summary
 
 Add a new field to the details hash of content items, `attachments`,
-which has metadata about the page's attachments (if any).
+which has metadata about the document's attachments (if any).
 
 
 ## Problem
 
-Pages on GOV.UK can have attachments, which come in a few different
-types. Whitehall, which has the richest model of
+Documents on GOV.UK can have attachments, which come in a few
+different types. Whitehall, which has the richest model of
 attachments, has:
 
 - External attachments (links to other websites)
 - HTML attachments
 - File attachments (which can have previews)
 
-Attachments are referenced in the content of the page, and make up
-part of the govspeak (or HTML) the publishing app sends to the
-publishing-api.
+When thinking about how attachments are displayed to the user, there
+are two sorts:
 
-Attachments are *not* referenced anywhere else in the content item.
-To get the details of an attachment, you have to parse the body of the
-page. This restricts what we can do with attachments. For example,
-we cannot generate comprehensive [schema.org][] metadata for
-attachments: [see this comment][].
+- *Document-level attachments*, such as on Whitehall publication and
+  consultation documents, which are displayed separately to the main
+  content.
+
+- *Inline attachments*, such as in other Whitehall documents, manual
+  sections, specialist documents, and travel advice documents, which
+  are referenced in the page content and rendered as links.
+
+Publishing applications vary in how easily accessible they make
+attachment metadata in their content items. Whitehall only sends
+rendered HTML to the Publishing API. Other publishing apps send a
+combination of rendered HTML and metadata.
+
+This inconsistency restricts what we can do with attachments,
+particularly for Whitehall content. For example, we cannot generate
+comprehensive [schema.org][] metadata for attachments: [see this
+comment][].
 
 Additionally, users of the content API cannot make use of attachments
-without parsing the page body. This inhibits creative use of our
+without parsing the document body. This inhibits creative use of our
 content.
 
+We should make metadata for Whitehall attachments available in the
+same way as other publishing applications, but that requires some
+changes to how we represent attachments across the stack.
+
 
 ## Proposal
 
 In our content schemas we already have an [`asset_link` type][], which
 is used for attachments in specialist documents and manual sections.
 It's also used in travel advice pages for the page image and a single
-attachment. This type appears to solve all of our problems: [Publishing API can
-render such attachments with govspeak][], and content items can expose
-this data to enable programmatic use.
+attachment. This type appears to solve all of our problems:
+[Publishing API can render such attachments with govspeak][], and
+content items can expose this data to enable programmatic use.
 
 However, the metadata Govspeak expects and the `asset_link` type are
 only *informally* the same. And furthermore, Whitehall has much richer
@@ -63,9 +78,12 @@ I propose we do this:
    - Attachments:
      - `accessible`
      - `alternative_format_contact_email`
+     - `attachment_type`
      - `command_paper_number`
      - `file_size`
+     - `filename`
      - `hoc_paper_number`
+     - `id`
      - `isbn`
      - `locale`
      - `number_of_pages`
@@ -91,11 +109,11 @@ I propose we do this:
    to remove them from Whitehall: `order_url`, `price_in_pence`.
 
    And these fields because they don't need to be exposed outside the
-   publishing app: `attachable_id`, `attachable_type`,
-   `attachment_data_id`, `deleted`, `ordering`, `slug`.
+   publishing app: `attachable_id`, `attachment_data_id`, `deleted`,
+   `ordering`, `slug`.
 
    **Migration concerns:** This can be done without breaking
-   compatibility.
+   compatibility, as it doesn't add any new mandatory fields.
 
 1. Replace `content_id` in attachments with `id`.
 
@@ -107,7 +125,6 @@ I propose we do this:
 1. Remove non-useful metadata fields:
 
    - Attachments: `created_at` and `updated_at`
-   - Images: `content_type`
 
    **Migration concerns:** this would need to be done in two steps:
    update apps; remove fields.
@@ -162,6 +179,7 @@ schema will be:
     properties: {
       accessible: { type: "boolean", },
       alternative_format_contact_email: { type: "string", },
+      attachment_type: { type: "string", enum: ["external", "file", "html"], },
       command_paper_number: { type: "string", },
       content_type: { type: "string", },
       file_size: { type: "integer", },
@@ -196,10 +214,10 @@ schema will be:
 
 There are really two different types of attachments in documents.
 There are attachments which may be referenced in Govspeak but don't
-otherwise appear on the page; in this case the attachment hashes are
-more like metadata than actual page content. There are also
-attachments which appear differently to the normal page text, like in
-publications.
+otherwise appear on the document; in this case the attachment hashes
+are more like metadata than actual document content. There are also
+attachments which appear differently to the normal document text, like
+in publications.
 
 The `details.attachments` list would contain both types of
 attachments, so there is one list with everything in. The