fixup! RFC 116: store attachment data in content items
barrucadu committed Jan 23, 2020
1 parent 3511597 commit 67d9248
Showing 1 changed file with 41 additions and 23 deletions.
`rfc-116-store-attachment-data-in-content-items.md`: 64 changes (41 additions, 23 deletions)
@@ -3,42 +3,57 @@
## Summary

Add a new field to the details hash of content items, `attachments`,
which has metadata about the document's attachments (if any).
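As a rough sketch of what a consumer could do with the new field, assume a content item shaped like the one below. The `attachments` key and the `id` and `attachment_type` fields come from this RFC; `title`, `url`, `body` and all the values are hypothetical.

```python
# A minimal sketch of a content item carrying the proposed
# `details.attachments` field. Apart from `attachments`, `id` and
# `attachment_type`, the keys and values here are hypothetical.
content_item = {
    "details": {
        "body": "<p>Rendered HTML for the document.</p>",
        "attachments": [
            {
                "id": "1",
                "attachment_type": "file",
                "title": "Annual report",
                "url": "https://assets.example/annual-report.pdf",
            },
            {
                "id": "2",
                "attachment_type": "html",
                "title": "Summary",
                "url": "https://www.example/summary",
            },
        ],
    },
}


def attachment_titles(item: dict) -> list[str]:
    """Return attachment titles without touching the document body."""
    # The field is optional ("if any"), so default to an empty list.
    attachments = item.get("details", {}).get("attachments", [])
    return [a["title"] for a in attachments]


print(attachment_titles(content_item))  # ['Annual report', 'Summary']
```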


## Problem

Documents on GOV.UK can have attachments, which come in a few
different types. Whitehall, which has the richest model of
attachments, has:

- External attachments (links to other websites)
- HTML attachments
- File attachments (which can have previews)
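These three kinds correspond to the values of `attachment_type` in the proposed schema (`external`, `file` and `html`). A minimal sketch of validating that field on the consumer side, assuming a Python `Enum` is an acceptable stand-in for the schema's `enum` check:

```python
from enum import Enum


class AttachmentType(Enum):
    """The three attachment kinds Whitehall supports."""

    EXTERNAL = "external"  # a link to another website
    FILE = "file"          # an uploaded file, possibly with a preview
    HTML = "html"          # an HTML attachment


def parse_attachment_type(value: str) -> AttachmentType:
    """Convert a raw `attachment_type` string, rejecting unknown values."""
    try:
        return AttachmentType(value)
    except ValueError:
        allowed = ", ".join(t.value for t in AttachmentType)
        raise ValueError(
            f"unknown attachment_type {value!r}; expected one of: {allowed}"
        ) from None


print(parse_attachment_type("html"))  # AttachmentType.HTML
```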

When thinking about how attachments are displayed to the user, there
are two sorts:

- *Document-level attachments*, such as those on Whitehall publication
and consultation documents, which are displayed separately to the main
content.

- *Inline attachments*, such as those in other Whitehall documents,
manual sections, specialist documents, and travel advice documents,
which are referenced in the page content and rendered as links.
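If both sorts were listed in the content item, a consumer could tell them apart heuristically by checking which attachments the body references. The sketch below is illustrative only: the `[Attachment:<id>]` marker is hypothetical and not necessarily the reference syntax Govspeak uses, and a document-level attachment could in principle also be referenced in the body.

```python
import re

# Hypothetical marker for an inline attachment reference; this is not
# necessarily the syntax Govspeak or any publishing app uses.
REFERENCE = re.compile(r"\[Attachment:([^\]]+)\]")


def split_attachments(
    body: str, attachments: list[dict]
) -> tuple[list[dict], list[dict]]:
    """Split attachments into (inline, document_level) by body references."""
    referenced = set(REFERENCE.findall(body))
    inline = [a for a in attachments if a["id"] in referenced]
    document_level = [a for a in attachments if a["id"] not in referenced]
    return inline, document_level


body = "See the figures in [Attachment:2] for details."
inline, document_level = split_attachments(body, [{"id": "1"}, {"id": "2"}])
print([a["id"] for a in inline], [a["id"] for a in document_level])  # ['2'] ['1']
```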

Publishing applications vary in how easily accessible they make
attachment metadata in their content items. Whitehall only sends
rendered HTML to the Publishing API. Other publishing apps send a
combination of rendered HTML and metadata.

This inconsistency restricts what we can do with attachments,
particularly for Whitehall content. For example, we cannot generate
comprehensive [schema.org][] metadata for attachments: [see this
comment][].
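With attachment metadata available as data, generating schema.org markup becomes a simple mapping. A minimal sketch, assuming attachments are described as schema.org `DigitalDocument` objects and that each attachment hash has `title` and `url` keys (neither choice comes from this RFC; `content_type` and `locale` are fields it names):

```python
import json


def attachment_to_schema_org(attachment: dict) -> dict:
    """Map one attachment hash to schema.org JSON-LD (illustrative only).

    Using DigitalDocument, and reading `title` and `url`, are assumptions;
    `content_type` and `locale` are fields named in this RFC.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "DigitalDocument",
        "name": attachment["title"],
        "url": attachment["url"],
    }
    # Optional properties are only emitted when the metadata is present.
    if attachment.get("content_type"):
        doc["encodingFormat"] = attachment["content_type"]
    if attachment.get("locale"):
        doc["inLanguage"] = attachment["locale"]
    return doc


example = {
    "title": "Annual report",
    "url": "https://assets.example/annual-report.pdf",
    "content_type": "application/pdf",
    "locale": "en",
}
print(json.dumps(attachment_to_schema_org(example), indent=2))
```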

Additionally, users of the content API cannot make use of attachments
without parsing the document body. This inhibits creative use of our
content.
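For comparison, this is roughly what a consumer has to do today with a Whitehall body: scrape links out of the rendered HTML. A minimal sketch using the standard library's `html.parser`, with made-up sample HTML:

```python
from html.parser import HTMLParser
from typing import Optional


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> element in some HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the <a> being read, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed('<p>Read the <a href="/media/report.pdf">annual report</a>.</p>')
collector.close()
print(collector.links)  # [('/media/report.pdf', 'annual report')]
```

This recovers only an `href` and link text: nothing distinguishes an attachment from any other link, and metadata such as file size or ISBN is simply not there.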

We should make metadata for Whitehall attachments available in the
same way as other publishing applications, but that requires some
changes to how we represent attachments across the stack.


## Proposal

In our content schemas we already have an [`asset_link` type][], which
is used for attachments in specialist documents and manual sections.
It's also used in travel advice pages for the page image and a single
attachment. This type appears to solve all of our problems:
[Publishing API can render such attachments with govspeak][], and
content items can expose this data to enable programmatic use.
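As a hypothetical illustration of rendering from metadata rather than from pre-rendered HTML, the function below turns one attachment hash into a link. It is not Govspeak's API or its actual output, and the `title` and `url` keys are assumptions.

```python
from html import escape


def render_attachment_link(attachment: dict) -> str:
    """Render one attachment hash as an HTML link (hypothetical output)."""
    # Escape both values so titles such as "R&D" produce valid HTML.
    href = escape(attachment["url"], quote=True)
    text = escape(attachment["title"])
    return f'<a href="{href}">{text}</a>'


print(render_attachment_link({"title": "R&D report", "url": "https://assets.example/r&d.pdf"}))
# <a href="https://assets.example/r&amp;d.pdf">R&amp;D report</a>
```

The point of the sketch is that the rendered link is derived from the same hash a content item would expose to API users.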

However, the metadata Govspeak expects and the `asset_link` type are
only *informally* the same. Furthermore, Whitehall has much richer
@@ -63,9 +78,12 @@ I propose we do this:
- Attachments:
  - `accessible`
  - `alternative_format_contact_email`
  - `attachment_type`
  - `command_paper_number`
  - `file_size`
  - `filename`
  - `hoc_paper_number`
  - `id`
  - `isbn`
  - `locale`
  - `number_of_pages`
@@ -91,11 +109,11 @@ I propose we do this:
to remove them from Whitehall: `order_url`, `price_in_pence`.

And these fields because they don't need to be exposed outside the
publishing app: `attachable_id`, `attachment_data_id`, `deleted`,
`ordering`, `slug`.

**Migration concerns:** This can be done without breaking
compatibility, as it doesn't add any new mandatory fields.

1. Replace `content_id` in attachments with `id`.

@@ -107,7 +125,6 @@ I propose we do this:
1. Remove non-useful metadata fields:

- Attachments: `created_at` and `updated_at`

**Migration concerns:** This would need to be done in two steps:
update apps, then remove fields.
@@ -162,6 +179,7 @@ schema will be:
properties: {
  accessible: { type: "boolean", },
  alternative_format_contact_email: { type: "string", },
  attachment_type: { type: "string", enum: ["external", "file", "html"], },
  command_paper_number: { type: "string", },
  content_type: { type: "string", },
  file_size: { type: "integer", },
@@ -196,10 +214,10 @@ schema will be:

There are really two different types of attachments in documents.
There are attachments which may be referenced in Govspeak but don't
otherwise appear in the document; in this case the attachment hashes
are more like metadata than actual document content. There are also
attachments which are presented differently from the normal document
text, as in publications.

The `details.attachments` list would contain both types of
attachments, so there is one list with everything in. The