RFC 116: store attachment data in content items
Showing 1 changed file with 147 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,147 @@ | ||
# Store attachment data in content items | ||
|
||
## Summary | ||
|
||
Add a new field, `attachments`, to the `details` hash of content items. | ||
It holds metadata about the page's attachments (if any). | ||
|
||
|
||
## Problem | ||
|
||
Pages on GOV.UK can have attachments, which come in a few different | ||
types. Whitehall, which probably has the richest model of | ||
attachments, has: | ||
|
||
- External attachments (links to other websites) | ||
- HTML attachments | ||
- File attachments (which can have previews) | ||
|
||
Attachments are referenced in the content of the page, and make up | ||
part of the govspeak (or HTML) the publishing app sends to the | ||
publishing-api. | ||
|
||
Attachments are *not* referenced anywhere else in the content item. | ||
To get the details of an attachment, you have to parse the body of the | ||
page. This restricts what we can do with attachments. For example: | ||
|
||
- We cannot generate comprehensive [schema.org][] metadata for attachments: [see this comment](https://github.com/alphagov/govuk_publishing_components/pull/1247#pullrequestreview-338008254). | ||
- Publishing API is unable to tell Asset Manager to take an asset out of draft state, which means publishing apps have to communicate with both. | ||
|
||
Additionally, users of the content API cannot make use of attachments | ||
without parsing the page body. This inhibits creative use of our | ||
content. | ||
|
||
|
||
## Proposal | ||
|
||
1. Add a new required field called `attachments` (of type [`asset_link_list`][]) to the `details` of formats which can have attachments. | ||
|
||
2. Add a new optional `preview_url` field to the `asset_link` type, making the schema: | ||
|
||
``` | ||
{ | ||
asset_link: { | ||
type: "object", | ||
additionalProperties: false, | ||
required: [ | ||
"url", | ||
"content_type", | ||
], | ||
properties: { | ||
content_id: { | ||
"$ref": "#/definitions/guid", | ||
}, | ||
url: { | ||
type: "string", | ||
format: "uri", | ||
}, | ||
preview_url: { | ||
type: "string", | ||
format: "uri", | ||
}, | ||
content_type: { | ||
type: "string", | ||
}, | ||
title: { | ||
type: "string", | ||
}, | ||
created_at: { | ||
format: "date-time", | ||
}, | ||
updated_at: { | ||
format: "date-time", | ||
}, | ||
}, | ||
}, | ||
asset_link_list: { | ||
description: "An ordered list of asset links", | ||
type: "array", | ||
items: { | ||
"$ref": "#/definitions/asset_link", | ||
}, | ||
}, | ||
} | ||
``` | ||
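
To make the shape concrete, here is a minimal sketch of what a `details` hash might look like under this proposal, with a small hand-rolled check of the `required` and `additionalProperties` rules from the schema above. The URLs, titles, and timestamps are invented for illustration, and `asset_link_errors` is a hypothetical helper, not part of Publishing API. It does not check types or formats.

```python
# Hypothetical example data: every URL, title and timestamp here is made up.
details = {
    "body": "<p>See the attached data.</p>",
    "attachments": [
        {
            # A file attachment with a preview.
            "url": "https://assets.example.test/media/report.csv",
            "preview_url": "https://assets.example.test/media/report.csv/preview",
            "content_type": "text/csv",
            "title": "Quarterly report data",
            "updated_at": "2020-01-15T10:30:00Z",
        },
        {
            # An external attachment (a link to another website).
            "url": "https://www.example.test/external-guidance",
            "content_type": "text/html",
            "title": "External guidance",
        },
    ],
}

# Field names copied from the asset_link schema above.
REQUIRED = {"url", "content_type"}
ALLOWED = REQUIRED | {"content_id", "preview_url", "title", "created_at", "updated_at"}


def asset_link_errors(link):
    """Return a list of problems with one asset_link (empty if there are none)."""
    errors = []
    missing = REQUIRED - set(link)
    unknown = set(link) - ALLOWED
    if missing:
        errors.append(f"missing required fields: {sorted(missing)}")
    if unknown:
        errors.append(f"unexpected fields: {sorted(unknown)}")
    return errors


for attachment in details["attachments"]:
    print(attachment["url"], asset_link_errors(attachment))
```

A page with no attachments would carry `"attachments": []` rather than omitting the field.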
|
||
### Some design considerations | ||
|
||
**Why in the details hash?** | ||
|
||
[That's how specialist publisher does it][], and it seems sensible. | ||
|
||
**Why change `asset_link`? Why not a new schema?** | ||
|
||
I think the only thing missing from the `asset_link` schema is a | ||
preview URL, so any new `attachment` schema would be almost identical, | ||
which would be confusing. Plus, we already use `asset_link` for | ||
specialist documents. | ||
|
||
**Why make the `asset_link_list` mandatory?** | ||
|
||
Is there a difference between a missing list and a present, but empty, | ||
list? Maybe not, but I think it's better to be explicit that there | ||
are no attachments for a document. | ||
|
||
For implementing this RFC, the field should be optional, to avoid | ||
breaking publishing apps which haven't yet been updated. But once | ||
Publishing API has been updated to accept the field, and publishing | ||
apps have been updated to set it, then it should be made mandatory. | ||
|
||
**Why `preview_url`? Why not other metadata?** | ||
|
||
Whitehall CSV attachments have automatically generated previews, which | ||
show you some initial portion of the file without needing to download | ||
it. That seems like a very useful feature to me, so it would be nice | ||
to expose it in the metadata. | ||
|
||
Other Whitehall metadata, like ISBN, command/act paper number, and | ||
price (for ordering a physical copy) seem much more special-case and | ||
I'm not sure there would be much benefit to having them. | ||
|
||
Information about how to ask for an accessible format only makes sense | ||
as free text, so I see that as less useful to add to the content item, | ||
which I imagine as being consumed by machines. | ||
|
||
### Does this solve the problems? | ||
|
||
**Can we generate schema.org metadata for attachments?** | ||
|
||
Yes, the only thing we need for that is the attachment URL. | ||
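
As a rough sketch of how a frontend could do this, the snippet below maps each entry in `attachments` to a schema.org object. The choice of `MediaObject`, `hasPart`, and `Article` is my assumption about a reasonable mapping, not something this RFC or the linked comment specifies, and both function names are hypothetical.

```python
import json


def attachment_to_schema_org(link):
    """Map one asset_link to a schema.org MediaObject (an assumed choice of type)."""
    media = {
        "@type": "MediaObject",
        "contentUrl": link["url"],
        "encodingFormat": link["content_type"],
    }
    # title is optional in asset_link, so only emit a name when it is present.
    if "title" in link:
        media["name"] = link["title"]
    return media


def page_schema_org(title, attachments):
    """Build page-level metadata listing each attachment under hasPart."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "name": title,
        "hasPart": [attachment_to_schema_org(a) for a in attachments],
    }


# Hypothetical attachment data, as it might appear in details.attachments.
example = [
    {
        "url": "https://assets.example.test/media/report.csv",
        "content_type": "text/csv",
        "title": "Quarterly report data",
    }
]
print(json.dumps(page_schema_org("Quarterly report", example), indent=2))
```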
|
||
**Can Publishing API communicate with Asset Manager?** | ||
|
||
With a little fiddling of URLs, yes. Asset URLs follow one of two | ||
formats: URLs for Whitehall assets, which have a "legacy URL path", | ||
and URLs for all other assets, which have a UUID. Publishing API | ||
could use the `publishing_app` field to determine which URL format to | ||
expect, extract the path or UUID (if it's not an external URL), and | ||
send messages to Asset Manager about the asset. | ||
|
||
However, publishing apps would still need to talk to both Publishing | ||
API and Asset Manager unless we implement uploading assets in | ||
Publishing API, which is beyond the scope of this RFC. | ||
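
The URL fiddling described above could look something like the following sketch. The asset host name, the legacy path prefix, and the position of the UUID in the path are all assumptions made for illustration. The real layouts would need to be checked against Asset Manager. `asset_reference` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse

# Assumptions for illustration only: the host name and both path layouts
# below are hypothetical, not taken from Asset Manager.
ASSET_HOST = "assets.example.test"
LEGACY_PREFIX = "/government/uploads/"
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def asset_reference(publishing_app, url):
    """Return ("legacy_url_path", path) or ("uuid", id), or None if the URL
    is external or does not match the format expected for that app."""
    parsed = urlparse(url)
    if parsed.hostname != ASSET_HOST:
        return None  # external attachment: nothing to tell Asset Manager
    if publishing_app == "whitehall":
        if parsed.path.startswith(LEGACY_PREFIX):
            return ("legacy_url_path", parsed.path)
        return None
    match = UUID_RE.search(parsed.path)
    if match:
        return ("uuid", match.group(0).lower())
    return None
```

Publishing API would call this for each entry in `attachments` when a document changes state, and skip any entry for which it returns `None`.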
|
||
|
||
[schema.org]: http://schema.org/ | ||
[`asset_link_list`]: https://github.com/alphagov/govuk-content-schemas/blob/master/formats/shared/definitions/asset_links.jsonnet | ||
[That's how specialist publisher does it]: https://github.com/alphagov/govuk-content-schemas/blob/47a751e7eb193738c2ec43be03b149527a2b8e15/formats/specialist_document.jsonnet#L38 |