Skip to content

Commit

Permalink
fixup! RFC 116: store attachment data in content items
Browse files Browse the repository at this point in the history
  • Loading branch information
barrucadu committed Jan 6, 2020
1 parent 0de2884 commit 0619502
Showing 1 changed file with 44 additions and 82 deletions.
126 changes: 44 additions & 82 deletions rfc-116-store-attachment-data-in-content-items.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,10 +22,9 @@ publishing-api.

Attachments are *not* referenced anywhere else in the content item.
To get the details of an attachment, you have to parse the body of the
page. This restricts what we can do with attachments. For example:

- We cannot generate comprehensive [schema.org][] metadata for attachments: [see this comment](https://github.com/alphagov/govuk_publishing_components/pull/1247#pullrequestreview-338008254).
- Publishing API is unable to tell Asset Manager to take an asset out of draft state, which means publishing apps have to communicate with both.
page. This restricts what we can do with attachments. For example,
we cannot generate comprehensive [schema.org][] metadata for
attachments: [see this comment][].

Additionally, users of the content API cannot make use of attachments
without parsing the page body. This inhibits creative use of our
Expand All @@ -34,67 +33,45 @@ content.

## Proposal

1. Add a new required field called `attachments` (of type [`asset_link_list`][]) to the `details` of formats which can have attachments.

2. Add a new optional `preview_url` field to the `asset_link` type, making the schema:

```
{
asset_link: {
type: "object",
additionalProperties: false,
required: [
"url",
"content_type",
],
properties: {
content_id: {
"$ref": "#/definitions/guid",
},
url: {
type: "string",
format: "uri",
},
preview_url: {
type: "string",
format: "uri",
},
content_type: {
type: "string",
},
title: {
type: "string",
},
created_at: {
format: "date-time",
},
updated_at: {
format: "date-time",
},
},
},
asset_link_list: {
description: "An ordered list of asset links",
type: "array",
items: {
"$ref": "#/definitions/asset_link",
},
},
}
```
In our content schemas we already have an [`asset_link` type][], which
[Specialist Publisher uses for storing its attachment data][]. This
solves almost all of our problems: [Publishing API can render such
attachments with govspeak][], and content items can expose this data
to enable programmatic use.

However, the metadata Govspeak expects and the `asset_link` type are
*informally* the same. And furthermore, Whitehall has much richer
attachment metadata than either `asset_link` or Govspeak allow. So
this proposal has three parts:

1. Add a new required field called `attachments` (of type
`asset_link_list`) to the `details` of formats which can have
attachments.

2. Document the relationship between the `asset_link` type in our
content schemas and the data Govspeak attachments use. For
example, all fields required in one must be required in the other.
Any pairs of fields which do the same job in both should have the
same name. Implement any changes arising from this.

3. Add new optional fields to the `asset_link` type:

- `file_size` (used by govspeak)
- `preview_url` (used by whitehall)

### Some design considerations

**Why in the details hash?**

[That's how specialist publisher does it][], and it seems sensible.
That's how Specialist Publisher does it and how Publishing API expects
it if it's going to use Govspeak to render attachments.

**Why change `asset_link`? Why not a new schema?**

I think the only thing missing from the `asset_link` schema is a
preview URL, so any new `attachment` schema would be almost identical,
which would be confusing. Plus, we already use `asset_link` for
specialist documents.
I think there are only a couple of fields missing from the
`asset_link` schema, so any new `attachment` schema would be almost
identical, which would be confusing. Plus, we already use
`asset_link` for specialist documents.

**Why make the `asset_link_list` mandatory?**

Expand All @@ -107,42 +84,27 @@ breaking publishing apps which haven't yet been updated. But once
Publishing API has ben updated to accept the field, and publishing
apps have been updated to set it, then it should be made mandatory.

**Why `preview_url`? Why not other metadata?**
**Why `preview_url`? Why not more whitehall metadata?**

Whitehall CSV attachments have automatically generated previews, which
show you some initial portion of the file without needing to download
it. That seems like a very useful feature to me, so it would be nice
to expose it in the metadata.

Other whitehall metadata, like ISBN, command/act paper number, and
price (for ordering a physical copy) seem much more special-case and
I'm not sure there would be much benefit to having them.
price (for ordering a physical copy) seem much more special-case. We
don't want to overcomplicate Govspeak by adding many new attributes to
its attachment model. We can instead add new fields as we need to
and, in doing so, might discover that Whitehall has some features
which aren't needed.

Information about how to ask for an accessible format only makes sense
as free text, so I see that as less useful to add to the content item,
which I imagine as being consumed by machines.

### Does this solve the problems?

**Can we generate schema.org metadata for attachments?**

Yes, the only thing we need for that is the attachment URL and content
type.

**Can Publishing API communicate with Asset Manager?**

With a little fiddling of URLs, yes. Asset URLs follow one of two
formats: URLs for whitehall assets, which have a "legacy URL path",
and URLs for all other assets, which have a UUID. Publishing API
could use the `publishing_app` field to determine which URL format to
expect, extract the path or UUID (if it's not an external URL), and
send messages to Asset Manager about the asset.

Publishing apps would still need to talk to both Publishing API and
Asset Manager unless we implement uploading assets in Publishing API,
however. But that is beyond the scope of this RFC.


[schema.org]: http://schema.org/
[`asset_link_list`]: https://github.com/alphagov/govuk-content-schemas/blob/master/formats/shared/definitions/asset_links.jsonnet
[That's how specialist publisher does it]: https://github.com/alphagov/govuk-content-schemas/blob/47a751e7eb193738c2ec43be03b149527a2b8e15/formats/specialist_document.jsonnet#L38
[see this comment]: https://github.com/alphagov/govuk_publishing_components/pull/1247#pullrequestreview-338008254
[`asset_link` type]: https://github.com/alphagov/govuk-content-schemas/blob/47a751e7eb193738c2ec43be03b149527a2b8e15/formats/shared/definitions/asset_links.jsonnet
[Specialist Publisher uses for storing its attachment data]: https://github.com/alphagov/govuk-content-schemas/blob/47a751e7eb193738c2ec43be03b149527a2b8e15/formats/specialist_document.jsonnet#L38
[Publishing API can render such attachments with govspeak]: https://github.com/alphagov/publishing-api/blob/1b8e328540ab96759807787ab87dcb33bbcc72e3/app/presenters/details_presenter.rb#L71-L73

0 comments on commit 0619502

Please sign in to comment.