Preservation of not-yet-specified Lexicon fields #2126

DavidBuchanan314 · 2024-02-02T17:02:46Z

On Discord, @seboslaw asked whether it's possible for an application to store additional data in an app.bsky.feed.post record somehow, and the answer was (paraphrased) "yes, you can put arbitrary data under a custom field name, as long as it doesn't conflict with field names that might become specified in the future".

It is my understanding that lexicon validators should ignore "new/unexpected" object fields that aren't yet specified in a schema, to allow for graceful evolution of lexicon schemas.
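A minimal sketch of the "ignore unknown fields" behavior described above. It assumes a toy schema format (a dict mapping field name to Python type) rather than real Lexicon JSON. `POST_SCHEMA`, `validate_open`, and the `com.example.extra` field are hypothetical names used only for illustration.

```python
# Toy schema format (hypothetical): field name -> expected Python type.
# Real Lexicon schemas are JSON documents with far richer constraints.
POST_SCHEMA = {"text": str, "createdAt": str}
REQUIRED = {"text", "createdAt"}


def validate_open(record: dict, schema: dict, required: set) -> list:
    """Check known fields only; fields absent from the schema are ignored."""
    errors = []
    for name in sorted(required):
        if name not in record:
            errors.append(f"missing required field: {name}")
    for name, expected in schema.items():
        if name in record and not isinstance(record[name], expected):
            errors.append(f"wrong type for {name}")
    # No loop over record.keys(): unknown fields such as "com.example.extra"
    # are neither reported nor removed, so older validators accept newer records.
    return errors


record = {
    "$type": "app.bsky.feed.post",
    "text": "hello",
    "createdAt": "2024-02-02T17:02:46Z",
    "com.example.extra": {"score": 3},
}
print(validate_open(record, POST_SCHEMA, REQUIRED))  # []
```

The key design point is what the validator does *not* do: it never iterates over the record's own keys, so a field added by a later schema revision (or by a third party) cannot cause a rejection.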

Following on from this, it's also my understanding that a service that processes lexicon records (e.g. a PDS or Relay) should pass through unknown fields as-is. This section of the docs implies that this is the desired behavior, but I don't think it explicitly states that doing so is correct (or that not doing so would be incorrect).
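To make the question concrete, here is a hypothetical sketch contrasting the two behaviors a service could adopt when persisting a record. `KNOWN_POST_FIELDS` is an abridged, made-up field list, and neither function is taken from an actual PDS implementation.

```python
import copy

# Fields this (hypothetical) service knows about for app.bsky.feed.post; abridged.
KNOWN_POST_FIELDS = {"$type", "text", "createdAt"}


def store_passthrough(record: dict) -> dict:
    """Pass-through behavior: persist the record exactly as received."""
    return copy.deepcopy(record)


def store_stripping(record: dict) -> dict:
    """Stripping behavior: silently drop fields not in KNOWN_POST_FIELDS."""
    return {k: v for k, v in record.items() if k in KNOWN_POST_FIELDS}


incoming = {
    "$type": "app.bsky.feed.post",
    "text": "hello",
    "createdAt": "2024-02-02T17:02:46Z",
    "com.example.extra": {"score": 3},
}
print("com.example.extra" in store_passthrough(incoming))  # True
print("com.example.extra" in store_stripping(incoming))    # False
```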

While storing data in new fields seems to work fine in practice, @imax9000 raised the point that another PDS implementation might decide to strip the unrecognized fields.

If a Relay stripped unknown fields, it would break commit signatures, which would be a more obvious problem, but a PDS could plausibly do it.
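Why a Relay can't strip fields unnoticed: the signed commit covers content hashes of the records, so any change to a record's bytes changes its hash. The sketch below uses sorted-key JSON plus SHA-256 purely as a stand-in for atproto's actual DAG-CBOR encoding and CIDs; `digest` is a hypothetical helper.

```python
import hashlib
import json


def digest(record: dict) -> str:
    # Stand-in for DAG-CBOR + CID: canonical (sorted-key, compact) JSON, SHA-256.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = {
    "$type": "app.bsky.feed.post",
    "text": "hi",
    "createdAt": "2024-02-02T17:02:46Z",
    "com.example.extra": 1,
}
stripped = {k: v for k, v in original.items() if k != "com.example.extra"}

# The hash the author signed over no longer matches the forwarded record.
print(digest(original) == digest(stripped))  # False
```

A PDS, by contrast, is the party that produces the signature in the first place, so a stripping PDS would simply sign the already-stripped record, which is why that case is harder to detect.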

I think such a PDS would be incorrect, but I can't point to docs that explicitly say so. Would it be incorrect?

pfrazee · 2024-02-02T17:41:26Z · Maintainer

Stripping fields is incorrect. Lexicon violations are at the application level, not at the protocol level. The protocol is unopinionated about the contents of records and should carry the objects faithfully. The last thing we need is outdated middleware breaking applications.

imax9000 · 2024-02-02T20:24:22Z

Point 1

> Stripping fields is incorrect. Lexicon violations are at the application level, not at the protocol level. The protocol is unopinionated about the contents of records and should carry the objects faithfully. The last thing we need is outdated middleware breaking applications.

A PDS is not middleware; it's an endpoint that is directly responsible for signing records and acts as the authoritative source for them. The rest of the network neither knows nor cares what protocol the user uses to interact with their PDS.

For interoperability with client apps, it's important for a PDS to keep a somewhat standard API, but that isn't enforceable.

Point 2

> It is my understanding that lexicon validators should ignore "new/unexpected" object fields that aren't yet specified in a schema, to allow for graceful evolution of lexicon schemas.

Yes, this is a very important use case. There's also another use case: third parties adding arbitrary fields for their own purposes (which led to this conversation). At the protocol level there's no distinction between these cases.

But they are very different semantically: in the second case, the supposed authority over a lexicon definition has zero control over what gets added. Depending on the extent and usage patterns, this may devolve into widespread non-standard fields becoming de facto standards, rampant use of custom fields that effectively makes the lexicon definition irrelevant, or new abuse vectors. For example, the post length limit is 300 characters, but what about dumping 10 MB into a custom field? Sure, there's probably some limit on total record size, but it's definitely not 300 characters.
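The abuse vector above can be illustrated with a hypothetical limit checker. It assumes the schema constrains only `text` (simplified here to a character count; the real post lexicon counts graphemes) and that the only other defense is a total record-size cap. `MAX_RECORD_BYTES` is an invented value, not a documented PDS limit.

```python
import json

TEXT_MAX_CHARS = 300        # from the discussion; simplified to len() here
MAX_RECORD_BYTES = 100_000  # hypothetical total-size cap, not a real PDS value


def check_limits(record: dict) -> list:
    """Return limit violations; unknown fields are only bounded by total size."""
    problems = []
    if len(record.get("text", "")) > TEXT_MAX_CHARS:
        problems.append("text too long")
    size = len(json.dumps(record).encode("utf-8"))
    if size > MAX_RECORD_BYTES:
        problems.append(f"record too large: {size} bytes")
    return problems


post = {"text": "short post", "com.example.blob": "x" * 90_000}
print(check_limits(post))  # []
```

Roughly 90 KB of custom data passes both checks, while a 301-character `text` would be rejected: the per-field limit protects only the fields the schema knows about.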

I don't claim to have a solution for this; I haven't had to deal with the problem of an extensible, backward-compatible data format in an adversarial environment before. But I think the question "how do I attach extra info to an object defined with someone else's lexicon?" needs a better answer than "just put a new field there, it's grand".

imax9000 Feb 2, 2024

Re point 2, yeah, it's not perfect, but what's the alternative? Maybe there could be some kind of reserved ext field, acting as a container object for fields added by third parties (anyone who isn't the "owner" of the original lexicon).
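One way the reserved `ext` container idea might look, as a hypothetical sketch: third-party data goes under `ext`, keyed by an NSID-style name owned by whoever adds it, so it can never collide with top-level fields a future schema revision might define. The `ext` key, `add_extension`, and `com.example.rating` are illustrative assumptions, not anything specified in atproto.

```python
def add_extension(record: dict, nsid: str, data: dict) -> dict:
    """Return a copy of record with data stored under record["ext"][nsid]."""
    if "." not in nsid:
        raise ValueError("extension key should be namespaced, e.g. com.example.foo")
    out = dict(record)
    ext = dict(out.get("ext", {}))  # copy so the caller's record is not mutated
    if nsid in ext:
        raise ValueError(f"extension {nsid} already present")
    ext[nsid] = data
    out["ext"] = ext
    return out


post = {
    "$type": "app.bsky.feed.post",
    "text": "hello",
    "createdAt": "2024-02-02T17:02:46Z",
}
tagged = add_extension(post, "com.example.rating", {"stars": 4})
print(tagged["ext"])  # {'com.example.rating': {'stars': 4}}
print("ext" in post)  # False
```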

There are probably better options, but here's the result of one minute of thinking about it:

1. Add a sequential revision number to the lexicon definition.
2. Have all records refer to a specific revision.

Then, if the PDS knows about that particular revision, it can properly filter out bogus fields or just throw an error. If it doesn't, it can check with the authoritative source whether such a revision even exists and update if it does.
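A rough sketch of the revision-pinning proposal, under several assumptions: records carry a hypothetical integer `lexRev` field, a local registry maps each known revision to its allowed field names, and `fetch` stands in for querying the lexicon's authoritative source. This version chooses the "throw an error" option rather than filtering.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical local registry: revision number -> allowed field names.
KNOWN_REVISIONS: Dict[int, Set[str]] = {
    1: {"$type", "lexRev", "text", "createdAt"},
    2: {"$type", "lexRev", "text", "createdAt", "langs"},
}


def check_revision(record: dict, fetch: Callable[[int], Optional[Set[str]]]) -> dict:
    """Reject records whose fields aren't allowed by their declared revision."""
    rev = record.get("lexRev")
    if not isinstance(rev, int):
        raise ValueError("record does not declare a lexicon revision")
    fields = KNOWN_REVISIONS.get(rev)
    if fields is None:
        # Unknown locally: ask the authoritative source and cache the answer.
        fields = fetch(rev)
        if fields is None:
            raise ValueError(f"revision {rev} does not exist")
        KNOWN_REVISIONS[rev] = fields
    unexpected = set(record) - fields
    if unexpected:
        raise ValueError(f"fields not in revision {rev}: {sorted(unexpected)}")
    return record


def no_remote(rev: int) -> Optional[Set[str]]:
    return None  # stand-in for an authoritative source that knows no newer revisions


ok = {"$type": "app.bsky.feed.post", "lexRev": 1, "text": "hi",
      "createdAt": "2024-02-02T20:24:22Z"}
check_revision(ok, no_remote)  # accepted
```

Note the trade-off this makes explicit: a strict check like this is exactly what would reject the third-party fields discussed earlier in the thread.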

For adding extra data, as I suggested initially, put it into a separate collection with the same rkey. Then there's a clear separation of authority over the data format.
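The separate-collection approach relies on records being addressed as `at://<did>/<collection>/<rkey>`: a companion record in another collection with the same rkey can be located from the post's URI alone. In this sketch the collection `com.example.postExtra`, the DID `did:plc:example123`, and the rkey `3kabc` are placeholders, and the in-memory `repo` dict is only a toy.

```python
def at_uri(did: str, collection: str, rkey: str) -> str:
    """Build an AT-URI of the form at://<did>/<collection>/<rkey>."""
    return f"at://{did}/{collection}/{rkey}"


def sidecar_uri(record_uri: str, sidecar_collection: str) -> str:
    """Locate the companion record: same repo, same rkey, different collection."""
    prefix = "at://"
    if not record_uri.startswith(prefix):
        raise ValueError("not an AT-URI")
    did, _collection, rkey = record_uri[len(prefix):].split("/")
    return at_uri(did, sidecar_collection, rkey)


post = at_uri("did:plc:example123", "app.bsky.feed.post", "3kabc")
print(sidecar_uri(post, "com.example.postExtra"))
# at://did:plc:example123/com.example.postExtra/3kabc

# Toy in-memory "repo" keyed by AT-URI: the post stays schema-clean,
# and the third party's data lives in a collection it controls.
repo = {
    post: {"text": "hello"},
    sidecar_uri(post, "com.example.postExtra"): {"stars": 4},
}
extra = repo.get(sidecar_uri(post, "com.example.postExtra"))
```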

pfrazee Feb 2, 2024
Maintainer

It's best not to think of schemas in terms of authority and instead view them as social constructs in the communal effort to maintain intercompatibility.

Yes, the lexicon doc-authors have no power to control what people do. This is also true of any spec author. We can't force people to follow protocol with their machines.

The lexicon authors do, however, have the power to establish what's expected by software, and if you run counter to a lexicon (or any accepted spec) then you are likely to experience bugged interactions. The lexicons are a guide to help people come to consensus. Their authority comes from their collective usage.

Regarding unspecced extensions, there definitely could be some wisdom to establishing norms about where to put them to avoid collisions. It'd be even better if those extensions could define fallback behaviors so that unsupporting software could detect them and trigger some intelligent behaviors when support isn't available. We've kicked around a couple ideas, but haven't resolved on anything yet.
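Since no design has been settled, the following is purely a hypothetical illustration of "extensions with fallback behaviors". It assumes each extension lives under an `ext` object keyed by NSID and carries a `data` payload plus a human-readable `fallback`; a client that doesn't support an extension shows the fallback instead of silently dropping it.

```python
SUPPORTED_EXTENSIONS = {"com.example.poll"}  # what this hypothetical client understands


def render(record: dict) -> list:
    """Render post text plus each extension, degrading to its fallback."""
    lines = [record.get("text", "")]
    for nsid, ext in record.get("ext", {}).items():
        if nsid in SUPPORTED_EXTENSIONS:
            lines.append(f"[{nsid}] {ext['data']}")
        else:
            # Unsupported: show the author-provided fallback, or a generic notice.
            lines.append(ext.get("fallback", f"(unsupported extension {nsid})"))
    return lines


post = {
    "text": "Vote!",
    "ext": {
        "com.example.poll": {"data": ["yes", "no"], "fallback": "Poll: yes / no"},
        "org.other.sticker": {"data": "cat", "fallback": "[sticker: cat]"},
    },
}
print(render(post))
```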

imax9000 Feb 2, 2024

It doesn't really matter what you believe to be the best way to think about this. Other people doing things with atproto only care about what they can do with the current state of things.

And it's silly to assume that they all share common values and goals.

pfrazee Feb 2, 2024
Maintainer

Okay, well, they can go off-spec, and their content may not work correctly with other apps.

imax9000 Feb 3, 2024

It's still early enough in the product lifecycle for you to be able to nudge this in a direction that doesn't end up in an unmaintainable mess of compatibility matrices.
