Preservation of not-yet-specified Lexicon fields #2126
-
Stripping fields is incorrect. Lexicon violations are at the application level, not at the protocol level. The protocol is unopinionated about the contents of records and should carry the objects faithfully. The last thing we need is outdated middleware breaking applications.
-
Point 1
A PDS is not middleware; it's an endpoint that is directly responsible for signing records and acts as the authoritative source for them. The rest of the network neither knows nor cares what protocol the user uses to interact with their PDS. For interoperability with client apps it's important for a PDS to keep a somewhat standard API, but that can't be enforced.
Point 2
Yes, this is a very important use case. There's also another use case: a third party adding arbitrary fields for its own purposes (which is what led to this conversation). At the protocol level there's no distinction between these cases, but semantically they are very different: in the second case, the supposed authority over a lexicon definition has zero control over what gets added. Depending on the extent and usage patterns, this could devolve into widespread non-standard fields becoming de facto standards, or into rampant use of custom fields that effectively makes the lexicon definition irrelevant, or it could open new abuse vectors. For example, the post length limit is 300 characters, but what about dumping 10 MB into a custom field? There's probably some limit on total record size, but it's certainly not 300 characters. I don't claim to have a solution for this; I haven't had to deal with an extensible, backward-compatible data format in an adversarial environment before. But I think the question "how do I attach extra info to an object defined by someone else's lexicon?" needs a better answer than "just put a new field there, it's grand".
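A rough sketch of that abuse vector, assuming a validator that (as discussed in this thread) checks only the fields its schema defines and ignores the rest. `validate_post`, `check_record_size`, the field name `com.example.blob`, and the 1,000,000-byte cap are all hypothetical; the real atproto record size limit and Bluesky's exact text constraints aren't stated here.

```python
import json

# Hypothetical limits for illustration only; not the real Bluesky/atproto values.
MAX_TEXT_CHARS = 300           # the post text limit quoted above
MAX_RECORD_BYTES = 1_000_000   # assumed overall record size cap


def validate_post(record: dict) -> list[str]:
    """Schema-level check: only known fields are inspected; unknown ones are ignored."""
    errors = []
    text = record.get("text")
    if not isinstance(text, str):
        errors.append("text: must be a string")
    elif len(text) > MAX_TEXT_CHARS:
        errors.append(f"text: longer than {MAX_TEXT_CHARS} characters")
    return errors


def check_record_size(record: dict) -> list[str]:
    """Whole-record cap: applies to everything, custom fields included."""
    size = len(json.dumps(record).encode("utf-8"))
    if size > MAX_RECORD_BYTES:
        return [f"record: {size} bytes exceeds {MAX_RECORD_BYTES}"]
    return []


post = {
    "$type": "app.bsky.feed.post",
    "text": "hello",
    "createdAt": "2024-01-01T00:00:00Z",
    # Hypothetical third-party field carrying ~10 MB of data.
    "com.example.blob": "x" * 10_000_000,
}

print(validate_post(post))      # → []
print(check_record_size(post))  # only the size cap flags this record
```

The schema check happily accepts the post; only a blunt record-wide size limit stands between the 300-character field and the 10 MB one.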
-
On Discord, @seboslaw asked whether it's possible for an application to somehow store additional data in an `app.bsky.feed.post` record, and the answer was (paraphrased) "yes, you can put arbitrary data under a custom field name, as long as it doesn't conflict with field names that might become specified in the future".
It is my understanding that lexicon validators should ignore new or unexpected object fields that aren't yet specified in a schema, to allow lexicon schemas to evolve gracefully.
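A minimal sketch of what "ignore unknown fields" means for a validator. The schema format, `validate`, and the field name `com.example.app.extra` are hypothetical simplifications; real Lexicon schemas are JSON documents with far richer constraints.

```python
from typing import Any

# Hypothetical, simplified schema: field name -> (expected type, required?)
POST_SCHEMA_V1: dict[str, tuple[type, bool]] = {
    "text": (str, True),
    "createdAt": (str, True),
}


def validate(record: dict[str, Any], schema: dict[str, tuple[type, bool]]) -> list[str]:
    """Validate the fields the schema specifies; silently ignore any others."""
    errors = []
    for name, (expected_type, required) in schema.items():
        if name not in record:
            if required:
                errors.append(f"{name}: missing")
        elif not isinstance(record[name], expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}")
    # There is deliberately no loop over record.keys(): unknown fields are
    # neither reported as errors nor removed from the record.
    return errors


record = {
    "text": "hello",
    "createdAt": "2024-01-01T00:00:00Z",
    # Hypothetical app-specific field, named so it is unlikely to clash
    # with fields a future schema version might add.
    "com.example.app.extra": {"score": 42},
}

print(validate(record, POST_SCHEMA_V1))  # → []
```

Because the validator never touches keys it doesn't know, a record written against a newer schema (or carrying app-specific data) still validates against an older schema, and the record itself is left intact.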
Following on from this, it's also my understanding that a service that processes lexicon records (e.g. a PDS or Relay) should pass unknown fields through as-is. This section of the docs implies that this is the desired behavior, but I don't think it explicitly states that it's correct (or that not doing so would be incorrect).
While storing data in new fields seems to work fine in practice, @imax9000 raised the point that another PDS implementation might decide to strip the unrecognized fields.
If a Relay stripped unknown fields it would break commit signatures, which would make the problem obvious. A PDS, however, signs the records itself, so it could plausibly strip fields without anything visibly breaking.
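To illustrate why stripping after signing is detectable: a commit signature ultimately covers hashes of the exact record data, so any change to a record changes its hash. This sketch uses sorted-key JSON and SHA-256 as a stand-in for atproto's actual encoding (DAG-CBOR) and CIDs; `strip_unknown` and `KNOWN_FIELDS` are hypothetical.

```python
import hashlib
import json


def digest(record: dict) -> str:
    """Hash a deterministic serialization of the record.

    Stand-in for atproto's real scheme (DAG-CBOR + CIDs); sorted-key JSON
    keeps the sketch dependency-free.
    """
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Hypothetical set of fields a stripping implementation would recognize.
KNOWN_FIELDS = {"$type", "text", "createdAt"}


def strip_unknown(record: dict) -> dict:
    """What a hypothetical stripping service would do: keep only known fields."""
    return {k: v for k, v in record.items() if k in KNOWN_FIELDS}


original = {
    "$type": "app.bsky.feed.post",
    "text": "hello",
    "createdAt": "2024-01-01T00:00:00Z",
    "com.example.app.extra": {"score": 42},  # hypothetical custom field
}

passed_through = dict(original)     # faithful pass-through
stripped = strip_unknown(original)  # custom field dropped

print(digest(passed_through) == digest(original))  # → True
print(digest(stripped) == digest(original))        # → False
```

A Relay that strips fields produces data whose hash no longer matches what the PDS signed. A PDS that strips fields *before* signing produces a self-consistent commit, which is exactly why that case is harder to catch.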
I think such a PDS would be incorrect, but I can't point to docs that explicitly say so. Would it be incorrect?