Protocol Tech Debt (2024) #2128
Replies: 4 comments 3 replies
-
Regarding "recursive migration" of records, I believe it's relatively feasible using the following high-level approach:
I believe this algorithm is overall O(n), and it guarantees that each record is processed at most once. Step 5 is most simply done serially, but it can be done arbitrarily slowly. Assuming you have the disk space for it, you can fully complete all these steps and double-check your work before embarking on step 6, which actually commits the changes. While step 6 is in progress, people might notice subtle breakage when browsing old threads, but other than that it should be fairly unobtrusive. Theoretically you could figure out how to make all the changes atomically but that's probably more trouble than it's worth. There's also plenty of scope for concurrency/parallelism.
Edit: I just thought of one issue with this approach. If any new replies (or other references) are created to posts that need edits, during the execution of the algorithm, then those references would become stale. You could minimise this by waiting a while after step 1.
-
Something else you might want to add: [non-]support for floating point values in records. The spec says they're not allowed (last time I checked), but the current implementation allows them (also last time I checked), and there are floats in records in the wild (at least, there are in my repo).
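To get a feel for how many records are affected, one could scan decoded record JSON for float values. The following is a minimal sketch, not the PDS's actual validation code; `find_floats` is a hypothetical helper name, and it assumes records have already been decoded with Python's standard `json` module (which yields `float` for any number written with a fraction or exponent).

```python
import json


def find_floats(value, path="$"):
    """Return the JSON paths of every float found in a decoded record."""
    hits = []
    if isinstance(value, float):
        hits.append(path)
    elif isinstance(value, dict):
        for key, child in value.items():
            hits.extend(find_floats(child, f"{path}.{key}"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_floats(child, f"{path}[{index}]"))
    return hits


# Hypothetical record: "score" and the second tag are floats, the rest are not.
record = json.loads('{"text": "hi", "score": 0.5, "tags": [1, 2.0]}')
print(find_floats(record))  # ['$.score', '$.tags[1]']
```

Note that `2.0` counts as a float here even though its value is integral; whether such values should be rejected or coerced is exactly the kind of thing the spec and implementation would need to agree on.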
-
In order to create trusted timestamps, I recommend requiring all timestamps to be signed by some sort of agreed-upon time server or blockchain, similar to how https://opentimestamps.org/ works. The only key difference is that OpenTimestamps uses Bitcoin to sign timestamps, but I recommend XRP because it has fast blocks (~4 seconds) and is really cheap per transaction. The signature would just include the Merkle tree path between the object's hash and the XRP record hash, as well as the ID of the XRP record and the ledger ID. I guesstimate that this would only require ~300 bytes of extra data, which is more than affordable for signed timestamps.
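The "Merkle tree path" part of such a signature can be sketched as follows. This is only an illustration of the general technique: it is not the OpenTimestamps proof format and it does not talk to the XRP Ledger. The pairing rule (SHA-256 over the concatenation of left and right child) and the names `verify_merkle_path` and `proof` are assumptions made for the example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def verify_merkle_path(leaf_hash: bytes, path, expected_root: bytes) -> bool:
    """Walk from a leaf to the root.

    `path` is a list of (sibling_hash, sibling_is_left) pairs ordered from
    the leaf upwards. Assumed pairing rule: parent = SHA-256(left + right).
    """
    node = leaf_hash
    for sibling, sibling_is_left in path:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == expected_root


# Build a tiny 4-leaf tree by hand to exercise the verifier.
leaves = [sha256(x) for x in (b"a", b"b", b"c", b"d")]
n01 = sha256(leaves[0] + leaves[1])
n23 = sha256(leaves[2] + leaves[3])
root = sha256(n01 + n23)

# Proof for the third leaf: its right-hand sibling, then the left-hand subtree.
proof = [(leaves[3], False), (n01, True)]
print(verify_merkle_path(leaves[2], proof, root))  # True
```

In this encoding each path step costs 32 bytes plus a direction flag, so the size of the proof grows with the depth of the tree rather than with the number of timestamped objects.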
-
Also, why are Merkle search trees used instead of B+ trees? They are less efficient in both speed and space, and their only advantage is their reproducibility, but as far as I can tell a reproducible repo is not needed.
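For context on the reproducibility property: in a Merkle search tree, the layer a key lives on is derived from a hash of the key itself, so the shape of the tree depends only on the set of keys and not on insertion order. Below is a minimal sketch of that idea, assuming SHA-256 and two bits of leading zeros per layer; both parameters should be checked against the repository spec, and `leading_zero_bits` and `mst_layer` are hypothetical names.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits before the first set bit in a byte string."""
    count = 0
    for byte in digest:
        if byte == 0:
            count += 8
            continue
        count += 8 - byte.bit_length()
        break
    return count


def mst_layer(key: str) -> int:
    """Derive a key's layer from its hash alone (assumed: 2 bits per layer)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return leading_zero_bits(digest) // 2


# Hypothetical record paths. The layer depends only on the key, so any two
# implementations holding the same keys end up building the same tree.
for key in ("app.bsky.feed.post/aaa", "app.bsky.feed.post/bbb"):
    print(key, mst_layer(key))
```

A B+ tree, by contrast, can end up with different node boundaries for the same contents depending on the order of inserts and deletes, which is the trade-off the question above is about.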
-
This is a living list of known issues with atproto, as it is running today in the live network, which we would like to change or fix.
It isn’t a list of missing protocol features or product features; there are a bunch of other things we plan to finish or add to atproto. These are only lower-level technical issues with stuff which has already been specified and implemented, and which could impact future interoperability.
Note that almost none of these stop most apps, bots, or other integrations from being built today. Most of these have to do with low-level infrastructure (like Relays), or dealing with old content from the early days of the network.
See also: #1711
Urgent
Stuff that we want to fix in time for federation, but might slip.
`prev` is Nullable and Optional: see discussion below about nullable+optional in general. In particular we should make the `prev` field on commit objects one or the other; probably `optional`. In theory this would require a change of the repo version number, but that is disruptive; maybe we can get away with just saying it is optional in version `3`? UPDATE: we will say it is nullable-only in version `3`, see Resolution of repo v3 commit `prev` being both nullable and optional #2181
`:`), but this has not been enforced by the PDS, and there are some popular feed generator records in the wild with this character. There is some debate over whether we should change the spec or migrate the records; the important thing is to get implementation and specification aligned. UPDATE: we decided we'll allow colons in record keys, see Decision to allow colon character (`:`) in Record Keys #2224
Strictness and Completeness
These are changes which should not impact “well behaved” clients, but could invalidate some old records.
`app.bsky.actor.profile` with record key as a TID should be a validation error (can override by skipping validation at record creation time).
`tid` and `record-key` as String Formats to Lexicon Language: many other identifiers in atproto have their own Lexicon string formats, which help with automatic validation. TID and Record Key are both specified identifiers and appear in a lot of APIs. They should get formats, and existing Lexicons should be updated to require them (this is a “bending” schema change). UPDATE: these formats have been added to the Lexicon specs, though are not yet enforced in the Lexicons themselves.
`null` vs entirely omitting the field). Some programming language serialization libraries can support this as well, but others struggle. In particular, there isn’t an obvious/idiomatic way to distinguish in golang using the standard Marshal/Unmarshal system. Proposal is to update Lexicon spec to say this combination is disallowed; and update any instances in current Lexicons (might be breaking changes).
`$type`) or token strings (which are a reference) can be used in unions. Eg, it probably isn't possible or allowed to put an integer in a union. This should be clarified.
`unknown`: it is probably the case that only objects can be used for `unknown` format data, but this isn't specified and should be cleared up. Furthermore, any recursive restrictions on the object data should be clarified. For example, floats are mentioned as not being allowed, but further invalid `blob`-like objects are probably not allowed either.
Lexicon Cleanups
There are some cleanups we’d like to do which will bend or break Lexicon stability rules. We don’t think these will actually be very disruptive, and think it is reasonable to declare “one last” round of breaking changes before we commit to stability.
`com.atproto.admin.*` which are pretty specific to the Ozone backend. We’d like to move these to an Ozone-specific NSID namespace (eg, `tools.ozone.*`), which would be independent of both `com.atproto` and `app.bsky`.
`subscribeRepos` uses `did` to indicate account DID for almost all event types, except for `#commit`, which uses `repo` for that field. Would be good to update for consistency.
“Someday”
Things that might change in the future, but aren’t planned work. Might get to these in the coming year or so, or going through a formal standardization process would be a good time to revisit these.
`/xrpc/` prefix: requiring HTTP URL path prefixes is generally frowned on by standards folks. We also try not to use the “XRPC” terminology as much these days. We can probably find a way to make the prefix flexible, allowing folks to use an alternative if they want.
`unknown` nested field/object?
"Unfortunate"
These are things we know are controversial or have poor developer experience, but are not likely to change at this point. But maybe they will, if developer friction continues to be bad despite iterations on docs and SDK design.
`createdAt` and `indexedAt` can be combined into a `sortAt` for use in display and ordering posts in feeds. This continues to cause confusion and consternation, and some folks think that global reliable timestamps are important enough to warrant an extension to the protocol.
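One plausible way to combine the two timestamps is to take the earlier of them, so that a record claiming a `createdAt` in the future cannot sort ahead of the moment it was actually indexed. That rule is an assumption made for illustration and is not stated above; `parse_ts` and `sort_at` are hypothetical helper names.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sort_at(created_at: str, indexed_at: str) -> datetime:
    """Assumed rule: use the earlier of the self-reported and observed times."""
    return min(parse_ts(created_at), parse_ts(indexed_at))


# A post claiming to be from 2030 is ordered by when it was indexed instead.
print(sort_at("2030-01-01T00:00:00Z", "2024-02-01T12:00:00Z"))
# A backdated post keeps its self-reported createdAt.
print(sort_at("2023-06-01T00:00:00Z", "2024-02-01T12:00:00Z"))
```

The second case shows why this does not make timestamps trustworthy: `createdAt` is self-reported, so backdating is still possible under this rule.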