-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: blob protocol draft #115
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating this ❤️ we are generally quite close on it
w3-blob.md
Outdated
|
||
###### Add Blob Committed | ||
|
||
Capability provider MUST issue receipt with `LocationClaim` capability delegated to the [space] when requested blob is added to the space, which can happen if provider already has a matching blob locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused here. This should only happen when bytes make its way to the bucket right? (based also on the diagram you previously shared)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is poorly worded, I'll fix it but what I meant to say that if provider already has such blob they can issue "committed"
case of the ok result as opposed to "allocated"
.
You are right that if provider does not already have a blob we are not going to issue LocationClaim
until we know we have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intent for blob/* to eventually replace store/*?
Do we have the work to implement this tracked anywhere? (seems like it's not trivial amount of work -- should probably ship with spec)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think we need agreement at minimum on prioritizing blob/*
work in client and server before we merge this spec. Otherwise it's a set of actions that aren't implemented
@hannahhoward yes! This is our way forward with write to anywhere to avoid the migration pains. But we need to include in the write to anywhere ticket |
I do think we want to some day, but there is no urgency to do so. We do want to introduce new blob capabilities here as they do not imply bucket events and we want to migrate clients to that.
Generally we have been aligning on spec first and only then going about implementing it, sometimes that imply changes to spec but alignment on how seemed to be a good way to avoid changing code back and forth. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment
So This needs clarification in the PR description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I think we should simply make the audience of the location claim public. The current flow implies a redelegation of authority every time the location changes.
I'm concerned we're thinking we can get private data for free here without thinking through the implications. Private retrieval is a huge topic -- I think we should design for it intentionally, rather than throw in features that may create lots of headaches on a theory it will be sufficient for private data. I think privacy should be set at the ingress point (i.e. by the space owner on the blob itself, absent the location, either during blob/add or through some seperate update apit), not on the location claim itself which is temporary.
Either way, I think we should move it out of our current design, because this requires more intention.
Realizing removing the redelegation requires invoking If a location is managed by the provider, it must be responsibile for keeping location claims, wherever they reside, updated. |
What we settled on in the storacha/RFC#13 that our issued location claims will have HTTP URLs with hints of where content is. Those URLs will simply route / redirect to the location content is in the system. This enables us to make long term location commitments while retaining ability to change actual site of the content over time. Does this address the concern you're raising ? |
I think this is inaccurate, as mentioned in prior comment per storacha/RFC#13 we intend to make issued location changes agnostic of content site changes.
I think there is a misunderstanding here. I also want to assure you that I have been thinking about private retrievals for more than a year now it's just there were always higher priorities at play. I'd like to step back from privacy here for moment. What is proposed here is not to assume that adding a blob implies you want it to be indexed, advertised and served across all possible channels. Instead we can issue a commitment to the space itself that we will serve the blob from the given URL through the commitment validity window. This commitment can be published by user if they want to make that data available publicly. This also creates an opportunity to create a system where our commitments can be verified and we can be held accountable if we fail to uphold them. In other words it gives user a choice to do what they want with our commitment to serve the blob. |
Co-authored-by: Vasco Santos <[email protected]>
…/store-api-revamp
Thinking back to storacha/RFC#13 I now see this is a bit of a misnomer as a solution. Yes, it's a public URL that will continue to work, if you go through w3s.link. It still has an expiration and the data still could move. Presumably if I have Also, unless I misunderstand, we are trying to get to a public mechanism whereby a client can find the location of the car and its index, and do range reads. How would it do so if the hints become out of date or the claim expires? Would it talk directly to the w3s content claims service? How would it know to do so (assuming it wasn't a web3.storage exclusive retrieval client)? Or would it be forced to fetch from w3s.link instead at that point? Ultimately, I think we're dealing with two different chains of authority we're trying to smash into one here:
I think we should probably take a step back and sync on all this. I feel that these are valid concerns, not just nitpicking, especially as we look forward to the architecture we're building. My proposal is to hold on this till we can sync during check-in on Thursday (in the interest of not adding meetings, use what's scheduled) Note: I'd prefer not to mess up @vasco-santos ability to implement, so I'm ok if he starts working in the meantime. But I think we're not in alignment completely yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing objections after synchronous discussion. Will document reasoning shortly.
Documentation about why I removed my objections:
Also: I can see a world where the right solution is to have the location claims be issued by provider, and then retrieved when you do |
type Blob = { | ||
content: Multihash | ||
size: int | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should allow user specify an expiry here so we can reflect it on our end with accept
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may have misunderstood my intention behind exp
in accept
, there it meant to capture pre-signed url expiry. While here I would expect it to imply that blob should be GC-ed at given time.
Am I interpreting your intentions correctly here ? If so I'd say lets not add this until we actually are able to support GC-ing blobs, when we do we can introduce this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I was considering this to be the expiration we talked about for expiration of claim #115 (comment)
but than can come as follow up
|
||
### Allocation Effects | ||
|
||
Allocation MUST have no effects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking this would actually point to Accept as well, given next step is that after allocation. Why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps 🤔, but here here is my sentiment
Can I allocate space without waiting for it to be received ? If so, this is a function to do and it should not have a continuation there. If no, I think I need bit more convincing because here are my counter arguments
-
Overall I think you could roughly translate blob/add as following JS code
const blobAdd = async (task) => { const { sub, args: { blob, exp } } = task // check that we have billing setup for the space const { ok: subscription, error: cantbill } = await service.getSubscription(sub) if (cantbill) { return { error: cantbill } } // allocate memory for the blob const { ok: memory, error: allocError } = await service.allocate(sub, { exp, blob, cause: task.link() }) if (allocError) { return { error: allocError } } // add usage to the billing if (memory.size > 0) { await subscription.bill.add({ storage: size, cause: task.link() }) } // if we have memory address we need to await for the content if (memory.address) { return { ok: await accept({ blob, exp }) } } else { return { ok: findReceiptFor({ accept, blob, exp }) } } } const accept = async ({ blob, exp }) => { const { ok: content, error: timeout } = await waitForPUT({ blob }) if (timeout) { return { error: timeout } } // validate content matches the blob const { error } = await vaildate(content, blob) if (error) { return { error } } return { ok: createLocationClaim(blob) }) }
And through above lens,
addBlob
is a task that coordinates subtasks, but subtasksallocate
andaccept
are independent of each other. -
If we made
accept
effect of theallocate
, then we probably should not make it effect of theblobAdd
also. However making it effect of theblobAdd
is better because we can then draw the whole execution graph from the receipt without having to follow each effect.
For what it's worth these observations and opinions are new to me also and I have call with Fission on Monday to try and get their input on this as they spend a lot more time doing and implementing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the details, I find this a good reasoning. My suggestion was more about having a receipt for a blob/allocate
and not really linking it to anything. But we have as part of the task arguments the blob/add
who caused this, so I think that also mitigates what I was looking for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few details in comment, as well as remove/list to add here. But I am happy to cover them in follow up :)
20911aa
to
bcccbc0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Based on #114. PR is written with UCAN 1.0 format and assuming storacha/RFC#12 however in terms of immediate implementation I suggest that we instead
blob/add
instead of/space/content/add/blob
web3.storage/blob/*
in place of/service/blob/
I suggest above because I think it would make more sense to transition to storacha/RFC#12 once we have [email protected] implemented, because I suspect links to tasks vs invocations are going to be a pain to deal with otherwise. This will give us cleaner break.
In terms of implementing
/service/blob/accept
and specifically how does client signal that they've completed upload I suggest that we do whichever is easiest option from following two:blob/add
invocation after they've done upload so we can perform a lookup.blob/add/poll
or very general likeinvocation/wake { task: CID }
.