Provide support for specifying an IPFS hash for a CDXJ file for the replay system #80
Comments
What we are supporting is essentially
@ibnesayeed This ticket is to fetch the CDXJ index file via IPFS, not HTTP. Parsing and search efficiency may be a different issue.
That's what I talked about in the previous comment and some related matters.
@ibnesayeed We ought to have a larger sample data set to test this; for example, a 500(+?)-line CDXJ with associated ipwb hashes inline as appropriate. Once this CDXJ is in IPFS, we can use the sample data as benchmarks both for selective fetch and pywb's binary search once the selective data is fetched. There are a slew of GH tickets that could be spawned from this. ;)
Nothing stops us from storing the index in IPFS or elsewhere, but I am against the idea of storing an index in IPFS that will be changing frequently.
@ibnesayeed If working on a static corpus for, say, research, the index may not be changing frequently. The crux of this ticket was really not to require the user to provide a (CDXJ) index file to run the software, but to let them specify a hash, potentially shared by another user.
Adding a special flag just to tell the server to treat the passed value as an IPFS hash and retrieve data from there would embed too many special cases in the application. Using the protocol prefix would be a more general approach and widely understandable.
@ibnesayeed I would prefer smart defaults. If what "looks like" an IPFS hash is passed, treat it as such and process accordingly. I currently have a very basic case of this with reading in absolute/relative CDXJ files for the replay system. This also stinks of some potentially dangerous scenarios. That said, allowing special flags to force the type of interpretation would be good. Maybe that should be the initial approach, but there is something elegant about specifying
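To make the "looks like an IPFS hash" idea concrete, here is a minimal sketch of such a check. It assumes only CIDv0-style hashes (46 base58 characters starting with `Qm`) need to be recognized; the function name `looks_like_ipfs_hash` is hypothetical and not part of ipwb.

```python
import re

# Assumption: only CIDv0 hashes are considered, i.e. "Qm" followed by
# 44 base58btc characters (the alphabet excludes 0, O, I and l).
_CIDV0_PATTERN = re.compile(r"Qm[1-9A-HJ-NP-Za-km-z]{44}")


def looks_like_ipfs_hash(value: str) -> bool:
    """Return True if value has the shape of a CIDv0 IPFS hash.

    This is a purely syntactic check: it does not confirm that the
    hash resolves to anything, let alone to a CDXJ file.
    """
    return _CIDV0_PATTERN.fullmatch(value) is not None
```

One of the risky scenarios is visible right away: a local file whose name happens to be 46 base58 characters starting with `Qm` would be misread as a hash, so a flag to force the interpretation would still be useful.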
Even automatic detection of the hash signature is embedded information.
is not any more complex or difficult than
The former is more expressive and uniformly suggests that, if implemented, any type of URL could be supported.
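As a rough illustration of the prefix-based approach, the sketch below dispatches on the URL scheme of the index argument. The `ipfs://` prefix, the function name `classify_index_source`, and the three source labels are assumptions made for illustration; the thread does not settle on an exact syntax.

```python
from typing import Tuple
from urllib.parse import urlsplit


def classify_index_source(value: str) -> Tuple[str, str]:
    """Map an index argument to a (source kind, locator) pair.

    Assumed conventions (hypothetical, not ipwb's actual behavior):
      ipfs://<hash>        -> ("ipfs", "<hash>")
      http(s)://host/path  -> ("http", full URL)
      anything else        -> ("file", value as given)
    """
    parts = urlsplit(value)
    if parts.scheme == "ipfs":
        # For "ipfs://<hash>", the hash is parsed as the netloc component.
        return ("ipfs", parts.netloc + parts.path)
    if parts.scheme in ("http", "https"):
        return ("http", value)
    # No recognized scheme: treat as an absolute or relative local path.
    return ("file", value)
```

Supporting another kind of location would then mean adding one more scheme branch rather than another command-line flag.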
Do you think
In case of
Current support is for a CDXJ at an accessible path. Related to #61.