-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTP server fixes #15
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit work @FotiosBistas ! The PR looks nice overall, just left some comments/questions where I would love to know your insights.
I'm also wondering if the Readme file is updated to the latest changes in the addition of the API to the hoarder.
pkg/cid-source/http_server.go
Outdated
|
||
} else if r.Method == http.MethodGet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reasoning behind having a Get method in the API? I thought we just needed to notify with a POST of the CID that we want to track (and the related info)
Am I missing anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @cortze, the functionality of the HTTP server is that a post requests are received by the peer that is publishing, and encapsulated json
arrays are then added to the queue.
type HttpCidSource struct {
port int
hostname string
lock sync.Mutex
server *http.Server
providerRecords []ProviderRecords
isStarted bool
}
func (httpCidSource *HttpCidSource) Dequeue() ProviderRecords {
httpCidSource.lock.Lock()
defer httpCidSource.lock.Unlock()
if len(httpCidSource.providerRecords) == 0 {
log.Debug("Queue is empty")
return ProviderRecords{}
}
elem := httpCidSource.providerRecords[0]
httpCidSource.providerRecords = httpCidSource.providerRecords[1:]
log.Debugf("Removed element from queue, length is now: %d", len(httpCidSource.providerRecords))
return elem
}
func (httpCidSource *HttpCidSource) Enqueue(providerRecords ProviderRecords) {
httpCidSource.lock.Lock()
defer httpCidSource.lock.Unlock()
httpCidSource.providerRecords = append(httpCidSource.providerRecords, providerRecords)
log.Debugf("Added new element to queue, length is now: %d", len(httpCidSource.providerRecords))
}
//A container for the encapsulated struct.
//
//File containts a json array of provider records.
//[{ProviderRecord1},{ProviderRecord2},{ProviderRecord3}]
type ProviderRecords struct {
EncapsulatedJSONProviderRecords []EncapsulatedJSONProviderRecord `json:"ProviderRecords"`
}
//This struct will be used to create,read and store the encapsulated data necessary for reading the
//provider records.
type EncapsulatedJSONProviderRecord struct {
ID string `json:"PeerID"`
CID string `json:"ContentID"`
Creator string `json:"Creator"`
PublicationTime string `json:"PublicationTime"`
ProvideTime string `json:"ProvideTime"`
UserAgent string `json:"UserAgent"`
Addresses []string `json:"PeerMultiaddresses"`
}
This means that we wait for all the providers to be added into the array.
The get method get's called right now of a time interval of 5 seconds and we receive the first element of the queue using a dequeue
method. I found it easier to be able to determine whether the queue is just empty, inferring that the publisher is just publishing slower than the queue empties itself, or the publisher will not publish any more elements.
If the queue is just empty the server responds back with NoContent
, else if the publisher finished publishing it must send a nil
encapsulated json
:
http.Error(w, "No record available currently", http.StatusNoContent)
inside the interval function
trackableCids, err := src.GetNewHttpCid(tracker.CidSource)
if trackableCids == nil {
log.Debug("Received empty provider records")
trackableCidArrayC <- nil
close(trackableCidArrayC)
//gracefully shutdown server
go tracker.httpSource.Shutdown(tracker.ctx)
break
}
if err != nil {
log.Errorf("error while getting new cid: %s", err)
// check if ticker for next iteration was raised
<-minTimeT.C
continue
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohhhhh I see, I thought that the API was organized in a different way.
My understanding was that we would initialize the API server in the Cid-Hoarder, and that the publisher would be able to POST the CidInfo and the related one to the PRHolders in the same POST query.
But being this POST query after the publication has been accomplished
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried doing it like this but I was impossible to tell whether the publishing process was finished. It also would add a lot of extra logic by checking for each PR which CID it corresponds to and some type of waiting mechanism to add all the PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but are you taking into account having a single json sent over the POST method including the CidInfo and PRHolders?
(something like this)
{
"CidInfo":{
"cid":
....
"pr_holders":{
"peer_id":
....
}
}
}
this would simplify having to match PRHolders with CIDs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is exactly how I'm sending it:
"ProviderRecords":[
{
"PeerID":"12D3KooWG1MhbRPTGZiiLf1iECH19YMFjurkGjGsmKstfEMvNF42",
"ContentID":"QmNyxnsJ15nWD85R3xjTKCm37BSjPqyzMHcwBrGqhaWhy8",
"Creator":"QmPod7yQRSQX8jDtQCBi65ZSGfbkmoAX5FH4g8wzzJmtnY",
"PublicationTime":"2023-01-16T14:04:42+02:00",
"ProvideTime":"5.4405671s",
"UserAgent":"hydra-booster/0.7.4",
"PeerMultiaddresses": ["/ip4/18.188.54.97/udp/30014/quic","/ip4/18.188.54.97/tcp/30014","/ip4/172.31.10.39/udp/30014/quic","/ip4/127.0.0.1/udp/30014/quic","/ip4/172.31.10.39/tcp/30014","/ip4/127.0.0.1/tcp/30014"]
},
{
"PeerID":"12D3KooWA4m41sRq68mdhk5cSTBptwxSXSsrjsyRKf2WJbNE7h9x",
"ContentID":"QmNyxnsJ15nWD85R3xjTKCm37BSjPqyzMHcwBrGqhaWhy8",
"Creator":"QmPod7yQRSQX8jDtQCBi65ZSGfbkmoAX5FH4g8wzzJmtnY",
"PublicationTime":"2023-01-16T14:04:42+02:00",
"ProvideTime":"5.4405671s",
"UserAgent":"go-ipfs/0.7.0/",
"PeerMultiaddresses":["/ip4/127.0.0.1/tcp/4001","/ip6/2605:7380:1000:1310:c08b:37ff:fe37:5ec3/tcp/4001","/ip4/127.0.0.1/udp/4001/quic","/ip6/::1/tcp/4001","/ip4/209.50.56.24/udp/4001/quic","/ip6/2605:7380:1000:1310:c08b:37ff:fe37:5ec3/udp/4001/quic","/ip6/::1/udp/4001/quic","/ip4/209.50.56.24/tcp/4001"]
},
{
"PeerID":"12D3KooWSFppQG3QwTXCVKxtfQEjwoY6sLVgbLCegntd87Dc8L8Q",
"ContentID":"QmNyxnsJ15nWD85R3xjTKCm37BSjPqyzMHcwBrGqhaWhy8",
"Creator":"QmPod7yQRSQX8jDtQCBi65ZSGfbkmoAX5FH4g8wzzJmtnY",
"PublicationTime":"2023-01-16T14:04:42+02:00",
"ProvideTime":"5.4405671s",
"UserAgent":"go-ipfs/0.7.0/",
"PeerMultiaddresses":["/ip4/127.0.0.1/udp/4001/quic","/ip4/119.94.38.201/tcp/4001","/ip6/2001:4451:1114:8600:10f7:c70b:bec3:b939/udp/4001/quic","/ip6/::1/tcp/4001","/ip6/2001:4451:1114:8600:10f7:c70b:bec3:b939/tcp/4001","/ip4/127.0.0.1/tcp/4001","/ip6/2001:4451:1114:8600:a4a2:76bb:267:6cee/tcp/4001","/ip6/2001:4451:1114:8600:a4a2:76bb:267:6cee/udp/4001/quic","/ip6/::1/udp/4001/quic","/ip4/119.94.38.201/udp/4001/quic"]
},
]
Notice that the ContentID
,Creator
,ProvideTime
,PublicationTime
stay the same. After that is sent a new batch of provider records for a specific CID
will be sent over the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reasoning behind having a Get method in the API? I thought we just needed to notify with a POST of the CID that we want to track (and the related info)
Am I missing anything?
@cortze I removed the get method. Essentially the internal queue of the server would fill up and ,at some point later after a lot of CIDs have been inserted, would never empty itself. I replaced it, just like you mentioned at the above comment, with just a post method that sends the received CIDs through a channel and then inserts them into the database. It does manage to insert them all, unlike the previous method which would get stuck, but it still is slow. E.G. the database has inserted 1500/1900 CIDs (numbers are random here) immediately and then the rest 400 CIDs all will get inserted after a couple of hours. So I think some approaches would be:
- Make the process synchronous, wait for all of the CIDs to get published and then insert them to the hoarder (can be done very easily, but is essentially the same as the json file approach)
- Implement some type of batching system, like inserting 1000 cids at a time into the database (I think it would be better to be done from the database side)
- Leave it as it is and we get some of the benefits of the asynchronous pings
Would love to hear your thoughts!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @FotiosBistas , thanks for taking into consideration the upper suggestion. Tbh it makes everything cleaner and easier to follow.
About your different approaches:
- If you wait for all the CIDs to be published, you cannot follow the timings in each ping accurately. We will synchronize all the pings, having spikes of heavy ping sessions. Having the publication of it spaced over time serve the purpose as load balancing, so I'll say that it is fine to keep it this way
- Adding batching over queries makes complete sense. I'll work on it in the following month.
- It must stay like this until the batching is implemented. I need to sneak some time to take action on this.
No I thought it would have been best to not add here. Shall I add it here? |
Break was not working inside the select statement. Added returned instead. This exposed addProviderWg.add(1) was not being called. Http server when content is finished providing now returns nil,nil
Queue would fill up sometimes up to 1500 cids, due to the mintimer iteration scheduling. Implemented emptying the whole queue.
Added channel which listens to post requests and then sends them over to the discoverer, avoiding the HTTP server queue getting stuck.
Motivation
An automated way to insert already published cids to the hoarder. The only way currently is to create a json file with the required metadata.
Tasks