Request ID
The design must accommodate multiple concurrent inference requests to the server, so a mechanism is required to tie each model output to the request that produced it. This could be inferred from the processing order, which is strictly FIFO, but adding an ID token to each request provides additional context for development and debugging. The token has no semantics and is passed through the C++ code unchanged. The Anomaly Detector flushID is the prior art here.
Payload
The inference payload is a series of numeric tokens. An individual inference request consists of the request ID, the payload tokens and a marker to delineate each request.
Anomaly Detection uses a concise length-encoded binary protocol because of the high volume of data sent across pipes. By comparison, the inference input is small, so a more verbose input format can be used, which has the advantage of being descriptive.
It might be worth specifying that the input format is ND-JSON rather than arbitrary JSON. Each input document or command document can then be one line of a text file.
We already have functionality for parsing a stream of arbitrarily formatted JSON documents separated by \0 characters, but that is not a friendly format for testing at the command line with simple text files, so ND-JSON is probably the better format for the long term.
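To illustrate why ND-JSON suits command-line testing, the sketch below reads one JSON document per line from a text stream. It is written in Python for brevity rather than the C++ the server would use. The function name iter_ndjson and the request_id and token_ids key names in the sample are hypothetical and only serve the example.

```python
import io
import json
from typing import Any, Dict, Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON document per non-empty line of the stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between documents
        try:
            doc = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number}: invalid JSON: {err}") from err
        if not isinstance(doc, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield doc


# Two requests as they might appear in a plain text file (key names assumed).
sample = io.StringIO(
    '{"request_id": "a", "token_ids": [101, 2023, 102]}\n'
    '{"request_id": "b", "token_ids": [101, 2003, 102]}\n'
)
docs = list(iter_ndjson(sample))
```

Because each document ends at a newline, a test file can be written in any text editor and piped to the process, with no \0 separators to manage.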
Input Format
A JSON document:
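A hypothetical sketch of how such a document could be built, written in Python for brevity. The field names token_ids, attention_mask, token_type_ids and position_ids come from this design. The request_id key, the build_request helper and the length check on the mask are assumptions made for illustration.

```python
import json
from typing import Any, Dict, List, Optional


def build_request(
    request_id: str,
    token_ids: List[int],
    attention_mask: List[int],
    token_type_ids: Optional[List[int]] = None,
    position_ids: Optional[List[int]] = None,
) -> str:
    """Serialise one inference request as a single line of JSON."""
    # Assumption: the mask has one entry per token, as in typical
    # transformer inputs. The design itself does not state this.
    if len(attention_mask) != len(token_ids):
        raise ValueError("attention_mask must be the same length as token_ids")
    doc: Dict[str, Any] = {
        "request_id": request_id,  # assumed key name
        "token_ids": token_ids,
        "attention_mask": attention_mask,
    }
    # Optional fields are only emitted when the model type needs them.
    if token_type_ids is not None:
        doc["token_type_ids"] = token_type_ids
    if position_ids is not None:
        doc["position_ids"] = position_ids
    # Without an indent argument json.dumps emits no raw newlines,
    # so the result fits on one line.
    return json.dumps(doc)


line = build_request("req-1", [101, 2023, 102], [1, 1, 1])
```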
token_ids and attention_mask are required for all uses; token_type_ids and position_ids are optional depending on the model type.
Output Format
A JSON document for flexibility containing the request ID token, the result tensor and optionally the predicted tokens depending on the model type:
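As a hypothetical illustration, written in Python for brevity, the sketch below shows how a client might use the request ID in each output document to match a result to the outstanding request that produced it. The key names request_id, inference and predicted_tokens, and the match_response helper, are assumptions, since the design does not fix them here.

```python
import json
from typing import Any, Dict


def match_response(
    pending: Dict[str, Dict[str, Any]], response_line: str
) -> Dict[str, Any]:
    """Pair one output document with the request that produced it."""
    response = json.loads(response_line)
    request_id = response["request_id"]  # assumed key name
    if request_id not in pending:
        raise KeyError(f"no outstanding request with ID {request_id!r}")
    request = pending.pop(request_id)
    return {
        "request": request,
        "tensor": response["inference"],  # assumed key for the result tensor
        # Predicted tokens are optional, depending on the model type.
        "predicted_tokens": response.get("predicted_tokens"),
    }


# One outstanding request and the output document that answers it.
pending = {"req-1": {"token_ids": [101, 2023, 102]}}
output_line = '{"request_id": "req-1", "inference": [[0.1, 0.9]]}'
matched = match_response(pending, output_line)
```

Since processing is strictly FIFO, the lookup is not strictly necessary, but an ID that fails to match makes an ordering bug visible immediately during development.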