[ML] Storing TorchScript models #70827
Pinging @elastic/ml-core (Team:ML)
Implementation Considerations

Code review of the first implementation in #71323 has raised a number of questions, many relating to index management.
Where should the models be stored?

1. A user-designated index?
Perhaps the most flexible solution from an end user's perspective, as it would allow direct upload to the index without the access restrictions that come with system indices. For obvious performance reasons the binary blobs the models consist of should not be analysed or indexed for search. Since ml does not own the index, the burden of defining the correct index mappings falls on the user, who must also be aware that the requirement exists (see the mapping sketch after this list).

2. Store in a hidden index?
Offers some protection against accidental deletion or corruption, as the index will not appear in wildcard expansion, yet the user still has direct access to PUT documents in the index. The index and mappings would be managed by ml. The precedent is the …

3. A system index?
Write access is limited to ml APIs, which would preclude uploading the model state directly to the index. A new ml REST endpoint would be required for uploading models, which is actually a desirable outcome as validation can be performed.

4. .ml-inference-*?
This system index does not currently have an ILM policy for size-based rollover and is mainly used for config. Is mixing config and large binary blobs wise?
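To illustrate the mapping burden option 1 puts on the user, here is a minimal sketch assuming the 8.x elasticsearch-py client; the index and field names are hypothetical, not the implementation's actual schema:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical user-created index for model chunks (option 1).
# "binary" fields hold base64 strings and are stored but never
# analysed or indexed for search, which is what large blobs need.
es.indices.create(
    index="my-torchscript-models",
    settings={"number_of_shards": 1},
    mappings={
        "properties": {
            "model_id": {"type": "keyword"},
            "doc_num": {"type": "integer"},
            "definition": {"type": "binary"},  # base64-encoded chunk
        }
    },
)
```

Getting this wrong, e.g. leaving the blob field dynamically mapped as text, would silently analyse and index gigabytes of base64, which is the real cost of pushing mapping responsibility onto the user.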
How does it complicate model management? I imagine the model index is a single-shard index that rolls over when it reaches a certain size, so docs of the same model already tend to land in the same index. It is, of course, possible to have multiple models being stored concurrently, which could result in individual shards hosting docs for multiple models. The complication seems to be deleting models; we have had trouble with this in the past as well, because the delete-by-query request becomes too expensive. The only way I know to make this fast is to keep each model in its own index (or index pattern). We could consider a hybrid strategy where small models share an index pattern (we need to define some size threshold) and bigger models are forced into their own index pattern (see the sketch below).
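To make the cost asymmetry concrete, a hedged sketch of the two deletion paths, again assuming the Python client with hypothetical index names:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Shared index: deletion is a delete-by-query that must find and
# mark every chunk document, then wait for merges to reclaim space.
es.delete_by_query(
    index="my-torchscript-models",
    query={"term": {"model_id": "my-model"}},
)

# Per-model index: deletion is a cheap metadata operation that
# drops whole shards at once.
es.indices.delete(index="model-my-model-000001")
```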
In retrospect, not significantly. The design relies on model IDs being unique, which would normally be enforced by using unique document IDs, but that will not work across multiple indices, so an up-front check is required before PUT (sketched below). There are various bits of housekeeping we would have to implement, but that is not a huge amount of code. The current implementation for boosted tree models does not work with multiple indices; amending that will be the bulk of the work.
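A sketch of that up-front uniqueness check, with a hypothetical index pattern and field name:

```python
def model_id_is_unique(es, model_id: str) -> bool:
    # Unique document IDs only guarantee uniqueness within a single
    # index, so before the first PUT we must check the whole pattern.
    resp = es.count(
        index="my-torchscript-models-*",
        query={"term": {"model_id": model_id}},
    )
    return resp["count"] == 0
```

Note that a check-then-PUT sequence is racy without some form of coordination, e.g. reserving a meta-document ID first; that is part of the housekeeping mentioned above.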
Interesting 🤔
We have a strategy that we are using for now. Shall we close this, or are there still open discussions?
Closed by #71323
TorchScript models are compressed binary files whose size depends on their purpose, but for neural network models a size larger than 1GB is not unusual. Storing such a large model as a single document in Elasticsearch would be sub-optimal, as it requires contiguous memory. Additionally, uploading a document larger than the HTTP max content length limit is infeasible.
Better performance will be found by splitting the model into chunks and streaming those chunks to the native process to be reassembled on use; ml already uses this pattern for Anomaly Detection job state. AD uses 16MB chunks, but it is worth benchmarking smaller chunk sizes. Because the model is binary data it must be base64 encoded before it can be stored in a Binary field or in an index with mappings disabled.
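As a rough illustration of the chunk-and-encode step (plain Python, no Elasticsearch specifics; the 16MB figure comes from the AD job state precedent):

```python
import base64
from typing import Iterator

CHUNK_SIZE = 16 * 1024 * 1024  # 16MB, as used for AD job state

def chunk_model(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[str]:
    """Yield base64-encoded chunks of a TorchScript model file."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield base64.b64encode(chunk).decode("ascii")
```

Base64 inflates each chunk by roughly a third, which is another reason benchmarking smaller chunk sizes is worthwhile.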
Once the model is split into its constituent chunks, a meta-document will track the number of chunks, and the IDs of the documents containing model chunks will follow a predictable naming convention. One vital piece of information required by the native process is the size of the model in bytes, which must be sent before any chunks are streamed. This value should be stored in the meta-document, because reading all the chunks into a buffer just to calculate the decoded size would defeat the purpose of streaming (see the sketch below).
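Putting the pieces together, a hedged sketch of the write path, with a hypothetical doc-ID convention and meta-document shape (the real field names would be decided in the implementation):

```python
import os
from elasticsearch import Elasticsearch

def store_model(es: Elasticsearch, index: str, model_id: str, path: str) -> None:
    # Hypothetical naming convention: <model_id>_doc_<n> for chunk
    # documents, <model_id>_meta for the meta-document.
    doc_count = 0
    for n, encoded in enumerate(chunk_model(path)):  # from the sketch above
        es.index(
            index=index,
            id=f"{model_id}_doc_{n}",
            document={"model_id": model_id, "doc_num": n, "definition": encoded},
        )
        doc_count = n + 1

    # The meta-document stores the raw model size in bytes so the
    # native process can be told the total size before any chunk is
    # streamed, avoiding a full read just to compute the decoded size.
    es.index(
        index=index,
        id=f"{model_id}_meta",
        document={
            "model_id": model_id,
            "doc_count": doc_count,
            "model_size_bytes": os.path.getsize(path),
        },
    )
```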
TODO
byte []?