[ML] Store compressed model definitions in ByteReferences #71679
Conversation
Pinging @elastic/ml-core (Team:ML)
String compressedString = docs.stream()
    .map(TrainedModelDefinitionDoc::getCompressedString)
    .collect(Collectors.joining());
// BWC for when we tracked the total definition length
Total definition length is now tracked again, as we need to know the size of PyTorch models up front.
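As background for the comment above, here is a minimal sketch of how the chunked definition docs could be reassembled as a single BytesReference instead of joining Strings. The `getBinaryData()` accessor and the use of `CompositeBytesReference.of` are assumptions for illustration, not necessarily the exact code in this PR, and the factory method can differ between versions.

```java
import java.util.List;

import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.bytes.CompositeBytesReference;

// Hypothetical sketch: stitch the per-doc chunks together without building a String.
static BytesReference joinChunks(List<TrainedModelDefinitionDoc> docs) {
    BytesReference[] chunks = docs.stream()
        .map(TrainedModelDefinitionDoc::getBinaryData) // assumed accessor for the binary chunk
        .toArray(BytesReference[]::new);
    return CompositeBytesReference.of(chunks);         // views the chunks as one reference, no copy
}
```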
for (int i=0; i<numDocs; i++) {
    // actual content of the model definition is not important here
-   modelDefs.add("model_def_" + i);
+   modelDefs.add(new BytesArray(Base64.getEncoder().encode(("model_def_" + i).getBytes(StandardCharsets.UTF_8))));
It might be nice to have a private method `BytesArray base64Encode(String)` and use it throughout.
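A minimal sketch of the suggested helper, assuming it lives as a private static method in the test class; the name and placement follow the suggestion above, not necessarily what was committed:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

import org.elasticsearch.common.bytes.BytesArray;

// Test helper: base64-encode a String and wrap the result as a BytesArray.
private static BytesArray base64Encode(String s) {
    return new BytesArray(Base64.getEncoder().encode(s.getBytes(StandardCharsets.UTF_8)));
}
```

With that helper the loop above would read `modelDefs.add(base64Encode("model_def_" + i));`.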
BytesStreamOutput out = new BytesStreamOutput();
try (OutputStream compressedOutput = new GZIPOutputStream(out, BUFFER_SIZE)) {
    reference.writeTo(compressedOutput);
}
return new String(Base64.getEncoder().encode(BytesReference.toBytes(out.bytes())), StandardCharsets.UTF_8);
OK, so this will write out the raw bytes of the GZIP. Is this what we want, or do we want to run the Base64 encoder?
I thought the guarantees around base64 character sizes were one of the reasons we could skip transforming into a string?
> Binary data is stored in lucene base64 encoded

Ah, so since the mapping is binary, we get that for free.
It's handled by the Jackson JSON generator, which is used by the various `XContentBuilder::value(byte[] value)` methods to write bytes.
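To make that concrete, a small standalone sketch of writing raw bytes through an XContentBuilder; the `"definition"` field name is illustrative only, and the package locations shown are the 7.x ones and may differ in other versions:

```java
import java.io.IOException;

import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// The underlying Jackson JSON generator base64-encodes byte[] values, so the caller
// can hold plain bytes and the persisted JSON still contains a valid string field.
static String toJson(byte[] compressedBytes) throws IOException {
    XContentBuilder builder = XContentFactory.jsonBuilder();
    builder.startObject();
    builder.field("definition", compressedBytes); // emitted as a base64 string in the JSON
    builder.endObject();
    return BytesReference.bytes(builder).utf8ToString();
}
```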
- String compressedString = trainedModelConfig.getCompressedDefinition();
- if (compressedString.length() > MAX_COMPRESSED_STRING_SIZE) {
+ BytesReference compressedDefinition = trainedModelConfig.getCompressedDefinition();
+ if (compressedDefinition.length() > MAX_COMPRESSED_STRING_SIZE) {
      listener.onFailure(
          ExceptionsHelper.badRequestException(
              "Unable to store model as compressed definition has length [{}] the limit is [{}]",
Might be good to indicate that the length is in UTF-8 bytes.
EDIT: Well, maybe not utf-8 bytes...but bytes or something
I added the fact that the size is in bytes to the message in 7a59661.
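For reference, the check above with the size reported in bytes might read roughly like this; the exact wording chosen in 7a59661 is not shown here, so treat this as a sketch of the idea:

```java
if (compressedDefinition.length() > MAX_COMPRESSED_STRING_SIZE) {
    listener.onFailure(
        ExceptionsHelper.badRequestException(
            "Unable to store model as compressed definition has length [{}] bytes; the limit is [{}] bytes",
            compressedDefinition.length(),
            MAX_COMPRESSED_STRING_SIZE));
    return;
}
```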
The feature branch contains changes to configure PyTorch models with a TrainedModelConfig and defines a format to store the binary models. The _start and _stop deployment actions control the model lifecycle, and the model can be directly evaluated with the _infer endpoint. Two types of NLP task are supported: Named Entity Recognition and Fill Mask. The feature branch consists of these PRs: #73523, #72218, #71679, #71323, #71035, #71177, #70713
Binary data is stored in Lucene base64 encoded; the same data held in a Java String uses 2 bytes (UTF-16) per base64 character, consuming twice the amount of memory required. This change uses BytesReferences to hold the binary data, which is not base64 encoded; encoding must take place before the bytes can be persisted, and this is performed by the XContent classes.
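As a rough back-of-the-envelope illustration of that overhead (the sizes below are made up for the example and assume the 2-bytes-per-character UTF-16 storage described above):

```java
import java.util.Base64;

public class Base64OverheadDemo {
    public static void main(String[] args) {
        // base64 turns N raw bytes into ~4N/3 characters; held in a UTF-16 String that is
        // ~8N/3 bytes of char data, versus ~N bytes for the same data in a BytesReference.
        byte[] raw = new byte[30_000_000];                        // pretend ~30 MB compressed model
        String base64 = Base64.getEncoder().encodeToString(raw);  // ~40 million characters
        long stringCharBytes = (long) base64.length() * 2;        // ~80 MB of char data
        System.out.printf("raw=%d bytes, base64 chars=%d, String char data ~%d bytes%n",
            raw.length, base64.length(), stringCharBytes);
    }
}
```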
For BWC I've added a new field mapping, `binary_definition`, to .ml-inference-* which means the index version has to be incremented. Compatibility for HLRC and REST API users is preserved, as the `compressed_definition` field still contains the base64 encoded representation.
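For completeness, the inverse of the compression snippet discussed earlier, inflating a GZIP-compressed BytesReference back to its uncompressed form, could look roughly like this; this is a sketch, not the PR's actual utility method, which may be named and structured differently:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

import org.elasticsearch.common.bytes.BytesReference;
import org.elasticsearch.common.io.Streams;
import org.elasticsearch.common.io.stream.BytesStreamOutput;

// Sketch: stream the compressed bytes through a GZIPInputStream and collect the
// inflated output into a fresh BytesReference.
static BytesReference inflate(BytesReference compressed) throws IOException {
    BytesStreamOutput out = new BytesStreamOutput();
    try (InputStream in = new GZIPInputStream(compressed.streamInput())) {
        Streams.copy(in, out);
    }
    return out.bytes();
}
```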