Is it possible to share data between multiple transformer instances? #825
Comments
Definitely you can have the transformer instances share the map state, e.g., by storing the map in a companion object. Storing the map only once in the bundle file is trickier. I think it can be done, but it would be kind of ugly. Maybe something like adding and storing a "shouldWriteMap" parameter on the transformer, which you set to true on exactly one instance of the transformer in your pipeline. It might be easier to just store the map multiple times within the bundle. Another option you could consider is making your transformer multiple input/output, so that you only need to use the transformer once in your pipeline.
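A minimal sketch of the companion-object idea combined with the "shouldWriteMap" flag, assuming a single shared map per JVM. `EmbeddingLookup`, `install`, `sharedMap`, and `payloadToWrite` are hypothetical names for illustration, not MLeap classes or APIs; a real custom transformer would wire this into its own model and Op.

```scala
// Hypothetical stand-in for the custom transformer; none of these names are MLeap APIs.
final case class EmbeddingLookup(uid: String, shouldWriteMap: Boolean = false) {
  // Every instance reads the single map held by the companion object.
  def apply(key: String): Option[Vector[Double]] =
    EmbeddingLookup.sharedMap.get(key)

  // What this instance would contribute to the bundle: only the flagged
  // instance carries the map payload, the others carry nothing.
  def payloadToWrite: Option[Map[String, Vector[Double]]] =
    if (shouldWriteMap) Some(EmbeddingLookup.sharedMap) else None
}

object EmbeddingLookup {
  // One copy per classloader; @volatile so a newly installed map is visible to all threads.
  @volatile private var current: Map[String, Vector[Double]] = Map.empty

  def install(m: Map[String, Vector[Double]]): Unit = current = m

  def sharedMap: Map[String, Vector[Double]] = current
}
```

With three instances in a pipeline and the flag set on exactly one of them, only one payload would be written, while all three resolve keys against the same in-memory map. Nothing here enforces that the flag is set exactly once; that would remain the pipeline author's responsibility.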
Thanks @jsleight for the quick response! The map is not small compared with the overall bundle file; most of the bundle's size comes from the duplicated maps.
This is a good idea that I can use to reduce the memory footprint: I could keep another map that stores the different embeddings, with exactly one copy of each, and when loading a duplicated instance just point it at the map in the object. Updating the transformer to be multiple input/output is also a solution, but I would prefer to see whether I can change the serialization/deserialization internally to achieve the goal, as there is already a bunch of client code using the custom transformer.
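The "one copy per distinct embedding" idea could look roughly like the sketch below: a cache keyed by an embedding name, consulted while each transformer instance is being loaded. `EmbeddingCache`, `getOrLoad`, and the use of a name as the key are assumptions made for illustration; how the name is stored per instance in the bundle is left open.

```scala
import scala.collection.mutable

// Hypothetical load-time cache; not part of MLeap.
object EmbeddingCache {
  type Embeddings = Map[String, Vector[Double]]

  private val byName = mutable.Map.empty[String, Embeddings]

  // `load` is a by-name parameter, so the expensive read only runs on a cache miss.
  // Later instances asking for the same name get the already-loaded map back.
  def getOrLoad(name: String)(load: => Embeddings): Embeddings = synchronized {
    byName.getOrElseUpdate(name, load)
  }
}
```

Each deserialized instance would call `getOrLoad` with its embedding name, so duplicated instances end up pointing at the same object in memory. This only addresses the memory footprint; the bundle file still holds one copy per instance unless the write side is changed as well.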
I checked around the mleap code, it seems I can customize the single transformer serialization with |
Yeah, to my knowledge the mleap APIs don't really have a good mechanism for storing global state in the bundle. Though I wouldn't be opposed to adding such capabilities if you want to submit a PR, perhaps by adding new APIs for writeGlobal and readGlobal (or something like that) which the Ops can use. We would probably need to rely on transformers providing unique keys in the global bundle namespace, but I think that should be acceptable.
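To make the proposal concrete, here is one possible shape for such an API, sketched as an in-memory store. This is purely hypothetical: MLeap has no `writeGlobal` or `readGlobal` today, and `GlobalBundleStore`, the byte-array payload, and the collision rule are assumptions rather than a design anyone has agreed on.

```scala
import scala.collection.mutable

// Hypothetical bundle-wide key/value area; MLeap does not provide this.
final class GlobalBundleStore {
  private val entries = mutable.LinkedHashMap.empty[String, Array[Byte]]

  // Writing identical bytes under an existing key is a no-op, so several
  // instances of the same transformer can all call this safely.
  // Writing different bytes under an existing key is treated as a key clash.
  def writeGlobal(key: String, bytes: Array[Byte]): Unit =
    entries.get(key) match {
      case Some(existing) if !java.util.Arrays.equals(existing, bytes) =>
        throw new IllegalArgumentException(s"global key '$key' already holds different data")
      case Some(_) => ()
      case None    => entries.update(key, bytes.clone())
    }

  // Returns a defensive copy so callers cannot mutate the stored payload.
  def readGlobal(key: String): Option[Array[Byte]] =
    entries.get(key).map(_.clone())

  def keys: Seq[String] = entries.keys.toSeq
}
```

The clash check is one way to handle the "unique keys" requirement: a transformer would namespace its key (for example with its package name), and an accidental collision with another transformer's data would fail loudly at write time instead of silently overwriting.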
Hello mleap experts,
I have built a custom transformer which maps a key to a vector using a Map, but the scale is not small (~100K). The custom transformer is used multiple times in the same mleap pipeline, and the instances are serialized separately, so the underlying Map is duplicated. I am wondering if it's possible for multiple transformer instances to share the same underlying data, so that I store only one copy in the bundle file and keep only one copy in memory, shared by all instances.