Skip to content

coremltools 8.0b2

Pre-release
Pre-release
Compare
Choose a tag to compare
@jakesabathia2 jakesabathia2 released this 16 Aug 01:02
· 42 commits to main since this release
5e2460f

Release Notes

  • Support for Latest Dependencies
    • Compatible with the latest protobuf python package: Improves serialization latency.
    • Compatible with numpy 2.0.
    • Supports scikit-learn 1.5.
  • New Core ML model utils
    • coremltools.models.utils.bisect_model can break a large Core ML model into two smaller models with similar sizes.
    • coremltools.models.utils.materialize_dynamic_shape_mlmodel can convert a flexible input shape model into a static input shape model.
  • New compression features in coremltools.optimize.coreml
    • Vector palettization: By setting cluster_dim > 1 in coremltools.optimize.coreml.OpPalettizerConfig, you can do the vector palettization, where each entry in the lookup table is a vector of length cluster_dim.
    • Palettization of per channel scale: By setting enable_per_channel_scale=True in coremltools.optimize.coreml.OpPalettizerConfig, weights are normalized along the output channel using per channel scales before being palettized.
    • Joint compression: A new pattern is supported, where weights are first quantized to int8 and then palettized into n-bit look-up table with int8 entries.
    • Support conversion of palettized model with 8bits LUT produced from coremltools.optimize.torch.
  • New compression features / bug fixes in coremltools.optimize.torch
    • Added conversion support for Torch models jointly compressed using the training time APIs in coremltools.optimize.torch .
    • Added vector palettization support to SKMPalettizer .
    • Fixed bug in construction of weight vectors along output channel for vector palettization with PostTrainingPalettizer and DKMPalettizer .
    • Deprecated cluter_dtype option in favor of lut_dtype in ModuleDKMPalettizerConfig .
    • Added support for quantizing ConvTranspose modules with PostTrainingQuantizer and LinearQuantizer .
    • Added static grouping for activation heuristic in GPTQ.
    • Fixed bug in how quantization scales are computed for Conv2D layer with per-block quantization in GPTQ .
    • Can now perform activation only quantization with QAT APIs.
  • Experimental torch.export conversion support
    • Support conversion of stateful models with mutable buffer.
    • Support conversion of dynamic inputs shape models.
    • Support conversion of 4-bit weight compression models.
  • Support new torch ops: clip .
  • Various other bug fixes, enhancements, clean ups and optimizations.
  • Special thanks to our external contributors for this release: @dpanshu , @timsneath , @kasper0406 , @lamtrinhdev , @valfrom

Appendix

  • Example code of converting stateful torch.export model
import torch
import coremltools as ct

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.register_buffer("state_1", torch.tensor([0.0, 0.0, 0.0]))

    def forward(self, x):
        # In place update of the model state
        self.state_1.mul_(x)
        return self.state_1 + 1.0

source_model = Model()
source_model.eval()

example_inputs = (torch.tensor([1.0, 2.0, 3.0]),)
exported_model = torch.export.export(source_model, example_inputs)
coreml_model = ct.convert(exported_model, minimum_deployment_target=ct.target.iOS18)
  • Example code of converting torch.export models with dynamic input shapes
import torch
import coremltools as ct

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(3, 5)

    def forward(self, x):
        y = self.linear(x)
        return y

source_model = Model()
source_model.eval()

example_inputs = (torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),)
dynamic_shapes = {"x": {0: torch.export.Dim(name="batch_dim")}}
exported_model = torch.export.export(source_model, example_inputs, dynamic_shapes=dynamic_shapes)
coreml_model = ct.convert(exported_model)
  • Example code of converting torch.export with 4-bit weight compression
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
import coremltools as ct

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(3, 5)
    def forward(self, x):
        y = self.linear(x)
        return y

source_model = Model()
source_model.eval()

example_inputs = (torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),)

pre_autograd_graph = capture_pre_autograd_graph(source_model, example_inputs)
quantization_config = get_symmetric_quantization_config(weight_qmin=-7, weight_qmax=8)
quantizer = XNNPACKQuantizer().set_global(quantization_config)
prepared_graph = prepare_pt2e(pre_autograd_graph, quantizer)
converted_graph = convert_pt2e(prepared_graph)

exported_model = torch.export.export(converted_graph, example_inputs)
coreml_model = ct.convert(exported_model, minimum_deployment_target=ct.target.iOS17)