OpenVINO 2022.3, Part 9: Post-training Optimization Tool (POT)

mingo_敏 2024-06-19 12:01:01

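The Post-training Optimization Tool (POT) applies quantization algorithms to an already trained model, mapping its weights and activations from the FP32/FP16 value range to the INT8 value range. This compresses the model, reduces the compute and memory bandwidth needed for inference, and so improves inference performance. Unlike Quantization-aware Training (QAT), POT quantizes without fine-tuning the original model and can still produce an INT8 model with good accuracy, which is why it is widely used for quantization in industry.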
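
To make the FP32-to-INT8 mapping concrete, here is a minimal sketch of symmetric quantization with a single scale factor. It is a generic illustration in plain Python, not POT's internal implementation; the function names `quantize_symmetric` and `dequantize` are made up for this example.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one scale (symmetric scheme)."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8
    max_abs = max(abs(v) for v in values)     # largest magnitude to represent
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to [-qmax, qmax].
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


weights = [-1.27, -0.5, 0.0, 0.25, 1.0]       # toy FP32 weights
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
print(q)         # integer codes in [-127, 127]
print(restored)  # close to the original weights
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the rounding error that quantization trades for a smaller, faster model.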


POT provides two quantization algorithms: Default Quantization and Accuracy-aware Quantization.

  • Default Quantization (DQ) is a fast quantization method. In most cases the quantized model retains good accuracy, which makes it a suitable baseline for INT8 quantization.

    {
        "name": "DefaultQuantization", # Optimization algorithm name
        "params": {
            "preset": "performance", # Preset [performance, mixed] which controls
                                     # the quantization scheme. For the CPU:
                                     # performance - symmetric quantization of weights and activations.
                                     # mixed - symmetric weights and asymmetric activations.
            "stat_subset_size": 300  # Size of subset to calculate activations statistics that can be used
                                     # for quantization parameters calculation.
        }
    }
  • Accuracy-aware Quantization (AAQ) is an iterative algorithm built on Default Quantization. It takes the DQ-quantized model as the baseline. If the INT8 model's accuracy is within the expected range, iteration stops. Otherwise the algorithm analyzes how much each layer affects accuracy, reverts the layer with the largest impact to FP32 precision, and re-evaluates the model. This repeats until the model reaches the expected accuracy range.

    {
        "name": "DefaultQuantization",
        "params": {
            "preset": "performance",
            "stat_subset_size": 300,
            "activations": {                 # defines activation
                "range_estimator": {         # defines how to estimate statistics
                    "max": {                 # right border of the quantization floating-point range
                        "aggregator": "max", # use max(x) to aggregate statistics over calibration dataset
                        "type": "abs_max"    # use abs(max(x)) to get per-sample statistics
                    }
                }
            }
        }
    }

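The range estimator settings above can be pictured with a small plain-Python sketch: a per-sample statistic (`abs_max`) is computed for each calibration sample and then aggregated with `max` over the whole calibration set. The helper names and the toy activation values are assumptions for illustration only; POT collects these statistics internally from real model activations.

```python
def abs_max(sample):
    """Per-sample statistic: largest absolute value in one activation tensor."""
    return max(abs(v) for v in sample)


def estimate_symmetric_range(samples):
    """Aggregate per-sample abs_max with max(), giving a range [-m, m]."""
    m = max(abs_max(s) for s in samples)
    return -m, m


def estimate_asymmetric_range(samples):
    """Track the lower and upper borders separately, giving [low, high]."""
    low = min(min(s) for s in samples)
    high = max(max(s) for s in samples)
    return low, high


# Toy activations from three calibration samples (all non-negative here).
calibration_activations = [
    [0.1, 0.9, 2.5],
    [0.0, 1.2, 3.0],
    [0.3, 0.4, 1.8],
]

sym = estimate_symmetric_range(calibration_activations)
asym = estimate_asymmetric_range(calibration_activations)
print(sym)   # range centered on zero
print(asym)  # range that follows the observed minimum and maximum
```

For this data the asymmetric range is half as wide as the symmetric one, which illustrates the difference between the symmetric and asymmetric activation schemes named in the preset comments.
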
1 Default Quantization (DQ)

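POT exposes a general-purpose interface to its quantization pipeline, including base classes such as DataLoader and Metric. Users subclass DataLoader to define custom dataset loading and preprocessing, and subclass Metric to define custom postprocessing and accuracy calculation. This approach is more flexible and can accommodate the quantization needs of different custom models.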

1-1 Prepare data and dataset interface

import os
import numpy as np
import cv2 as cv

from openvino.tools.pot import DataLoader

class ImageLoader(DataLoader):
    """ Loads images from a folder """
    def __init__(self, dataset_path):
        # Use OpenCV to gather image files
        # Collect names of image files
        self._files = []
        all_files_in_dir = os.listdir(dataset_path)
        for name in all_files_in_dir:
            file = os.path.join(dataset_path, name)
            if cv.haveImageReader(file):
                self._files.append(file)  # keep only files OpenCV can decode

        # Define shape of the model
        self._shape = (224,224)

    def __len__(self):
        """ Returns the length of the dataset """
        return len(self._files)

    def __getitem__(self, index):
        """ Returns image data by index in the NCHW layout
        Note: model-specific preprocessing is omitted, consider adding it here
        """
        if index >= len(self):
            raise IndexError("Index out of dataset size")

        image = cv.imread(self._files[index]) # read image with OpenCV
        image = cv.resize(image, self._shape) # resize to a target input size
        image = np.expand_dims(image, 0)  # add batch dimension
        image = image.transpose(0, 3, 1, 2)  # convert to NCHW layout
        return image, None   # annotation is set to None

1-2 Select quantization parameters

{
    "name": "DefaultQuantization",
    "params": {
        "target_device": "ANY",
        "stat_subset_size": 300,
        "stat_batch_size": 1
    }
}

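When several models are quantized with slightly different settings, it can help to build this configuration programmatically. The sketch below uses a hypothetical helper, `make_default_quantization_config`, which is not part of POT; it only assembles a list of dictionaries with the parameter names shown in this article (`preset`, `stat_subset_size`, `target_device`, `stat_batch_size`).

```python
def make_default_quantization_config(preset="performance",
                                     stat_subset_size=300,
                                     target_device="ANY",
                                     stat_batch_size=1):
    """Build an algorithms list for DefaultQuantization (hypothetical helper)."""
    if preset not in ("performance", "mixed"):
        raise ValueError("preset must be 'performance' or 'mixed'")
    if stat_subset_size <= 0:
        raise ValueError("stat_subset_size must be a positive integer")
    return [
        {
            "name": "DefaultQuantization",
            "params": {
                "target_device": target_device,
                "preset": preset,
                "stat_subset_size": stat_subset_size,
                "stat_batch_size": stat_batch_size,
            },
        }
    ]


algorithms = make_default_quantization_config(preset="mixed", stat_subset_size=100)
print(algorithms[0]["params"])
```

The returned list has the same shape as the `algorithms` variable passed to `create_pipeline` in the next step.
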
1-3 Define and run quantization process

from openvino.tools.pot import IEEngine
from openvino.tools.pot import load_model, save_model
from openvino.tools.pot import compress_model_weights
from openvino.tools.pot import create_pipeline

# Model config specifies the name of the model and paths to .xml and .bin files of the model.
model_config = {
    "model_name": "model",
    "model": path_to_xml,
    "weights": path_to_bin,
}

# Engine config.
engine_config = {"device": "CPU"}

algorithms = [
    {
        "name": "DefaultQuantization",
        "params": {
            "target_device": "ANY",
            "stat_subset_size": 300,
            "stat_batch_size": 1
        },
    }
]

# Step 1: Implement and create a user data loader.
data_loader = ImageLoader("<path_to_images>")

# Step 2: Load a model.
model = load_model(model_config=model_config)

# Step 3: Initialize the engine for metric calculation and statistics collection.
engine = IEEngine(config=engine_config, data_loader=data_loader)

# Step 4: Create a pipeline of compression algorithms and run it.
pipeline = create_pipeline(algorithms, engine)
compressed_model = pipeline.run(model=model)

# Step 5 (Optional): Compress model weights to quantized precision
#                     to reduce the size of the final .bin file.
compress_model_weights(compressed_model)

# Step 6: Save the compressed model to the desired path.
# Set save_path to the directory where the model should be saved.
compressed_model_paths = save_model(
    model=compressed_model,
    save_path=save_path,
)

2 Accuracy-aware Quantization (AAQ)

2-1 Prepare data and dataset interface

Same as section 1-1.

2-2 Define accuracy metric

import numpy as np
from openvino.tools.pot import Metric

class Accuracy(Metric):

    # Required methods
    def __init__(self, top_k=1):
        super().__init__()
        self._top_k = top_k
        self._name = 'accuracy@top{}'.format(self._top_k)
        self._matches = [] # container of the results

    @property
    def value(self):
        """ Returns accuracy metric value for the last model output. """
        return {self._name: self._matches[-1]}

    @property
    def avg_value(self):
        """ Returns accuracy metric value for all model outputs. """
        return {self._name: np.ravel(self._matches).mean()}

    def update(self, output, target):
        """ Updates prediction matches.
        :param output: model output
        :param target: annotations
        """
        if len(output) > 1:
            raise Exception('The accuracy metric cannot be calculated '
                            'for a model with multiple outputs')
        if isinstance(target, dict):
            target = list(target.values())
        predictions = np.argsort(output[0], axis=1)[:, -self._top_k:]
        match = [float(t in predictions[i]) for i, t in enumerate(target)]

        self._matches.append(match)  # store per-sample matches for this batch

    def reset(self):
        """ Resets collected matches """
        self._matches = []

    def get_attributes(self):
        """
        Returns a dictionary of metric attributes {metric_name: {attribute_name: value}}.
        Required attributes: 'direction': 'higher-better' or 'higher-worse'
                             'type': metric type
        """
        return {self._name: {'direction': 'higher-better',
                             'type': 'accuracy'}}


metric = Accuracy()
engine = IEEngine(config=engine_config, data_loader=data_loader, metric=metric)
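
To see what the top-k matching inside `update` computes, here is a small standalone sketch in plain Python (no NumPy or OpenVINO required). The function `top_k_matches` and the toy scores are assumptions made for illustration; the sketch mirrors the idea of the metric rather than reproducing the class above.

```python
def top_k_matches(scores, targets, top_k=1):
    """Return 1.0 per sample if the target class is among the top_k scores, else 0.0."""
    matches = []
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        matches.append(float(target in ranked[:top_k]))
    return matches


# Toy classifier outputs for three samples and three classes.
scores = [
    [0.1, 0.7, 0.2],   # highest score: class 1
    [0.5, 0.3, 0.2],   # highest score: class 0
    [0.2, 0.3, 0.5],   # highest score: class 2
]
targets = [1, 2, 2]

top1 = top_k_matches(scores, targets, top_k=1)
top3 = top_k_matches(scores, targets, top_k=3)
accuracy_top1 = sum(top1) / len(top1)
print(top1, top3, accuracy_top1)
```

The second sample is a miss at top-1 but a hit at top-3, so the averaged accuracy depends on the `top_k` chosen when the metric is constructed.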

2-3 Select quantization parameters

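The only parameter specific to Accuracy-aware Quantization is maximal_drop, the maximum accuracy drop that the quantized model is allowed to have relative to the original model. The default is 0.01 (1%).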
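
The role of maximal_drop in the iterative procedure can be sketched with a toy simulation. Everything here is an assumption for illustration: the function `accuracy_aware_sketch`, the layer names, and the idea that each reverted layer recovers a fixed, additive amount of accuracy. The real algorithm re-measures accuracy on the dataset after each change rather than adding up precomputed gains.

```python
def accuracy_aware_sketch(fp32_accuracy, int8_accuracy, layer_gains, maximal_drop=0.01):
    """Revert the most harmful layers to FP32 until the drop is acceptable.

    layer_gains maps a layer name to the (hypothetical) accuracy recovered
    when that layer is reverted from INT8 back to FP32.
    """
    accuracy = int8_accuracy
    remaining = dict(layer_gains)
    reverted = []
    while fp32_accuracy - accuracy > maximal_drop and remaining:
        # Pick the layer whose quantization hurts accuracy the most.
        worst = max(remaining, key=remaining.get)
        accuracy += remaining.pop(worst)
        reverted.append(worst)
    return reverted, accuracy


reverted, final_accuracy = accuracy_aware_sketch(
    fp32_accuracy=0.76,
    int8_accuracy=0.72,
    layer_gains={"conv1": 0.005, "conv5": 0.02, "fc": 0.015},
    maximal_drop=0.01,
)
print(reverted, final_accuracy)
```

With these toy numbers the loop stops after reverting two layers, because the remaining drop is then below maximal_drop. A larger maximal_drop keeps more layers in INT8 at the cost of accuracy; a smaller one reverts more layers and gives up some of the speedup.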

2-4 Define and run quantization process

from addict import Dict  # dict subclass used for the configs below (a POT dependency)

from openvino.tools.pot import IEEngine
from openvino.tools.pot import load_model, save_model
from openvino.tools.pot import compress_model_weights
from openvino.tools.pot import create_pipeline

# Model config specifies the model name and paths to model .xml and .bin file
model_config = Dict(
    {
        "model_name": "model",
        "model": path_to_xml,
        "weights": path_to_bin,
    }
)

# Engine config
engine_config = Dict({"device": "CPU"})

algorithms = [
    {
        "name": "AccuracyAwareQuantization",
        "params": {
            "target_device": "ANY",
            "stat_subset_size": 300,
            "maximal_drop": 0.02
        },
    }
]

# Step 1: Implement and create user's data loader.
data_loader = UserDataLoader()

# Step 2: Implement and create user's metric.
metric = Accuracy()

# Step 3: Load the model.
model = load_model(model_config=model_config)

# Step 4: Initialize the engine for metric calculation and statistics collection.
engine = IEEngine(config=engine_config, data_loader=data_loader, metric=metric)

# Step 5: Create a pipeline of compression algorithms and run it.
pipeline = create_pipeline(algorithms, engine)
compressed_model = pipeline.run(model=model)

# Step 6 (Optional): Compress model weights to quantized precision
#                    in order to reduce the size of the final .bin file.
compress_model_weights(compressed_model)

# Step 7: Save the compressed model to the desired path.
# Set save_path to the directory where the model should be saved.
compressed_model_paths = save_model(
    model=compressed_model,
    save_path=save_path,
)

# Step 8 (Optional): Evaluate the compressed model. Print the results.
metric_results = pipeline.evaluate(compressed_model)
if metric_results:
    for name, value in metric_results.items():
        print("{}: {}".format(name, value))