Quantized AI Models on Edge Chips — PatSnap Eureka

Edge AI · Patent Intelligence

Deploying Quantized AI Models on Industrial Edge Chips

A synthesis of 35+ patents mapping the full deployment pipeline — from float-to-fixed-point conversion and operator adaptation through heterogeneous hardware scheduling to adaptive model switching at runtime.

Figure: Quantization deployment pipeline. Five-stage pipeline for deploying quantized AI models on industrial edge chips, synthesized from 35+ patents filed 2020–2026: Stage 1, float training (PC host); Stage 2, fixed-point conversion (PTQ / QAT); Stage 3, operator adaptation (NPU / ASIC); Stage 4, hardware scheduling (NPU/CPU, DMA/DVFS); Stage 5, runtime adaptation (dynamic quantization). Each stage must be co-designed with the target chip's arithmetic capabilities.
  • 35+ patents and technical disclosures analysed
  • 4 core technical theme clusters identified
  • 2020–26: filing period covered in corpus
  • CN · AU · PCT: jurisdictions represented
Quantization Pipelines

From Floating-Point Training to Fixed-Point Deployment

The foundational step in deploying AI models on industrial edge chips is converting full-precision floating-point models into lower-precision integer representations that match the arithmetic capabilities of edge silicon. The canonical pipeline has four steps: train a floating-point network on a PC host, convert it to a fixed-point embedded model using per-layer quantization formulas, preprocess the quantization data, and execute all accelerated operators in hardware mode on an embedded AI accelerator. The result is a smaller model storage footprint, faster inference, higher compute density, and lower operating power.
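
A minimal sketch of the per-layer float-to-fixed-point step, assuming a signed Q-format with a power-of-two scale chosen separately for each layer. The source does not give the patents' actual per-layer formulas, so the function names `choose_frac_bits` and `to_fixed_point`, the 8-bit word length, and the sample weights are all illustrative assumptions.

```python
def choose_frac_bits(weights, num_bits=8, max_frac=15):
    """Pick the largest fractional bit count whose scaled peak still fits."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for a signed 8-bit word
    peak = max(abs(w) for w in weights)
    for frac in range(max_frac, -max_frac - 1, -1):
        if round(peak * 2.0 ** frac) <= qmax:
            return frac
    raise ValueError("weights too large for the chosen word length")


def to_fixed_point(weights, frac_bits, num_bits=8):
    """Scale by 2**frac_bits, round, and saturate to the signed range."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(w * 2.0 ** frac_bits))) for w in weights]


# Hypothetical layers: each one gets its own fractional bit count.
layer_weights = {"conv1": [0.5, -0.25, 0.1], "fc": [3.0, -1.5]}
for name, w in layer_weights.items():
    frac = choose_frac_bits(w)
    print(name, "frac_bits =", frac, "ints =", to_fixed_point(w, frac))
```

Layers with small weights end up with more fractional bits and therefore finer resolution, which is one reason a per-layer choice is usually preferred over a single global scale.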

A more sophisticated variant is quantization-aware training (QAT), where quantization error is injected into the training graph before deployment. Hikvision's approach replaces each network layer with hardware-compatible target operator equivalents and conducts QAT using the edge device's underlying operator library — leveraging TorchScript as an intermediate representation and a lightweight engine such as PaddleSlim, without dependency on the full PyTorch training framework.
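
To illustrate what "injecting quantization error into the training graph" can look like, here is a small pure-Python sketch of fake quantization with a straight-through estimator (STE), a common way to implement QAT. It is not Hikvision's method: the single-weight model, the scale of 0.1, the learning rate, and all function names are assumptions chosen for illustration.

```python
def fake_quantize(x, scale, num_bits=8):
    """Quantize then dequantize x on a symmetric signed integer grid."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def ste_gradient(x, scale, upstream, num_bits=8):
    """Straight-through estimator: pass the gradient inside the clip range."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    inside = qmin * scale <= x <= qmax * scale
    return upstream if inside else 0.0


def train_weight(xs, ys, scale, lr=0.05, steps=200):
    """Fit y = w * x while the forward pass only ever sees the quantized w."""
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            err = fake_quantize(w, scale) * x - y
            # d(err**2)/d(w_q) = 2 * err * x, routed to the float weight via STE
            grad += ste_gradient(w, scale, 2.0 * err * x)
        w -= lr * grad / len(xs)
    return w, fake_quantize(w, scale)


xs = [1.0, 2.0, 3.0]
ys = [0.37 * x for x in xs]          # hypothetical target slope of 0.37
w_float, w_quant = train_weight(xs, ys, scale=0.1)
print("latent weight:", w_float, "deployed weight:", w_quant)
```

The latent float weight hovers near the boundary between two grid points because the loss is computed on the quantized value, which is the behaviour QAT relies on to make the network tolerant of rounding.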

For ASIC-class chips whose non-standard bit-widths defeat standard quantization schemes, a two-stage format conversion approach applies a first format conversion to the pretrained model, performs full-integer quantization at a configurable bit-width, and then applies a second format conversion to produce the final deployable model. This explicitly addresses the mismatch between academic quantization schemes and the constrained bit-widths of production AI silicon.

Post-training static quantization (PTQ) with sensitivity-guided mixed precision evaluates both pruning sensitivity and quantization sensitivity layer by layer and applies MinMax calibration to determine activation and weight ranges. It then assigns lower bit-widths to insensitive blocks while preserving higher precision on sensitive modules, using the PTQ formula: quantized weight = round(scale × weight + zero_point), clipped to the target bit range. In this form, scale is the multiplier that maps real values onto the integer grid, that is, the reciprocal of the dequantization step size.
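
The sketch below works through the PTQ formula quoted above with MinMax calibration, then uses a simple stand-in for sensitivity to pick bit-widths. Assumptions: an unsigned integer range, weight reconstruction error as the sensitivity proxy (a real pipeline would measure task accuracy), a hypothetical tolerance of 0.01, and made-up layer values. None of the function names come from the cited patents.

```python
def minmax_params(values, num_bits):
    """MinMax calibration: map [min, max] onto [0, 2**num_bits - 1]."""
    lo, hi = min(values), max(values)
    qmax = 2 ** num_bits - 1
    scale = qmax / (hi - lo) if hi > lo else 1.0   # multiplier, real -> integer
    zero_point = round(-scale * lo)
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits):
    """q = round(scale * w + zero_point), clipped to the target bit range."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, round(scale * v + zero_point))) for v in values]


def dequantize(q_values, scale, zero_point):
    return [(q - zero_point) / scale for q in q_values]


def quant_error(values, num_bits):
    """Mean squared reconstruction error, used here as a sensitivity proxy."""
    scale, zp = minmax_params(values, num_bits)
    restored = dequantize(quantize(values, scale, zp, num_bits), scale, zp)
    return sum((a - b) ** 2 for a, b in zip(values, restored)) / len(values)


def assign_bit_widths(layers, low_bits=4, high_bits=8, tolerance=0.01):
    """Give insensitive layers the low bit-width, keep the rest at high."""
    return {
        name: low_bits if quant_error(w, low_bits) <= tolerance else high_bits
        for name, w in layers.items()
    }


layers = {
    "conv1": [-1.0, -0.5, 0.0, 0.5, 1.0],     # narrow range, tolerant
    "head": [-8.0, 0.013, 0.02, 7.5],         # wide range, sensitive
}
print(assign_bit_widths(layers))
```

With these sample values the narrow-range layer tolerates 4 bits while the wide-range layer stays at 8, which mirrors the mixed-precision assignment described in the paragraph.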

QAT: Quantization-Aware Training, with error injected at training time
PTQ: Post-Training Quantization, with sensitivity-guided mixed precision
INT8: configurable bit-width for ASIC full-integer quantization
4→8: dynamic bit-width switching triggered by runtime resource state
  • Reduced model storage footprint on edge devices
  • Accelerated inference via hardware-mode operators
  • Lower operational power consumption
  • MinMax calibration for activation and weight ranges
  • Two-stage format conversion for ASIC silicon
Patent Landscape Data

Technical Theme Distribution Across 35+ Edge AI Patents

Four dominant clusters identified in the 2020–2026 corpus: quantization pipelines, operator conversion, heterogeneous scheduling, and adaptive compression.

Core Technical Themes in Edge AI Quantization Patents

Patent count by dominant technical theme across the 35+ document corpus (2020–2026).

Figure: Patent count by dominant technical theme. QAT & PTQ pipelines: 10 patents; operator conversion: 9 patents; heterogeneous scheduling: 9 patents; adaptive compression: 7 patents. Source: PatSnap Eureka, 35+ patent corpus, 2020–2026.

Patent Assignees by Category

Industrial AI companies lead filings, followed by academic institutions and state-owned enterprises.

Figure: Patent assignees by category. Industrial AI companies: 37%; academic institutions: 23%; state-owned enterprises: 20%; global semiconductor: 11%; other: 9%. Hikvision, OPPO, Changan Automobile, and China Mobile are among the industrial AI company assignees. Source: PatSnap Eureka, assignee analysis, 2020–2026.

Operator Adaptation

Cross-Platform Format Conversion and Operator Substitution

Operator incompatibility is the most common deployment failure mode. These four patent-backed strategies systematically eliminate it.

Framework-Aware Training

Ethos-N NPU Operator Co-Design at Training Time

Shenzhen Unilumin's approach trains models using NPU-compatible operators in PyTorch, then converts parameters and reconstructs operator combinations to produce a model natively compatible with the NPU's TensorFlow-centric toolchain. Naive one-to-one operator mapping introduces redundant intermediate computation nodes, degrades quantization accuracy, and can produce incorrect inference results — problems eliminated by co-design at training time.

Arm Ethos-N NPU Ecosystem
Three-Stage Substitution

Hikvision Private Operator Substitution Workflow

The workflow first detects non-standard operators absent from the target intermediate format's base operator set and replaces each with a functionally equivalent composition of base operators (termed "private operators"). It then converts the model to an intermediate format (QIR). Finally, it converts the model again to the target platform's native format and applies per-operator quantization. This layered substitution eliminates unsupported operator errors while preserving quantization fidelity.
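
The detect-and-replace step can be sketched as a rewrite pass over an operator list. This is an illustration only: the base operator set, the rewrite rules, and the linear-chain graph representation are hypothetical, and the sketch does not model QIR or the later format conversions.

```python
# Hypothetical base operator set of the target intermediate format.
BASE_OPS = {"conv2d", "add", "mul", "relu6", "sigmoid"}

# Hypothetical rewrite rules: unsupported op -> chain of base ops.
SUBSTITUTIONS = {
    # hardswish(x) = x * relu6(x + 3) / 6
    "hardswish": ["add", "relu6", "mul", "mul"],
    # silu(x) = x * sigmoid(x)
    "silu": ["sigmoid", "mul"],
}


def find_unsupported(graph, base_ops=BASE_OPS):
    """Return the operators that the target format cannot represent."""
    return [op for op in graph if op not in base_ops]


def substitute(graph, base_ops=BASE_OPS, rules=SUBSTITUTIONS):
    """Expand every unsupported operator into its base-operator equivalent."""
    rewritten = []
    for op in graph:
        if op in base_ops:
            rewritten.append(op)
        elif op in rules:
            rewritten.extend(rules[op])
        else:
            raise ValueError(f"no base-operator equivalent registered for {op!r}")
    return rewritten


model = ["conv2d", "hardswish", "conv2d", "silu"]
print("unsupported before:", find_unsupported(model))
converted = substitute(model)
print("converted graph:   ", converted)
print("unsupported after: ", find_unsupported(converted))
```

Failing loudly when no rule exists is a deliberate choice here: a silent pass-through would only move the unsupported operator error to the device.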

QIR Intermediate Format
Cross-Vendor Migration

Automatic Matching with Dynamic Verilog/HLS Synthesis

Xi'an Tengkun's low-code platform identifies the target hardware profile, selects quantization strategy based on model parameter magnitude, and converts models using standard tooling (e.g., TFLite Micro Interpreter). When no existing hardware profile matches, it dynamically generates Verilog/HLS code templates to synthesize compatible logic, establishes a global virtual address space, and implements RDMA-based zero-copy data transfer between NPU and GPU.

RDMA Zero-Copy Transfer
Chip-Aware Scheduling

Changan Automobile Operator-to-Compute-Unit Assignment

After quantization, operators in the quantized model are compiled against the edge chip's instruction set, assigned to specific compute units (e.g., CPU cores, NPU tiles), and scheduled with per-unit priority policies to maximize throughput given resource contention. This reflects the need to treat edge chip deployment not merely as a format conversion problem but as a real-time resource allocation problem.
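
One way to picture operator-to-compute-unit assignment is a greedy list scheduler. The sketch below is an assumption-laden illustration, not the patented method: it ignores data dependencies between operators, uses made-up latency estimates, and treats a single integer as the priority policy.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    priority: int      # higher values are scheduled first
    cost_ms: dict      # unit name -> estimated latency; absent = unsupported


def schedule(ops, units):
    """Assign each op to the supporting unit that would finish it earliest."""
    busy_until = {unit: 0.0 for unit in units}
    plan = []
    for op in sorted(ops, key=lambda o: -o.priority):
        candidates = [u for u in units if u in op.cost_ms]
        if not candidates:
            raise ValueError(f"no compute unit supports {op.name}")
        best = min(candidates, key=lambda u: busy_until[u] + op.cost_ms[u])
        start = busy_until[best]
        busy_until[best] = start + op.cost_ms[best]
        plan.append((op.name, best, start, busy_until[best]))
    return plan


# Hypothetical operators and latency estimates for one NPU tile and one CPU core.
ops = [
    Op("conv_a", priority=3, cost_ms={"npu0": 2.0, "cpu0": 10.0}),
    Op("conv_b", priority=2, cost_ms={"npu0": 2.0, "cpu0": 10.0}),
    Op("softmax", priority=1, cost_ms={"cpu0": 1.0}),
]
for name, unit, start, end in schedule(ops, ["npu0", "cpu0"]):
    print(f"{name:8s} -> {unit}  [{start:.1f} ms, {end:.1f} ms]")
```

Even in this toy form, the second convolution queues behind the first on the NPU because waiting is still cheaper than running on the CPU, which is the kind of contention trade-off the paragraph describes.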

Real-Time Resource Allocation
Heterogeneous Hardware

Scheduling Quantized Models Across NPU, CPU, DSP, and FPGA

Industrial edge SoCs integrate multiple compute units with distinct instruction sets and latency profiles. These patents codify how to map quantized models across them efficiently.

Three-Layer Heterogeneous Scheduling Architecture

Xi'an Xingxun's framework assigns feature extraction operators to the NPU, offloads classification operators to the multi-core CPU, manages on-chip SRAM through a virtual memory paging mechanism, and uses a DMA controller to implement zero-copy data transfers — eliminating redundant memory copies between processing stages. The same framework applies structural pruning based on attention-head importance scores and dynamic feed-forward network sparsification driven by input tensor entropy values.
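
The entropy-driven sparsification idea can be sketched as follows. The source does not say how entropy is computed or how it maps to a sparsity level, so the histogram-based Shannon entropy, the linear `keep_ratio` mapping, the 25% floor, and the magnitude-based selection are all assumptions made for illustration.

```python
import math


def histogram_entropy(values, bins=8):
    """Shannon entropy (in bits) of a fixed-bin histogram of the input."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0                       # constant input carries no spread
    counts = [0] * bins
    for v in values:
        idx = min(bins - 1, int((v - lo) / (hi - lo) * bins))
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def keep_ratio(entropy, bins=8, min_keep=0.25):
    """Hypothetical mapping: low-entropy inputs keep fewer FFN activations."""
    return min_keep + (1.0 - min_keep) * entropy / math.log2(bins)


def sparsify(activations, ratio):
    """Zero all but the largest-magnitude share of activations (ties are kept)."""
    k = max(1, round(ratio * len(activations)))
    threshold = sorted((abs(a) for a in activations), reverse=True)[k - 1]
    return [a if abs(a) >= threshold else 0.0 for a in activations]


input_tensor = [0.0, 0.1, 0.1, 0.2, 0.2, 0.2, 5.0, 5.1]
ratio = keep_ratio(histogram_entropy(input_tensor))
print("keep ratio:", round(ratio, 3))
print("sparse FFN activations:", sparsify([0.1, -2.0, 0.5, 3.0], 0.5))
```

The design intent is that simple, low-entropy inputs trigger more aggressive sparsification, so compute is spent only when the input warrants it.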


Qualcomm Hybrid Fixed/Floating-Point Inference-Training Pipeline

The ANN model runs forward inference entirely in fixed-point format on a DSP, while backward gradient computation is selectively routed to either the GPU (floating-point) or the DSP (fixed-point) depending on the measured loss magnitude. This enables on-chip continual learning without dedicated floating-point hardware paths for routine inference — critical for industrial edge devices that must adapt to distribution shift without cloud connectivity.

Two further scheduling strategies are covered in the full analysis, with full patent citations: Beijing Kejie's dynamic bit-width switching framework and Gowin Semiconductor's MCU+FPGA operator dispatch. Topics include DVFS power matching, federated compensation, and MCU fallback logic.
Key Players

Who Is Driving Edge AI Quantization Innovation?

Hikvision (Hangzhou Hikvision Digital Technology) is the most prolific assignee in the corpus, contributing at least three distinct patents covering quantization-aware training pipelines, operator substitution for cross-platform model deployment, and sub-block quantization for inference acceleration on edge devices. Its innovations consistently target the full model lifecycle from cloud training through edge deployment. Explore Hikvision's patent portfolio via PatSnap Analytics.

Xi'an Xingxun Intelligent Communication Technology has filed two substantially identical patents (active, 2025) covering the combined structural pruning + dynamic sparsification + mixed-precision quantization + heterogeneous scheduling deployment pipeline, signaling a focus on end-to-end compression-to-deployment automation for complex industrial scenes.

Qualcomm contributes the only PCT-family patent in this corpus, covering on-device hybrid precision inference-training pipelines using heterogeneous GPU/DSP hardware — reflecting a global standardization interest in fixed-point/floating-point co-execution architectures for mobile and industrial edge SoCs. This aligns with WIPO's growing body of edge AI PCT filings.

China Mobile Research Institute addresses the storage-compute integration dimension through model weight deployment on processing-in-memory chips, automatically mapping neural network weight matrices to idle crossbar arrays in compute-in-memory (CiM) chips — a hardware approach that avoids the von Neumann memory bottleneck entirely.
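
A rough sketch of mapping weight matrices onto idle crossbar arrays: each matrix is cut into crossbar-sized tiles, and each tile is assigned to the next idle array. The 128×128 crossbar size, the first-come allocation order, and the layer shapes are assumptions for illustration and are not taken from the China Mobile patent.

```python
def tile_matrix(rows, cols, xbar_rows, xbar_cols):
    """Split a rows x cols weight matrix into (row, col, height, width) tiles."""
    tiles = []
    for r in range(0, rows, xbar_rows):
        for c in range(0, cols, xbar_cols):
            tiles.append((r, c, min(xbar_rows, rows - r), min(xbar_cols, cols - c)))
    return tiles


def map_to_crossbars(layers, idle_ids, xbar_rows=128, xbar_cols=128):
    """Assign every tile of every layer to a distinct idle crossbar array."""
    free = list(idle_ids)
    mapping = {}
    for name, (rows, cols) in layers.items():
        tiles = tile_matrix(rows, cols, xbar_rows, xbar_cols)
        if len(tiles) > len(free):
            raise RuntimeError(f"not enough idle crossbars for layer {name}")
        mapping[name] = [(free.pop(0), tile) for tile in tiles]
    return mapping


# Hypothetical layer shapes and idle crossbar IDs reported by the chip.
layers = {"fc1": (256, 100), "fc2": (100, 10)}
placement = map_to_crossbars(layers, idle_ids=[3, 5, 8, 9])
for name, tiles in placement.items():
    print(name, tiles)
```

A production mapper would also weigh array utilisation and routing distance; this sketch only shows the tiling and bookkeeping.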

Academic contributors — including Wuhan University, Tongji University, Chongqing University, and Chongqing University of Posts and Telecommunications — contribute foundational compression and deployment methodology that industrial assignees translate into product-level implementations. PatSnap's materials and engineering solutions surface these academic-to-industry technology transfer patterns across sectors.

China Railway High-Tech Industry and China Southern Power Grid represent industrial adopters independently developing domain-specific edge deployment methods for fault diagnosis and power infrastructure use cases, reflecting sectoral urgency around industrial AI edge deployment. The IEEE has published extensively on the reliability requirements driving these safety-critical deployments.

Top Assignees in Corpus
  • Hikvision: 3+ patents
  • Xi'an Xingxun: 2 patents
  • Qualcomm: PCT filing
  • China Mobile: CiM focus
  • China Railway: industrial
Emerging Trend: Adaptive & Self-Healing Deployment

Multiple 2025 patents introduce mechanisms for the edge device to monitor its own inference quality and resource consumption at runtime, then autonomously adjust quantization levels, trigger cloud offload, or switch model versions.
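
A runtime policy of this kind might look like the sketch below. Every detail is an assumption: the monitored signals, the thresholds, the use of mean prediction confidence as a proxy for inference quality, and the action names are hypothetical and do not come from any specific patent in the corpus.

```python
from dataclasses import dataclass


@dataclass
class RuntimeState:
    free_memory_mb: float
    latency_ms: float
    confidence: float      # mean top-1 confidence, a proxy for inference quality


def choose_action(state, latency_budget_ms=50.0, min_memory_mb=64.0,
                  min_confidence=0.6):
    """Pick one adaptation step from the current device measurements."""
    if state.confidence < min_confidence:
        # Quality has degraded: hand the request to a larger cloud model.
        return "offload_to_cloud"
    if state.free_memory_mb < min_memory_mb or state.latency_ms > latency_budget_ms:
        # Resources are tight: drop to the lower bit-width variant.
        return "switch_to_int4"
    # Healthy state: run the higher-precision variant.
    return "run_int8"


samples = [
    RuntimeState(free_memory_mb=256.0, latency_ms=20.0, confidence=0.9),
    RuntimeState(free_memory_mb=32.0, latency_ms=20.0, confidence=0.9),
    RuntimeState(free_memory_mb=256.0, latency_ms=20.0, confidence=0.4),
]
for s in samples:
    print(s, "->", choose_action(s))
```

Checking quality before resources reflects one possible priority order; a deployment with unreliable connectivity might reverse it.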

Key Takeaways

Seven Deployment Principles from 35+ Patents

Distilled from the full patent corpus — actionable guidance for R&D engineers and embedded AI architects.

Pipeline Design

Quantization Is a Multi-Stage Workflow, Not a Single Step

From float-to-fixed conversion through hardware-mode operator acceleration, each stage must be co-designed with the target chip's arithmetic capabilities. The canonical pipeline involves training, per-layer conversion, data preprocessing, and hardware-mode execution on an embedded AI accelerator.

Shanghai Qigan · Zhuhai Yizhi
Failure Mode

Operator Incompatibility Is the Most Common Deployment Failure

Detecting unsupported operators, substituting them with hardware-compatible equivalents, and ensuring format conversion does not introduce redundant compute nodes is essential. Hikvision's private operator substitution and Unilumin's framework-aware training approach both address this systematically.

Hikvision · Shenzhen Unilumin
Precision Strategy

Mixed-Precision Outperforms Uniform Quantization

Assigning lower bit-widths to insensitive layers and higher bit-widths to sensitive ones — guided by structured sensitivity analysis — preserves accuracy while maximizing compression. Tongji University's end-to-end sensitivity-guided PTQ method and Xi'an Xingxun's dynamic mixed-precision pipeline both demonstrate this principle.

Tongji University · Xi'an Xingxun
Hardware Utilization

Heterogeneous Scheduling Is Essential for Full Chip Utilization

Feature extraction operators should be routed to NPU/FPGA fabric, while classification and control operators are better suited to multi-core CPUs. In Xi'an Xingxun's framework this hardware-aware partitioning is paired with zero-copy DMA data transfers, and Qualcomm's hybrid fixed/floating-point inference-training pipeline applies the same idea by running forward inference on the DSP and routing backward passes to either the GPU or the DSP.

Xi'an Xingxun · Qualcomm
Three further deployment principles are covered in the full analysis: dynamic bit-width switching, device fingerprinting for cross-platform portability, and knowledge distillation with cloud-edge fallback.


References

  1. Neural Network Model Real-Time Automatic Quantization Method and System — Shanghai Qigan Electronic Information Technology, 2021
  2. Neural Network Model Real-Time Automatic Quantization Method and System (Updated) — Shanghai Qigan Electronic Information Technology, 2024
  3. Model Training Method and Apparatus (QAT for Edge Image Processors) — Hangzhou Hikvision Digital Technology, 2024
  4. Model Deployment Method, Device, Apparatus, Chip, and Storage Medium (ASIC Full-Integer Quantization) — Zhuhai Yizhi Electronics, 2020
  5. A Single-Chip Computational Imaging Edge Reconstruction Method Based on End-to-End Sensitivity Analysis — Tongji University, 2025
  6. Model Deployment Method, Apparatus, Device, and Program Product (Ethos-N NPU Operator Co-Design) — Shenzhen Unilumin Technology, 2025
  7. A Model Deployment Method and Apparatus (Private Operator Substitution) — Hangzhou Hikvision Digital Technology, 2024
  8. Cross-Hardware Platform AI Algorithm Model Automatic Matching and Migration Method and System — Xi'an Tengkun Electronics, 2025
  9. AI Large Model Lightweight Deployment Method for Complex Scenes (I) — Xi'an Xingxun Intelligent Communication Technology, 2025
  10. AI Large Model Lightweight Deployment Method for Complex Scenes (II) — Xi'an Xingxun Intelligent Communication Technology, 2025
  11. On-Device Unified Inference-Training Pipeline of Hybrid Precision Forward-Backward Propagation — Qualcomm Incorporated, 2025
  12. Dynamic Model Switching Framework for AI Inference Optimization on Edge Devices — Beijing Kejie Technology, 2025
  13. AI Model Deployment and AI Computing Method, System (SoC MCU+FPGA Operator Dispatch) — Guangdong Gowin Semiconductor, 2024
  14. Inference Acceleration Method and Apparatus for Edge Devices (Sub-Block Quantization) — Hangzhou Hikvision Digital Technology, 2025
  15. Model Deployment Method, Apparatus, Device, Storage Medium, and Program Product — Chongqing Changan Automobile, 2025
  16. Model Weight Deployment Method on Processing-in-Memory Chips — China Mobile Research Institute, 2024
  17. An AI Model Adaptive Deployment Method, Apparatus, Device, and Medium (Device Fingerprint Vector) — State Grid Henan Information Communication, 2025
  18. Intelligent Edge Computing Platform with Machine Learning Capability — Fog Horn Systems, 2021
  19. Robot AI Model Dynamic Compression Method and Control System — Shanghai Sazhi Intelligent Technology, 2025
  20. Lightweight Deployment Method and System for Industrial Scene Large Models — China Railway High-Tech Industry, 2025
  21. Meat Quality Detection Model Compression Method and System Based on Edge Computing — Shandong Ruicheng Data Technology, 2025
  22. WIPO — World Intellectual Property Organization: PCT Filing Data and Edge AI Patent Trends
  23. IEEE — Institute of Electrical and Electronics Engineers: Edge AI and Embedded Systems Publications
  24. PyTorch — TorchScript Intermediate Representation Documentation

All patent data and technical claims on this page are sourced from the references above and from PatSnap's proprietary innovation intelligence platform. Patent analysis conducted via PatSnap Eureka.
