Model-Based vs. Learning-Based Grasp Planning for Bin Picking — PatSnap Insights
Robotics & Automation

Model-based grasp planning delivers deterministic, physically interpretable results for known parts — but breaks down the moment an unseen object enters the bin. Learning-based systems generalise broadly yet carry hidden costs: simulation infrastructure, sim-to-real transfer, and ensemble uncertainty management. A patent analysis of 50+ filings from FANUC, ABB, Bosch, Siemens, and leading universities reveals why the real answer for industrial deployment is neither paradigm alone.

PatSnap Insights Team, Innovation Intelligence Analysts · 14 min read
Reviewed by the PatSnap Insights editorial team

How model-based grasp planning works — and where it hits its limits

Model-based grasp planning for bin picking is grounded in the availability of known object geometry — typically a CAD model or surface mesh — which is matched against sensor data to estimate a 6D object pose, after which stable grasp configurations are computed analytically. The core assumption is that a high-fidelity model of the target workpiece exists and that the robot’s task is to estimate where that object is, then apply pre-computed grasp strategies to it.

50+: patents analysed across both paradigms
6-DOF: grasp prediction dimensions addressed by modular neural networks
2: separate networks in FANUC’s modular decomposition (position + rotation)
8+: major industrial assignees filing in this space

FANUC’s adaptive grasp planning patents provide a canonical example. The system analyses workpiece shape to identify multiple robust grasp options with specified positions and orientations, then evaluates each individual workpiece in the bin to identify the feasible grasp set. When a direct motion to the goal pose is impossible, the system formulates an explicit search problem over stable intermediate poses, evaluating each link between nodes for feasibility based on collision-avoidance constraints and robot joint constraints. This deterministic, graph-search-based strategy is tightly coupled to a known part model; it cannot operate on unseen object categories.
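The intermediate-pose search described above can be pictured as a shortest-path problem over a small graph. The sketch below is a minimal illustration, not FANUC's implementation: the pose labels, the `plan_via_intermediate_poses` function, and the `toy_link_check` feasibility table are all hypothetical stand-ins for real collision-avoidance and joint-limit checks.

```python
from collections import deque

def plan_via_intermediate_poses(start, goal, poses, link_is_feasible):
    """Breadth-first search for a pose sequence from start to goal.

    poses: hashable labels for stable intermediate poses (hypothetical).
    link_is_feasible(a, b): True if the move a -> b passes the collision
    and joint-limit checks. Returns the shortest feasible pose sequence,
    or None when no sequence exists.
    """
    nodes = set(poses) | {start, goal}
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in sorted(nodes):
            if nxt not in parent and link_is_feasible(current, nxt):
                parent[nxt] = current
                queue.append(nxt)
    return None

# Toy feasibility table standing in for collision and joint-limit checks.
FEASIBLE_LINKS = {("bin", "upright"), ("upright", "flipped"), ("flipped", "fixture")}

def toy_link_check(a, b):
    return (a, b) in FEASIBLE_LINKS

route = plan_via_intermediate_poses("bin", "fixture", ["upright", "flipped"], toy_link_check)
print(route)
```

Breadth-first search is used here only because it returns the sequence with the fewest regrasps; a cost-weighted search would fit equally well if links carried motion costs.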

ABB’s perception-based adaptive motion planning system represents a more recent model-based architecture. As detailed in ABB Schweiz AG’s 2025 filing, 6D pose determination is performed using a pose determination model fed by multi-image capture of the bin. A CAD model for each object is then used to generate a 3D rendering at the estimated pose, and a 2D picking-direction mask is generated to plan the motion. The reliance on CAD models per object class is characteristic of the model-based approach — the system’s applicability is bounded by the catalogue of available models.

What is a Grasp Quality Score (GQS)?

TATA Consultancy Services’ point cloud-based framework computes a Grasp Quality Score (GQS) by generating grasp poses in a random configuration, computing depth difference values per pixel for each sampled pose, and generating binary maps to obtain feasible subregions. The GQS is an explicit analytical criterion — not a learned representation — sitting at the boundary between model-based and learning-based approaches.

Grasp quality in model-based systems is computed analytically. The traditional approach — described in the background of Hangzhou Jiazhhi Technology’s 2017 filing — uses ε-metrics or convex-hull volume metrics derived from 6D wrench contact information computed in simulation environments like GraspIt or OpenRAVE. These metrics have known physical interpretability but require complete contact geometry from a model.
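For reference, the two metrics named above are conventionally defined over the convex hull of the contact wrenches. The formulation below is the standard textbook one and is not taken from the filing; the symbols (the wrench set, the hull, and the volume operator) are notation chosen here for illustration.

```latex
\begin{align}
\mathcal{W} &= \operatorname{ConvexHull}\bigl(\{ w_1, \dots, w_k \}\bigr), \qquad w_j \in \mathbb{R}^6 \\
\varepsilon &= \min_{w \in \partial\mathcal{W}} \lVert w \rVert \quad \text{(taken as } 0 \text{ if the origin lies outside } \mathcal{W}\text{)} \\
v &= \operatorname{Vol}(\mathcal{W})
\end{align}
```

Here ε is the radius of the largest origin-centred ball that fits inside the hull, that is, the magnitude of the weakest disturbance wrench the grasp can resist, while v rewards grasps that resist disturbances well on average. Both need every contact wrench, which is why a complete contact model is a prerequisite.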

A fundamental limitation of pure model-based approaches surfaces in truly unstructured environments: they typically cannot handle novel or unseen objects. According to WIPO filing trends, the volume of patents addressing this limitation has grown substantially as manufacturers move toward flexible, mixed-SKU picking lines — a deployment scenario where pre-registered object models are simply not available for every item in the bin.

Model-based grasp planning for unstructured bin picking requires a 3D CAD model or surface mesh for each target object class. Grasp candidates are computed relative to this known model, making the approach deterministic and physically interpretable but incapable of handling novel or unseen objects without operator intervention.

Figure 1 — Model-based grasp planning pipeline for unstructured bin picking
Pipeline (five sequential steps): CAD Model → 6D Pose Estimation → Grasp Candidate Generation → Analytical Quality Score → Robot Execution
Model-based systems follow a deterministic five-step pipeline anchored by a per-object CAD model; every downstream step depends on the quality and availability of that model.

Learning-based grasp planning: neural networks, synthetic data, and the sim-to-real challenge

Learning-based grasp planning replaces explicit model matching and analytical grasp synthesis with trained neural networks that map raw sensor data — typically depth images, point clouds, or RGB-D data — directly to predicted grasp poses and associated success probabilities. The central advantage is generalisation: a well-trained model can propose grasps for objects it has never explicitly been programmed to handle.

FANUC’s grasp learning patents illustrate the canonical sim-to-real pipeline. The process begins with a database of solid or surface models for all objects and grippers, performs iterative optimisation to compute hundreds of grasps per part using surface contact geometry, maps those grasps into simulated bin pile scenarios, and then correlates simulation results with camera depth image data to train neural networks for real-world grasp execution. Critically, the simulation-generated grasp points and approach directions become the supervised training signal for the network. At inference time, the learned model then generalises to geometries it was not explicitly optimised for.

“A fully automated and reliable picking of a diverse range of unseen objects in clutter is a challenging problem” — TATA Consultancy Services, framing the core motivation for geometry-driven, model-free grasp frameworks.

Neural network architecture choices significantly affect learning-based system performance. FANUC’s modular learning approach addresses the high-dimensional action space challenge by decomposing the 6-DOF grasp prediction problem into a first network encoding grasp position dimensions and a second network encoding rotation dimensions. The filing explicitly identifies that single-network approaches for 6-DOF grasping suffer from high search complexity, require post-hoc grasp refinement, and struggle with cluttered environments. This motivates the modular decomposition, which reduces the overall search from the product of evaluated positions and rotations to their sum.
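The product-versus-sum argument is easy to see by counting evaluations. The sketch below is a toy illustration under stated assumptions: `toy_position_score` and `toy_rotation_score` are hypothetical stand-ins for the two networks, and the 20 position cells and 36 rotation bins are arbitrary numbers, not values from the patent.

```python
def joint_search(positions, rotations, score):
    """Score every (position, rotation) pair: len(positions) * len(rotations) calls."""
    pairs = [(p, r) for p in positions for r in rotations]
    best = max(pairs, key=lambda pr: score(pr[0], pr[1]))
    return best, len(pairs)

def modular_search(positions, rotations, position_score, rotation_score):
    """Pick a position first, then a rotation for that position:
    len(positions) + len(rotations) calls in total."""
    best_p = max(positions, key=position_score)
    best_r = max(rotations, key=lambda r: rotation_score(best_p, r))
    return (best_p, best_r), len(positions) + len(rotations)

# Hypothetical stand-ins for the two trained networks.
def toy_position_score(p):
    return -abs(p - 12)

def toy_rotation_score(p, r):
    return -abs(r - 30)

def toy_joint_score(p, r):
    return toy_position_score(p) + toy_rotation_score(p, r)

POSITIONS = list(range(20))   # illustrative grid of position cells
ROTATIONS = list(range(36))   # illustrative rotation bins

joint = joint_search(POSITIONS, ROTATIONS, toy_joint_score)
modular = modular_search(POSITIONS, ROTATIONS, toy_position_score, toy_rotation_score)
print(joint, modular)
```

Both searches return the same grasp on this toy problem, but the joint search needs 720 evaluations against 56 for the modular one. The saving only holds if the position network can rank positions without knowing the final rotation, which is the property the training scheme has to deliver.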

Henan University’s sparse convolutional neural network approach demonstrates how learning-based methods can exploit point cloud geometry — incorporating surface curvature and normal information into the PointGrasp-Net architecture — and process only valid 3D scene points rather than the entire point cloud space. The model is trained in PyBullet simulation using randomly generated grasp poses on surface point clouds of randomly placed objects, explicitly designed for non-structured, random unordered grasping scenes without dependence on object shape or structure.

Robert Bosch GmbH’s ensemble prediction approach trains multiple prediction models and fuses their outputs via a mixture model to generate a combined pick prediction. The system explicitly acknowledges that each individual model’s quality depends on input-training data similarity, applicability to different object categories, and sensitivity to image noise — and that combining model outputs mitigates these individual failure modes. This is a learning-specific challenge with no direct analogue in model-based systems, and reflects the kind of uncertainty management that organisations like IEEE have highlighted as a core open problem in deployed robotic learning systems.
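As a rough picture of mixture-style fusion, the sketch below combines per-model pick scores as a weighted average. It is an assumption-laden simplification rather than Bosch's method: the function name `fuse_predictions`, the fixed weights, and the choice to treat a candidate missing from a model as scoring zero are all illustrative.

```python
def fuse_predictions(model_outputs, weights):
    """Combine per-model pick scores as a weighted mixture.

    model_outputs: list of dicts mapping pick candidate -> predicted
    success probability, one dict per model.
    weights: one non-negative mixture weight per model (hypothetical
    values; they are normalised here).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("mixture weights must sum to a positive value")
    candidates = set().union(*model_outputs)
    fused = {}
    for candidate in candidates:
        # A model that did not score this candidate contributes 0.0.
        weighted = sum(w * out.get(candidate, 0.0)
                       for w, out in zip(weights, model_outputs))
        fused[candidate] = weighted / total
    return fused

outputs = [
    {"pick_a": 0.9, "pick_b": 0.4},
    {"pick_a": 0.5, "pick_b": 0.6},
    {"pick_a": 0.7, "pick_b": 0.2},
]
fused = fuse_predictions(outputs, weights=[0.5, 0.25, 0.25])
best_pick = max(fused, key=fused.get)
print(best_pick, fused[best_pick])
```

In a deployed system the weights would more plausibly depend on the input, for example on how closely the current image resembles each model's training data, rather than being constants.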

Northeastern University’s sim-to-real transfer method for robotic arm visual grasping applies visual domain randomisation — varying camera pose, image brightness, saturation, contrast, and Gaussian noise during training — so that the deployed model handles real-world perceptual variation not present during simulation training. This challenge class does not exist in model-based systems, which use deterministic pose estimation from known geometry.
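A minimal sketch of visual domain randomisation, assuming RGB pixels scaled to [0, 1], is shown below. The parameter ranges and the function names are illustrative assumptions, not values from the filing, and the camera pose is only sampled here because applying it would require re-rendering the scene in the simulator.

```python
import random

def sample_randomisation(rng):
    """Draw one set of visual perturbation parameters (ranges are illustrative)."""
    return {
        "camera_yaw_deg": rng.uniform(-10.0, 10.0),  # would be applied by the simulator
        "brightness": rng.uniform(-0.2, 0.2),
        "contrast": rng.uniform(0.8, 1.2),
        "saturation": rng.uniform(0.7, 1.3),
        "noise_sigma": rng.uniform(0.0, 0.05),
    }

def randomise_pixel(rgb, params, rng):
    """Apply saturation, contrast, brightness and Gaussian noise to one RGB pixel."""
    grey = sum(rgb) / 3.0
    out = []
    for channel in rgb:
        value = grey + params["saturation"] * (channel - grey)  # scale colourfulness
        value = 0.5 + params["contrast"] * (value - 0.5)        # stretch about mid-grey
        value += params["brightness"]                           # global offset
        value += rng.gauss(0.0, params["noise_sigma"])          # per-channel sensor noise
        out.append(min(1.0, max(0.0, value)))                   # clamp to [0, 1]
    return tuple(out)

def randomise_image(image, rng):
    """Perturb every pixel of a nested-list image with one sampled parameter set."""
    params = sample_randomisation(rng)
    augmented = [[randomise_pixel(px, params, rng) for px in row] for row in image]
    return augmented, params

rng = random.Random(0)
image = [[(0.2, 0.5, 0.8), (0.9, 0.1, 0.1)],
         [(0.5, 0.5, 0.5), (0.0, 1.0, 0.0)]]
augmented, params = randomise_image(image, rng)
```

Drawing a fresh parameter set for every training image is what pushes the network to treat lighting and sensor noise as nuisance variables rather than as cues.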

Huazhong University of Science and Technology’s two-stage grasp planning system separately trains a grasp pose prediction network and a grasp pose evaluation network — both using a reuse structure architecture — achieving robust grasping of unknown objects in multi-object stacked scenes. The decoupling of prediction and evaluation enables the model to first generate candidate poses and then score them, improving both coverage and precision.

The University of California’s approach to training data generation provides a principled framework: using 3D object model collections, analytical representations of grasp force and torque mechanics, and statistical sampling to model uncertainty in sensing and control, synthetic training datasets (sensor images paired with labelled grasp configurations) are generated to train function approximators. The explicit modelling of uncertainty — covering initial state, contact, friction, inertia, object shape, and sensor data — is a training-data design choice uniquely relevant to learning-based systems, and aligns with uncertainty quantification principles discussed by NIST for safety-critical robotic applications.
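One way to read "statistical sampling to model uncertainty" is as a Monte Carlo estimate: perturb the uncertain quantities, run an analytic grasp test on each sample, and use the success fraction as the training label. The sketch below illustrates the idea with a deliberately simple friction-cone test; the test itself, the noise levels, and the function names are assumptions made for illustration and are not drawn from the patent.

```python
import math
import random

def toy_grasp_succeeds(contact_angle_deg, friction_coeff):
    """Toy analytic test: the contact force must lie inside the friction cone,
    whose half-angle is atan(friction coefficient)."""
    return abs(contact_angle_deg) <= math.degrees(math.atan(friction_coeff))

def robust_label(nominal_angle_deg, rng, samples=1000,
                 angle_sigma_deg=3.0, friction_mean=0.5, friction_sigma=0.1):
    """Estimate success probability under sampled pose and friction uncertainty."""
    successes = 0
    for _ in range(samples):
        angle = rng.gauss(nominal_angle_deg, angle_sigma_deg)    # sensing/control noise
        friction = max(0.0, rng.gauss(friction_mean, friction_sigma))  # unknown friction
        if toy_grasp_succeeds(angle, friction):
            successes += 1
    return successes / samples

rng = random.Random(0)
# Each entry pairs a nominal grasp (here just a contact angle) with its robust label.
dataset = [(angle, robust_label(angle, rng)) for angle in (5.0, 25.0, 40.0)]
print(dataset)
```

A grasp that only succeeds under nominal conditions receives a low label, so a network trained on such labels is steered towards grasps that tolerate perception and friction errors.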

Figure 2 — Learning-based grasp planning pipeline for unstructured bin picking
Pipeline (sim-to-real workflow): Sim Grasp Data Generation → Domain Randomisation → Neural Net Training → Inference / Pose Prediction → Real-World Deployment
Learning-based pipelines invest heavily in simulation data generation and domain randomisation before training; the deployed model requires only sensor data — no per-object model — at inference time.

Head-to-head: six dimensions that separate the paradigms

The two paradigms diverge across six operationally significant dimensions — each of which has direct implications for R&D investment, IP strategy, and deployment risk. The table below consolidates the key distinctions drawn from the patent dataset.

| Dimension | Model-Based | Learning-Based |
|---|---|---|
| Prior knowledge required | Per-object CAD model or surface mesh mandatory | Diverse synthetic training dataset; no per-object model at inference |
| Grasp quality scoring | Analytical ε-metrics or convex-hull volume from 6D wrench contact data | Network output score; ensemble fusion (Bosch); MCDM arbitration (Siemens) |
| Unseen object handling | Degrades; requires operator intervention or geometry primitives | Generalises if training distribution is sufficiently diverse |
| Sim-to-real gap | Not applicable; uses deterministic pose estimation from known geometry | Key challenge; mitigated by domain randomisation (Northeastern University) |
| Computational profile | Deterministic; graph search or pre-computed grasp lookup | GPU-accelerated inference; latency variability; benefits from modular decomposition |
| Replanning adaptability | Static; replanning requires re-running pose estimation | Dynamic; Honda Motor’s iterative re-planning generates new candidate trajectories at each time step |

Sensor modality convergence: point clouds as common ground

Despite their architectural differences, both paradigms are converging on point cloud processing as the primary sensor modality. Model-based systems use point clouds for pose estimation matching, registering a known model against measured scene geometry. Learning-based systems use point clouds as direct network input, as demonstrated by Henan University’s PointGrasp-Net, while TATA Consultancy Services’ GQS framework scores grasps analytically from the same kind of point cloud data. This convergence has important implications for sensor hardware selection and data pipeline design in industrial deployments.

FANUC’s modular neural network approach for 6-DOF bin picking grasp learning decomposes the prediction problem into two separate networks — one encoding grasp position dimensions and one encoding rotation dimensions — reducing search complexity from a product to a sum of evaluated positions and rotations, explicitly addressing the latency and accuracy limitations of single-network approaches.

Ensemble uncertainty and multi-criteria decision making

Siemens Corporation’s high-level sensor fusion architecture combines multiple AI module outputs through a multi-criteria decision making (MCDM) module to rank grasping alternatives — representing a system-level architecture that sits above either single paradigm. Ambi Robotics’ 2025 filing describes a grasp quality convolutional neural network that scores candidate grasp plans directly from image data, enabling real-time evaluation without analytic model access. These architectures reflect a broader trend: as learning-based components mature, the integration layer — how competing grasp proposals are arbitrated — becomes a distinct IP battleground.
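To make the arbitration step concrete, the sketch below ranks grasp alternatives with a weighted sum, one of the simplest multi-criteria decision making methods. The filing is not described here as using this particular method; the criteria names, weights, and scores are hypothetical, and every criterion is assumed to be pre-normalised so that higher is better.

```python
def rank_grasps(alternatives, weights):
    """Rank grasp alternatives by weighted-sum utility, best first.

    alternatives: dict of name -> dict of criterion -> score in [0, 1].
    weights: dict of criterion -> importance weight (hypothetical values).
    """
    def utility(scores):
        return sum(weights[criterion] * scores[criterion] for criterion in weights)
    return sorted(alternatives, key=lambda name: utility(alternatives[name]), reverse=True)

# Scores as they might arrive from separate AI modules (illustrative numbers).
alternatives = {
    "grasp_1": {"grasp_quality": 0.9, "clearance": 0.3, "reachability": 0.8},
    "grasp_2": {"grasp_quality": 0.7, "clearance": 0.9, "reachability": 0.7},
    "grasp_3": {"grasp_quality": 0.5, "clearance": 0.5, "reachability": 0.9},
}
weights = {"grasp_quality": 0.5, "clearance": 0.3, "reachability": 0.2}

ranking = rank_grasps(alternatives, weights)
print(ranking)
```

Note that the grasp with the highest quality score does not win here, because its poor clearance outweighs it. That trade-off is exactly what an arbitration layer exists to make.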

Patent landscape: who is innovating and where

The patent dataset reveals distinct innovation clusters by assignee type, with industrial leaders and academic institutions pursuing markedly different strategies across the model-based and learning-based spectrum.

Figure 3 — Patent filing activity by assignee type: model-based vs. learning-based grasp planning
Indicative focus by assignee (model-based / learning-based): FANUC 60% / 80%; ABB 50% / 60%; Bosch 20% / 75%; Siemens 30% / 70%; CN Academia 10% / 90%.
Indicative patent focus distribution based on the 50+ patent dataset. FANUC pursues both paradigms with high intensity; Chinese academic assignees are overwhelmingly concentrated in learning-based approaches. Values are directional indicators derived from patent content analysis, not precise counts.

FANUC Corporation dominates the industrial model/learning hybrid space, with multiple patent families covering simulation-based training data generation for grasp learning, modular neural networks for 6-DOF bin picking, adaptive model-based grasp planning with intermediate pose search, human demonstration-guided grasp teaching, and automated gripper fingertip design.

TATA Consultancy Services Limited has filed a globally coordinated patent family (EP, IN, US, AU, JP) covering their point cloud-based GQS framework — a rare example of a service company pursuing strong IP in the grasp planning space.

Robert Bosch GmbH focuses on learning-based control model training with ensemble and mixture model approaches, and descriptor-based pose estimation for unknown pose situations. ABB Schweiz AG pursues both CAD-model-based 6D pose pipelines and hybrid sim-real training — a dual strategy covering both paradigms. Siemens Corporation focuses on high-level sensor fusion and AI module orchestration for bin picking decision-making, with the MCDM-based architecture as their distinctive contribution.

Chinese academic assignees — including Shanghai Jiao Tong University, Zhejiang University, Henan University, Huazhong University of Science and Technology, and Northeastern University — collectively represent the largest volume of learning-based grasp innovation, covering sparse CNNs, reinforcement learning policies, sim-to-real transfer, two-stage prediction-evaluation pipelines, and multi-modal perception integration. This pattern reflects a broader trend documented by OECD in AI-related patent filings, where Chinese academic institutions have become a dominant source of applied machine learning IP.

Key finding: TATA Consultancy Services as an IP outlier

TATA Consultancy Services has filed a globally coordinated patent family (EP, IN, US, AU, JP) covering their point cloud-based GQS framework for bin picking grasp planning — representing a rare example of a technology services company pursuing strong, multi-jurisdictional IP in a hardware-adjacent robotics domain typically dominated by OEMs and academic institutions.

Why the hybrid paradigm is winning in industrial deployment

The dominant industrial paradigm for unstructured bin picking grasp planning is hybrid: simulation-based grasp data generation feeds neural network training, combining model-based data quality with learning-based deployment flexibility. This convergence is not incidental — it reflects the practical limitations of each pure approach when confronted with real factory conditions.

ABB Schweiz AG’s hybrid training approach for object picking robots combines simulated grasp quality evaluation with actual robot experiment validation — using both simulated and real grasp performance data to train models that perform reliably in physical deployment. This approach represents the dominant industrial paradigm for unstructured bin picking, combining model-based data quality with learning-based deployment flexibility.

ABB’s hybrid training approach represents the most direct synthesis: grasp locations are assigned from object physical properties, simulated grasp quality is evaluated for each assigned location, candidate grasp locations are determined from simulation data, and then actual robot experiments validate real grasp quality — combining the interpretability of model-based evaluation with the scalability of learning. This mirrors the methodology advocated in robotics benchmarking standards discussed by bodies such as ISO for validating autonomous manipulation systems.

Honda Motor Co.’s online iterative re-planning system demonstrates that learning-based systems can support dynamic replanning — generating new sets of candidate object trajectories at each time step and calculating contact points for associated grasps — offering adaptability that static model-based planners cannot easily provide. This capability is particularly valuable in unstructured bin picking where object positions shift during the picking sequence.

“Prior single-network 6-DOF learning approaches were not fast enough due to time-consuming candidate grasp calculation requirements or not accurate enough because they attempt to predict too many dimensions.” — FANUC Corporation, motivating the modular two-network decomposition.

The sim-to-real gap remains the most significant unresolved challenge for purely learning-based systems. Northeastern University’s visual domain randomisation approach — varying camera pose, image brightness, saturation, contrast, and Gaussian noise during training — represents the current state of the art for bridging this gap at the perceptual level. But contact dynamics, friction coefficients, and gripper compliance remain harder to randomise faithfully, which is why real-robot validation data — as used in ABB’s hybrid approach — continues to be valued even when simulation infrastructure is available.

Trinamix GmbH’s self-learning grasp sequence approach (2023) adds a further dimension: accumulating grasp success and failure history at the system level to continuously refine grasp strategy selection. This online learning loop — operating on top of an initial trained model — represents a third architectural layer beyond the pure model-based and learning-based dichotomy, and is likely to become more prevalent as deployed systems accumulate operational data at scale.
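The kind of history-driven refinement described above can be sketched as a running tally of outcomes per grasp strategy. This is a minimal illustration under stated assumptions, not Trinamix's method: the `GraspHistory` class, the strategy names, and the smoothed success-rate rule are all hypothetical.

```python
class GraspHistory:
    """Tracks per-strategy outcomes and selects the strategy with the best record."""

    def __init__(self, strategies):
        # [successes, failures] for each strategy
        self.counts = {name: [0, 0] for name in strategies}

    def record(self, strategy, success):
        """Log the outcome of one grasp attempt."""
        self.counts[strategy][0 if success else 1] += 1

    def success_rate(self, strategy):
        """Laplace-smoothed rate, so an untried strategy starts at 0.5."""
        successes, failures = self.counts[strategy]
        return (successes + 1) / (successes + failures + 2)

    def best_strategy(self):
        return max(self.counts, key=self.success_rate)

history = GraspHistory(["suction_top", "pinch_side"])
for outcome in (True, True, False, True):
    history.record("suction_top", outcome)
for outcome in (True, False, False):
    history.record("pinch_side", outcome)

choice = history.best_strategy()
print(choice, history.success_rate(choice))
```

Always choosing the current best is purely greedy; a production system would likely reserve some attempts for under-tested strategies so that early bad luck does not lock one out permanently.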

For R&D engineers and IP professionals evaluating grasp planning architectures, the practical implication is clear: neither paradigm is sufficient alone for robust, flexible unstructured bin picking. The investment question is not model-based versus learning-based, but rather how much simulation infrastructure, real-robot validation data, and ongoing model maintenance a deployment context can support — and which assignees have already secured IP positions across the hybrid design space that matters most for the target application.

References

  1. Method and System for Point Cloud Based Grasp Planning Framework — TATA Consultancy Services Limited, 2024
  2. Method and System for Point Cloud Based Grasp Planning Framework — TATA Consultancy Services Limited, 2025
  3. Efficient Data Generation for Grasp Learning with General Gripper — FANUC Corporation, 2022
  4. Efficient Data Generation for Grasp Learning by General Grippers — FANUC Corporation, 2025
  5. Adaptive Grasp Planning for Bin Picking — FANUC Corporation, 2021
  6. Adaptive Grasp Planning for Bin Picking — FANUC Corporation, 2025
  7. Automatic Gripper Fingertip Design to Reduce Leftover in Random Bin Picking Applications — FANUC Corporation, 2024
  8. Grasp Learning Using Modular Neural Networks — FANUC Corporation, 2022
  9. Network Modularization for Learning High-Dimensional Robot Tasks — FANUC Corporation, 2022
  10. System and Method for Perception Based Adaptive Motion Planning for Picking — ABB Schweiz AG, 2025
  11. Hybrid Machine Learning-Based Systems and Methods for Training an Object Picking Robot with Real and Simulated Performance Data — ABB Schweiz AG, 2020
  12. High-Level Sensor Fusion and Multi-Criteria Decision Making for Autonomous Bin Picking — Siemens Corporation, 2024
  13. Robot Device, Computer-Implemented Method for Training Robot Control Model, and Method for Controlling Robot Device — Robert Bosch GmbH, 2023
  14. Device and Method for Controlling Robot for Picking Up Object in Various Pose Situations — Robert Bosch GmbH, 2022
  15. Systems and Methods for Online Iterative Re-Planning — Honda Motor Co., Ltd., 2024
  16. Disordered Grasping Method for Robot Arm Based on Sparse Convolutional Neural Network — Henan University, 2024
  17. Robotic Arm Visual Grasping Method Based on Sim2Real Transfer Under Space-Constrained Conditions — Northeastern University, 2025
  18. Two-Stage Robotic Arm Grasp Planning Method and System Based on Reuse Structure — Huazhong University of Science and Technology, 2022
  19. Robot System and Method for Robust Grasping and Targeting of Objects — University of California Board of Regents, 2020
  20. Depth Projection-Based Robot Gripper Grasp Planning Method and Control Device — Hangzhou Jiazhhi Technology Co., Ltd., 2017
  21. Image Processing Device, Image Processing Method and Computer Program — Keyence Corporation, 2018
  22. Robot Simulation Device and Robot Simulation Method — Keyence Corporation, 2019
  23. Coordinating Multiple Robots to Meet Workflow and Avoid Conflict — Dexterity, Inc., 2024
  24. Robot Packaging Processing System and Method — Ambi Robotics, 2025
  25. Self Learning Grasp Sequence for Robot Bin Picking — Trinamix GmbH, 2023
  26. WIPO — World Intellectual Property Organization (patent filing trend data)
  27. IEEE — Institute of Electrical and Electronics Engineers (robotic learning systems research)
  28. OECD — AI-related patent filing trends and academic IP analysis
  29. NIST — National Institute of Standards and Technology (uncertainty quantification in robotic systems)
  30. ISO — International Organization for Standardization (autonomous manipulation benchmarking standards)
  31. PatSnap IP Intelligence Platform — innovation analytics for R&D and IP teams
  32. PatSnap Insights Blog — latest research in robotics, AI, and IP strategy

All data and statistics in this article are sourced from the references above and from PatSnap’s proprietary innovation intelligence platform.
