How model-based grasp planning works — and where it hits its limits
Model-based grasp planning for bin picking relies on known object geometry, typically a CAD model or surface mesh, which is matched against sensor data to estimate a 6D object pose; stable grasp configurations are then computed analytically. The core assumption is that a high-fidelity model of the target workpiece exists and that the robot’s task is to estimate where that object is, then apply pre-computed grasp strategies to it.
FANUC’s adaptive grasp planning patents provide a canonical example. The system analyses workpiece shape to identify multiple robust grasp options with specified positions and orientations, then evaluates each individual workpiece in the bin to identify the feasible grasp set. When a direct motion to the goal pose is impossible, the system formulates an explicit search problem over stable intermediate poses, evaluating each link between nodes for feasibility based on collision-avoidance constraints and robot joint constraints. This deterministic, graph-search-based strategy is tightly coupled to a known part model; it cannot operate on unseen object categories.
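The intermediate-pose search described above can be sketched as a breadth-first search over a small pose graph. This is a minimal illustration, not the patented method: the pose labels, the `FEASIBLE_LINKS` table, and the function names are all hypothetical. In a real system, `link_is_feasible` would wrap the collision-avoidance and joint-limit checks.

```python
from collections import deque


def plan_via_intermediate_poses(start, goal, poses, link_is_feasible):
    """Breadth-first search for the shortest feasible pose sequence from start to goal."""
    if start == goal:
        return [start]
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in poses:
            # Skip poses already reached and links that fail the feasibility check.
            if nxt in parent or not link_is_feasible(current, nxt):
                continue
            parent[nxt] = current
            if nxt == goal:
                # Walk the parent pointers back to the start to recover the path.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None  # no feasible sequence exists


# Hypothetical pose labels and a hand-written feasibility table (assumptions).
POSES = ["in_bin", "upright_on_table", "on_side_on_table", "goal_fixture"]
FEASIBLE_LINKS = {
    ("in_bin", "upright_on_table"),
    ("in_bin", "on_side_on_table"),
    ("on_side_on_table", "goal_fixture"),
}


def link_is_feasible(a, b):
    """Stand-in for collision-avoidance and joint-constraint checks."""
    return (a, b) in FEASIBLE_LINKS


route = plan_via_intermediate_poses("in_bin", "goal_fixture", POSES, link_is_feasible)
print(route)
```

Because the direct link from `in_bin` to `goal_fixture` is not in the table, the search has to route through an intermediate resting pose. That is the situation the explicit search formulation is meant to handle.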
ABB’s perception-based adaptive motion planning system represents a more recent model-based architecture. As detailed in ABB Schweiz AG’s 2025 filing, 6D pose determination is performed using a pose determination model fed by multi-image capture of the bin. A CAD model for each object is then used to generate a 3D rendering at the estimated pose, and a 2D picking-direction mask is generated to plan the motion. The reliance on CAD models per object class is characteristic of the model-based approach — the system’s applicability is bounded by the catalogue of available models.
TATA Consultancy Services’ point cloud-based framework computes a Grasp Quality Score (GQS) by generating grasp poses in a random configuration, computing depth difference values per pixel for each sampled pose, and generating binary maps to obtain feasible subregions. The GQS is an explicit analytical criterion — not a learned representation — sitting at the boundary between model-based and learning-based approaches.
Grasp quality in model-based systems is computed analytically. The traditional approach — described in the background of Hangzhou Jiazhhi Technology’s 2017 filing — uses ε-metrics or convex-hull volume metrics derived from 6D wrench contact information computed in simulation environments like GraspIt or OpenRAVE. These metrics have known physical interpretability but require complete contact geometry from a model.
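For reference, the two metrics named above are commonly written as follows in the grasp-analysis literature. The symbols are chosen here for illustration and are not taken from any of the cited filings: $w_{i,j}$ is the 6D wrench produced by the $j$-th edge of the linearised friction cone at contact $i$, and $\mathcal{W}$ is the resulting grasp wrench space.

```latex
\mathcal{W} = \operatorname{ConvexHull}\Bigl(\,\bigcup_{i=1}^{n} \{ w_{i,1}, \dots, w_{i,m} \}\Bigr),
\qquad
\epsilon = \min_{w \in \partial\mathcal{W}} \lVert w \rVert
\quad \text{(defined when } 0 \in \operatorname{int}\mathcal{W}\text{)},
\qquad
v = \operatorname{Vol}(\mathcal{W})
```

Under this formulation, $\epsilon$ is the radius of the largest origin-centred ball contained in $\mathcal{W}$, that is, the magnitude of the weakest disturbance wrench the grasp can resist. $v$ is the convex-hull volume. Both need every contact location and normal, which is why a complete object model is required.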
A fundamental limitation of pure model-based approaches surfaces in truly unstructured environments: they typically cannot handle novel or unseen objects. According to WIPO filing trends, the volume of patents addressing this limitation has grown substantially as manufacturers move toward flexible, mixed-SKU picking lines — a deployment scenario where pre-registered object models are simply not available for every item in the bin.
Model-based grasp planning for unstructured bin picking requires a 3D CAD model or surface mesh for each target object class. Grasp candidates are computed relative to this known model, making the approach deterministic and physically interpretable but incapable of handling novel or unseen objects without operator intervention.
Learning-based grasp planning: neural networks, synthetic data, and the sim-to-real challenge
Learning-based grasp planning replaces explicit model matching and analytical grasp synthesis with trained neural networks that map raw sensor data — typically depth images, point clouds, or RGB-D data — directly to predicted grasp poses and associated success probabilities. The central advantage is generalisation: a well-trained model can propose grasps for objects it has never explicitly been programmed to handle.
FANUC’s grasp learning system illustrates the canonical sim-to-real pipeline. The process begins with a database of solid or surface models for all objects and grippers, performs iterative optimisation to compute hundreds of grasps per part using surface contact geometry, maps those grasps into simulated bin pile scenarios, and then correlates simulation results with camera depth image data to train neural networks for real-world grasp execution. Critically, the simulation-generated grasp points and approach directions become the supervised training signal for the network; at inference, the trained network then generalises to geometries for which no grasps were explicitly optimised.
“A fully automated and reliable picking of a diverse range of unseen objects in clutter is a challenging problem” — TATA Consultancy Services, framing the core motivation for geometry-driven, model-free grasp frameworks.
Neural network architecture choices significantly affect learning-based system performance. FANUC’s modular learning approach addresses the high-dimensional action space challenge by decomposing the 6-DOF grasp prediction problem into a first network encoding grasp position dimensions and a second network encoding rotation dimensions. The authors explicitly identify that single-network approaches for 6-DOF grasping suffer from high search complexity, require post-hoc grasp refinement, and struggle with cluttered environments. This motivates the modular decomposition, which reduces the overall search from a product of candidate positions and rotations to a sum of the two.
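The product-versus-sum argument can be made concrete with a toy count of evaluations. The sketch below is illustrative only: the two scoring functions are hypothetical stand-ins for the position and rotation networks, and the candidate grids are arbitrary.

```python
def joint_search(positions, rotations, score):
    """Score every (position, rotation) pair: len(positions) * len(rotations) evaluations."""
    evaluations = 0
    best = None
    for p in positions:
        for r in rotations:
            evaluations += 1
            s = score(p, r)
            if best is None or s > best[0]:
                best = (s, p, r)
    return best[1], best[2], evaluations


def modular_search(positions, rotations, position_score, rotation_score):
    """Pick the best position first, then the best rotation at that position."""
    best_p = max(positions, key=position_score)
    best_r = max(rotations, key=lambda r: rotation_score(best_p, r))
    evaluations = len(positions) + len(rotations)
    return best_p, best_r, evaluations


# Hypothetical stand-ins for the two networks (assumptions, not FANUC's models).
def position_score(p):
    return -abs(p - 7)


def rotation_score(p, r):
    return -abs(r - 3)


positions = list(range(20))
rotations = list(range(12))

joint = joint_search(positions, rotations, lambda p, r: position_score(p) + rotation_score(p, r))
modular = modular_search(positions, rotations, position_score, rotation_score)
print(joint)    # (7, 3, 240)
print(modular)  # (7, 3, 32)
```

The two searches agree here only because the toy score is separable. A real modular system relies on the second network being conditioned on the position chosen by the first.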
Henan University’s sparse convolutional neural network approach demonstrates how learning-based methods can exploit point cloud geometry: the PointGrasp-Net architecture incorporates surface curvature and normal information and processes only valid 3D scene points rather than the entire point cloud space. The model is trained in PyBullet simulation using randomly generated grasp poses on surface point clouds of randomly placed objects, and is explicitly designed for non-structured, random unordered grasping scenes without dependence on object shape or structure.
Robert Bosch GmbH’s ensemble prediction approach trains multiple prediction models and fuses their outputs via a mixture model to generate a combined pick prediction. The system explicitly acknowledges that each individual model’s quality depends on input-training data similarity, applicability to different object categories, and sensitivity to image noise — and that combining model outputs mitigates these individual failure modes. This is a learning-specific challenge with no direct analogue in model-based systems, and reflects the kind of uncertainty management that organisations like IEEE have highlighted as a core open problem in deployed robotic learning systems.
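As a rough illustration of output fusion, the sketch below combines per-candidate success probabilities from several models as a weighted mixture. The fixed weights, the three toy models, and the function name are assumptions made for this example. How the mixture weights are actually obtained in Bosch's system is not described here.

```python
def fuse_pick_predictions(per_model_probs, weights):
    """Weighted mixture of per-candidate success probabilities from several models.

    per_model_probs: one list per model, each holding a probability per pick candidate.
    weights: one non-negative weight per model; normalised here to sum to 1.
    Returns the fused probabilities and the index of the best candidate.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    n_candidates = len(per_model_probs[0])
    fused = [
        sum(w * probs[i] for w, probs in zip(norm, per_model_probs))
        for i in range(n_candidates)
    ]
    best_index = max(range(n_candidates), key=lambda i: fused[i])
    return fused, best_index


# Hypothetical outputs of three models for three pick candidates.
model_outputs = [
    [0.9, 0.2, 0.5],  # model A: confident about candidate 0
    [0.3, 0.8, 0.6],  # model B
    [0.4, 0.7, 0.6],  # model C
]
model_weights = [0.2, 0.4, 0.4]  # assumed trust in each model

fused, best = fuse_pick_predictions(model_outputs, model_weights)
print([round(p, 2) for p in fused], best)
```

Model A alone would pick candidate 0, but the fused prediction prefers candidate 1. This shows how a mixture can dampen a single model's outlier confidence.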
Northeastern University’s sim-to-real transfer method for robotic arm visual grasping applies visual domain randomisation — varying camera pose, image brightness, saturation, contrast, and Gaussian noise during training — so that the deployed model handles real-world perceptual variation not present during simulation training. This challenge class does not exist in model-based systems, which use deterministic pose estimation from known geometry.
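A minimal sketch of visual domain randomisation on a greyscale image follows, using only the Python standard library. The parameter ranges and the function name are assumptions. Camera-pose and saturation randomisation are left out because they need a renderer and colour channels respectively.

```python
import random


def randomise_image(image, rng, brightness=0.2, contrast=0.3, noise_sigma=0.02):
    """Apply random brightness, contrast and Gaussian noise to a greyscale image.

    image: list of rows, each a list of floats in [0, 1].
    rng: a random.Random instance, so that results are reproducible.
    """
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    shift = rng.uniform(-brightness, brightness)        # brightness offset
    gain = rng.uniform(1.0 - contrast, 1.0 + contrast)  # contrast scaling about the mean
    out = []
    for row in image:
        new_row = []
        for v in row:
            v = (v - mean) * gain + mean + shift
            v += rng.gauss(0.0, noise_sigma)            # per-pixel sensor noise
            new_row.append(min(1.0, max(0.0, v)))       # clamp back into [0, 1]
        out.append(new_row)
    return out


rng = random.Random(0)
image = [[0.1, 0.5, 0.9], [0.2, 0.6, 1.0]]
augmented = [randomise_image(image, rng) for _ in range(4)]
```

Each training step would draw a fresh variant like this, so the network never sees the same rendering conditions twice.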
Huazhong University of Science and Technology’s two-stage grasp planning system separately trains a grasp pose prediction network and a grasp pose evaluation network — both using a reuse structure architecture — achieving robust grasping of unknown objects in multi-object stacked scenes. The decoupling of prediction and evaluation enables the model to first generate candidate poses and then score them, improving both coverage and precision.
The University of California’s approach to training data generation provides a principled framework: using 3D object model collections, analytical representations of grasp force and torque mechanics, and statistical sampling to model uncertainty in sensing and control, synthetic training datasets (sensor images paired with labelled grasp configurations) are generated to train function approximators. The explicit modelling of uncertainty — covering initial state, contact, friction, inertia, object shape, and sensor data — is a training-data design choice uniquely relevant to learning-based systems, and aligns with uncertainty quantification principles discussed by NIST for safety-critical robotic applications.
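One way to picture how sampled uncertainty turns into a training label is a Monte Carlo estimate: perturb the uncertain quantities, apply an analytic success test to each sample, and use the success fraction as the label. The sketch below is deliberately simplified and is not the filing's method. It uses a single friction-cone test for one contact, and the parameter ranges and function name are assumptions.

```python
import math
import random


def robust_grasp_label(contact_angle_deg, mu_range, angle_sigma_deg, n_samples, rng):
    """Fraction of sampled perturbations in which the contact stays inside the friction cone.

    contact_angle_deg: nominal angle between the grasp axis and the contact normal.
    mu_range: (low, high) bounds for the uncertain friction coefficient.
    angle_sigma_deg: standard deviation of the angular error from sensing and control.
    """
    successes = 0
    for _ in range(n_samples):
        mu = rng.uniform(*mu_range)
        angle = contact_angle_deg + rng.gauss(0.0, angle_sigma_deg)
        # Analytic test: the contact holds if the angle is within the cone half-angle atan(mu).
        if abs(angle) <= math.degrees(math.atan(mu)):
            successes += 1
    return successes / n_samples


rng = random.Random(42)
label_good = robust_grasp_label(5.0, (0.4, 0.8), 3.0, 2000, rng)
label_marginal = robust_grasp_label(30.0, (0.4, 0.8), 3.0, 2000, rng)
label_bad = robust_grasp_label(60.0, (0.4, 0.8), 3.0, 2000, rng)
print(label_good, label_marginal, label_bad)
```

Labels of this kind, paired with rendered sensor images, are what a function approximator would then be trained to predict.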
Head-to-head: six dimensions that separate the paradigms
The two paradigms diverge across six operationally significant dimensions — each of which has direct implications for R&D investment, IP strategy, and deployment risk. The table below consolidates the key distinctions drawn from the patent dataset.
| Dimension | Model-Based | Learning-Based |
|---|---|---|
| Prior knowledge required | Per-object CAD model or surface mesh mandatory | Diverse synthetic training dataset; no per-object model at inference |
| Grasp quality scoring | Analytical ε-metrics or convex-hull volume from 6D wrench contact data | Network output score; ensemble fusion (Bosch); MCDM arbitration (Siemens) |
| Unseen object handling | Degrades; requires operator intervention or geometry primitives | Generalises if training distribution is sufficiently diverse |
| Sim-to-real gap | Not applicable — uses deterministic pose estimation from known geometry | Key challenge; mitigated by domain randomisation (Northeastern University) |
| Computational profile | Deterministic; graph search or pre-computed grasp lookup | GPU-accelerated inference; latency variability; benefits from modular decomposition |
| Replanning adaptability | Static; replanning requires re-running pose estimation | Dynamic; Honda Motor’s iterative re-planning generates new candidate trajectories at each time step |
Sensor modality convergence: point clouds as common ground
Despite their architectural differences, both paradigms are converging on point cloud processing as the primary sensor modality. Model-based systems use point clouds for pose estimation, registering a known model against measured scene geometry. Learning-based systems use point clouds as direct network input, as demonstrated by Henan University’s PointGrasp-Net. TATA Consultancy Services’ GQS framework likewise operates directly on the point cloud, although its scoring is analytical rather than learned. This convergence has important implications for sensor hardware selection and data pipeline design in industrial deployments.
Ensemble uncertainty and multi-criteria decision making
Siemens Corporation’s high-level sensor fusion architecture combines multiple AI module outputs through a multi-criteria decision making (MCDM) module to rank grasping alternatives — representing a system-level architecture that sits above either single paradigm. Ambi Robotics’ 2025 filing describes a grasp quality convolutional neural network that scores candidate grasp plans directly from image data, enabling real-time evaluation without analytic model access. These architectures reflect a broader trend: as learning-based components mature, the integration layer — how competing grasp proposals are arbitrated — becomes a distinct IP battleground.
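To illustrate what an arbitration layer of this kind might do, the sketch below ranks grasp alternatives with a weighted sum, one of the simplest MCDM methods. The criteria, weights, and candidate scores are all hypothetical. The MCDM method actually used in the Siemens architecture is not specified here.

```python
def rank_grasps(candidates, weights):
    """Rank grasp alternatives by a weighted sum of criterion scores.

    candidates: dict mapping grasp name to {criterion: score in [0, 1], higher is better}.
    weights: dict mapping criterion to its relative importance.
    Returns grasp names ordered from best to worst.
    """
    def total(name):
        return sum(weights[c] * candidates[name][c] for c in weights)

    return sorted(candidates, key=total, reverse=True)


# Hypothetical outputs of three upstream AI modules for three grasp alternatives.
candidates = {
    "g1": {"success_prob": 0.9, "clearance": 0.2, "reachability": 0.8},
    "g2": {"success_prob": 0.7, "clearance": 0.9, "reachability": 0.7},
    "g3": {"success_prob": 0.6, "clearance": 0.5, "reachability": 0.9},
}
weights = {"success_prob": 0.5, "clearance": 0.3, "reachability": 0.2}  # assumed priorities

ranking = rank_grasps(candidates, weights)
print(ranking)  # ['g2', 'g1', 'g3']
```

The grasp with the highest predicted success probability (`g1`) loses to `g2` once clearance is taken into account. That is why the arbitration logic matters independently of the underlying predictors.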
Patent landscape: who is innovating and where
The patent dataset reveals distinct innovation clusters by assignee type, with industrial leaders and academic institutions pursuing markedly different strategies across the model-based and learning-based spectrum.
FANUC Corporation dominates the industrial model/learning hybrid space, with multiple patent families covering simulation-based training data generation for grasp learning, modular neural networks for 6-DOF bin picking, adaptive model-based grasp planning with intermediate pose search, human demonstration-guided grasp teaching, and automated gripper fingertip design.
TATA Consultancy Services Limited has filed a globally coordinated patent family (EP, IN, US, AU, JP) covering their point cloud-based GQS framework — a rare example of a service company pursuing strong IP in the grasp planning space.
Robert Bosch GmbH focuses on learning-based control model training with ensemble and mixture model approaches, and descriptor-based pose estimation for unknown pose situations. ABB Schweiz AG pursues both CAD-model-based 6D pose pipelines and hybrid sim-real training — a dual strategy covering both paradigms. Siemens Corporation focuses on high-level sensor fusion and AI module orchestration for bin picking decision-making, with the MCDM-based architecture as their distinctive contribution.
Chinese academic assignees — including Shanghai Jiao Tong University, Zhejiang University, Henan University, Huazhong University of Science and Technology, and Northeastern University — collectively represent the largest volume of learning-based grasp innovation, covering sparse CNNs, reinforcement learning policies, sim-to-real transfer, two-stage prediction-evaluation pipelines, and multi-modal perception integration. This pattern reflects a broader trend documented by OECD in AI-related patent filings, where Chinese academic institutions have become a dominant source of applied machine learning IP.
Why the hybrid paradigm is winning in industrial deployment
The dominant industrial paradigm for unstructured bin picking grasp planning is hybrid: simulation-based grasp data generation feeds neural network training, combining model-based data quality with learning-based deployment flexibility. This convergence is not incidental — it reflects the practical limitations of each pure approach when confronted with real factory conditions.
ABB Schweiz AG’s hybrid training approach for object picking robots combines simulated grasp quality evaluation with actual robot experiment validation — using both simulated and real grasp performance data to train models that perform reliably in physical deployment. This approach represents the dominant industrial paradigm for unstructured bin picking, combining model-based data quality with learning-based deployment flexibility.
ABB’s hybrid training approach represents the most direct synthesis: grasp locations are assigned from object physical properties, simulated grasp quality is evaluated for each assigned location, candidate grasp locations are determined from simulation data, and then actual robot experiments validate real grasp quality — combining the interpretability of model-based evaluation with the scalability of learning. This mirrors the methodology advocated in robotics benchmarking standards discussed by bodies such as ISO for validating autonomous manipulation systems.
Honda Motor Co.’s online iterative re-planning system demonstrates that learning-based systems can support dynamic replanning — generating new sets of candidate object trajectories at each time step and calculating contact points for associated grasps — offering adaptability that static model-based planners cannot easily provide. This capability is particularly valuable in unstructured bin picking where object positions shift during the picking sequence.
“Prior single-network 6-DOF learning approaches were not fast enough due to time-consuming candidate grasp calculation requirements or not accurate enough because they attempt to predict too many dimensions.” — FANUC Corporation, motivating the modular two-network decomposition.
The sim-to-real gap remains the most significant unresolved challenge for purely learning-based systems. Northeastern University’s visual domain randomisation approach — varying camera pose, image brightness, saturation, contrast, and Gaussian noise during training — represents the current state of the art for bridging this gap at the perceptual level. But contact dynamics, friction coefficients, and gripper compliance remain harder to randomise faithfully, which is why real-robot validation data — as used in ABB’s hybrid approach — continues to be valued even when simulation infrastructure is available.
Trinamix GmbH’s self-learning grasp sequence approach (2023) adds a further dimension: accumulating grasp success and failure history at the system level to continuously refine grasp strategy selection. This online learning loop, operating on top of an initial trained model, represents a third architectural layer beyond the pure model-based and learning-based dichotomy, and is likely to become more prevalent as deployed systems accumulate operational data at scale.
For R&D engineers and IP professionals evaluating grasp planning architectures, the practical implication is clear: neither paradigm is sufficient alone for robust, flexible unstructured bin picking. The investment question is not model-based versus learning-based, but rather how much simulation infrastructure, real-robot validation data, and ongoing model maintenance a deployment context can support — and which assignees have already secured IP positions across the hybrid design space that matters most for the target application.