Model-Based Grasp Planning: Geometric Reasoning and Analytical Metrics
Model-based grasp planning works because a robot already knows what it is looking at. The system holds a 3D object model — typically a CAD file or surface mesh — matches that model against sensor data to estimate a 6D object pose, and then computes stable grasp configurations analytically. The core assumption is that a high-fidelity model of the target workpiece exists before any picking begins.
FANUC’s adaptive grasp planning patents provide a canonical example. The system analyzes workpiece shape to identify multiple robust grasp options with specified positions and orientations, then evaluates each individual workpiece in the bin to identify the feasible grasp set. When a direct motion to the goal pose is impossible, the system formulates an explicit search problem over stable intermediate poses, evaluating each link between nodes for feasibility based on collision-avoidance constraints and robot joint constraints. This deterministic, graph-search-based strategy is tightly coupled to a known part model; it cannot operate on unseen object categories.
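The intermediate-pose search FANUC describes can be sketched as an ordinary graph search over stable poses. The sketch below is a minimal illustration under assumed interfaces (`neighbors` and `edge_feasible` are hypothetical stand-ins for the patent's stability, collision-avoidance, and joint-limit checks), not FANUC's implementation:

```python
from collections import deque

def plan_via_intermediate_poses(start, goal, neighbors, edge_feasible):
    """Breadth-first search over a graph of stable object poses.

    `neighbors(pose)` yields candidate next poses; `edge_feasible(a, b)`
    stands in for collision-avoidance and joint-limit checks on the
    motion linking two poses. Returns a pose sequence from start to
    goal, or None when no feasible chain of regrasps exists.
    """
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        pose = frontier.popleft()
        if pose == goal:
            path = []
            while pose is not None:
                path.append(pose)
                pose = parent[pose]
            return path[::-1]
        for nxt in neighbors(pose):
            if nxt not in parent and edge_feasible(pose, nxt):
                parent[nxt] = pose
                frontier.append(nxt)
    return None
```

Because every node and edge check is deterministic and tied to the part model, the planner's behaviour is fully auditable, which is exactly the property the learning-based systems in the next section trade away.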
ABB’s perception-based adaptive motion planning system represents a more recent model-based architecture. As detailed in ABB Schweiz AG’s 2025 filing, 6D pose determination is performed using a pose determination model fed by multi-image capture of the bin. A CAD model for each object is then used to generate a 3D rendering at the estimated pose, and a 2D picking-direction mask is generated to plan the motion. The reliance on CAD models per object class is characteristic of the model-based approach — the system’s applicability is bounded by the catalog of available models.
Traditional model-based systems compute grasp quality using ε-metrics or convex-hull volume metrics derived from 6D wrench contact information in simulation environments such as GraspIt or OpenRAVE. These metrics have known physical interpretability but require complete contact geometry from a model — they cannot be applied to objects without a registered shape representation.
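For intuition, the ε-metric can be computed as the radius of the largest origin-centred ball inside the convex hull of the contact wrenches. The sketch below uses SciPy's `ConvexHull` on a low-dimensional wrench array for illustration; a full implementation would work in 6D wrench space with friction-cone discretization:

```python
import numpy as np
from scipy.spatial import ConvexHull

def epsilon_quality(wrenches):
    """Ferrari-Canny style epsilon metric: radius of the largest
    origin-centred ball inside the convex hull of the contact wrenches.

    `wrenches` is an (N, d) array of wrench samples. Returns 0.0 when
    the origin lies outside the hull, i.e. the grasp is not in
    force closure.
    """
    hull = ConvexHull(np.asarray(wrenches, dtype=float))
    # hull.equations rows are [unit normal, offset], with
    # normal @ x + offset <= 0 inside the hull, so the distance from
    # the origin to each facet is simply -offset.
    distances = -hull.equations[:, -1]
    return float(max(0.0, distances.min()))
```

The requirement this makes vivid: without a registered model supplying the contact geometry, there is no wrench set to build the hull from, so the metric simply cannot be evaluated.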
Keyence Corporation’s image processing system illustrates how statistical history accumulates within a model-based architecture: the system registers multiple work models, performs 3D search to identify position and posture of the work in a bulk pile, and uses statistical data about grip success or failure at pre-registered grip positions to select optimal grasps. The underlying grasp candidates are always computed relative to a known model — the statistical layer refines selection, it does not replace the model requirement.
TATA Consultancy Services’ point cloud-based grasp planning framework attempts to bridge this gap by operating geometrically on raw point cloud data without object recognition. Their system generates grasp poses in a random configuration, computes depth difference values per pixel for each sampled grasp pose, generates binary maps to obtain feasible subregions, and refines poses using a Grasp Quality Score (GQS). While geometry-driven rather than model-match-driven, this approach still relies on explicit analytical criteria rather than learned representations — placing it at the boundary of the model-based paradigm, as noted in standards discussions from ISO on robotic perception requirements.
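A toy version of the per-pixel depth-difference and binary-map steps might look like the following; the function name, threshold, and scoring rule are illustrative assumptions, not TCS's published GQS formula:

```python
import numpy as np

def grasp_quality_score(depth, grasp_depth, region, clearance=0.01):
    """Toy sketch of the depth-difference / binary-map idea: for one
    sampled grasp pose, a pixel is collision-free if the scene surface
    lies at least `clearance` metres below the gripper plane at
    `grasp_depth`; the score is the collision-free fraction of the
    gripper footprint given by the boolean mask `region`.
    """
    free = depth >= grasp_depth + clearance   # per-pixel binary map
    return float(np.mean(free[region]))
```

Sampling many poses, keeping those whose binary map clears the footprint, and refining the best scorers mirrors the sample-filter-refine shape of the TCS pipeline without any object model in the loop.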

Learning-Based Grasp Planning: Neural Networks, Simulation Data, and Generalization
Learning-based grasp planning replaces explicit model matching with trained neural networks that map raw sensor data — typically depth images, point clouds, or RGB-D data — directly to predicted grasp poses and associated success probabilities. The central advantage is generalization: a well-trained model can propose grasps for objects it has never explicitly been programmed to handle, a capability that model-based systems structurally cannot offer.
FANUC’s grasp learning pipeline illustrates the canonical sim-to-real approach. The process begins with a database of solid or surface models for all objects and grippers, performs iterative optimization to compute hundreds of grasps per part using surface contact geometry, maps those grasps into simulated bin pile scenarios, and then correlates simulation results with camera depth image data to train neural networks for real-world grasp execution. The simulation-generated grasp points and approach directions become the supervised training signal — the learned model then generalizes across geometries it was not explicitly optimized for during inference.
“Single-network approaches for 6-DOF grasping suffer from high search complexity, require post-hoc grasp refinement, and struggle with cluttered environments typical of bin picking.”
Neural network architecture choices significantly affect performance. FANUC’s modular learning approach addresses the high-dimensional action space challenge by decomposing the 6-DOF grasp prediction problem into a first network encoding grasp position dimensions and a second network encoding rotation dimensions. This modular decomposition reduces each network’s search space to a sum rather than a product of evaluated positions and rotations — a direct architectural response to the combinatorial explosion that plagues single-network 6-DOF learning.
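The arithmetic behind the sum-versus-product claim is easy to make concrete. A toy count under assumed search-space sizes (not FANUC's actual numbers):

```python
# Toy illustration of the modular decomposition: a single network must
# score every (position, rotation) pair jointly, while the modular
# design scores positions first, then rotations conditioned on the
# chosen position. The sizes below are assumptions for illustration.
positions, rotations = 1000, 360

single_network_evals = positions * rotations   # joint search space
modular_evals = positions + rotations          # sequential search space

print(single_network_evals, modular_evals)     # 360000 vs 1360
```

Two orders of magnitude fewer evaluations per grasp decision is what makes the decomposition a structural answer to the combinatorial explosion rather than a tuning trick.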
Explore the full patent landscape for robotic bin picking grasp planning in PatSnap Eureka.
Search Grasp Planning Patents in PatSnap Eureka →

Henan University’s sparse convolutional neural network approach demonstrates how learning-based methods can exploit point cloud geometry — incorporating surface curvature and normal information into the PointGrasp-Net architecture — and process only valid 3D scene points rather than the entire point cloud space. The model is trained in PyBullet simulation using randomly generated grasp poses on surface point clouds of randomly placed objects, explicitly designed for non-structured, random unordered grasping scenes without dependence on object shape or structure. This is a direct contrast to model-based systems, as documented in IEEE robotics literature on model-free manipulation.
Robert Bosch GmbH’s ensemble prediction approach trains multiple prediction models and fuses their outputs via a mixture model to generate a combined pick prediction. The system explicitly acknowledges that each individual model’s quality depends on input-training data similarity, applicability to different object categories, and sensitivity to image noise — and that combining model outputs mitigates these individual failure modes. This is a learning-specific challenge with no direct analogue in model-based systems.
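A minimal sketch of this kind of mixture fusion, with a hypothetical interface in which each model emits per-candidate success probabilities (this is an illustration of the general pattern, not Bosch's patented method):

```python
import numpy as np

def fused_pick_prediction(predictions, weights):
    """Fuse per-model pick predictions via a weighted mixture.

    `predictions` is a (models, candidates) array of success
    probabilities; `weights` encodes per-model trust (e.g. from
    validation accuracy). Returns the weight-normalised mixture,
    one fused probability per candidate.
    """
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ predictions
```

The design point is that a candidate only ranks highly when several differently-biased models agree, which is precisely the mitigation for the input-similarity and noise-sensitivity failure modes Bosch describes.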
Huazhong University of Science and Technology’s two-stage grasp planning system separately trains a grasp pose prediction network and a grasp pose evaluation network — both using a reuse structure architecture — achieving robust grasping of unknown objects in multi-object stacked scenes. The decoupling of prediction and evaluation enables the model to first generate candidate poses and then score them, improving both coverage and precision. The University of California’s training data generation framework adds a principled uncertainty layer: using 3D object model collections, analytical representations of grasp force and torque mechanics, and statistical sampling to model uncertainty in sensing and control, synthetic training datasets are generated to train function approximators — explicitly covering uncertainty in initial state, contact, friction, inertia, object shape, and sensor data.
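The decoupled prediction-then-evaluation pattern reduces to a generate-then-score loop. A minimal sketch under hypothetical `predict` and `evaluate` interfaces (not the patented reuse-structure architecture):

```python
def two_stage_grasp(scene, predict, evaluate, k=16):
    """Two-stage grasp selection sketch.

    `predict(scene, k)` proposes k candidate grasp poses from raw
    scene data (stage 1: coverage); `evaluate(scene, pose)` scores a
    single candidate (stage 2: precision). Returning the top-scoring
    pose lets the two networks be trained and tuned independently.
    """
    candidates = predict(scene, k)
    return max(candidates, key=lambda pose: evaluate(scene, pose))
```

Because the evaluator never has to generate and the generator never has to rank, each stage faces a simpler learning problem than a single end-to-end network would.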
Head-to-Head: Four Dimensions That Define the Divide
The most operationally significant differences between model-based and learning-based grasp planning emerge across four dimensions: prior knowledge requirements, grasp quality computation, handling of unseen objects, and computational profile. Each dimension reveals a genuine trade-off rather than a clear winner.
Prior Knowledge Requirements
Model-based systems fundamentally require a 3D object model for each target object class. This is explicit in ABB’s CAD-driven picking pipeline and FANUC’s adaptive grasp planning, both of which use part geometry as the entry point for all downstream grasp computation. FANUC’s 2025 adaptive grasp planning patent directly states that the workpiece shape is analyzed to identify robust grasp options — implying that a shape model must exist before any planning can occur.
Learning-based systems can be trained to predict grasps for object categories unseen at deployment time, provided the training distribution was sufficiently diverse. TATA Consultancy Services explicitly frames this as the core motivation: “a fully automated and reliable picking of a diverse range of unseen objects in clutter is a challenging problem.” Pure learning-based systems like FANUC’s neural-network bin picking pipeline do require model data for training, but deploy using only sensor data — enabling generalization beyond the training object set, consistent with research published by Nature on neural generalization in robotic manipulation.
The dominant industrial paradigm for robotic bin picking grasp planning is hybrid: simulation-based grasp data generation feeds neural network training, combining model-based data quality with learning-based deployment flexibility, as demonstrated by FANUC’s efficient data generation pipeline, which computes hundreds of grasps per part in simulation before training neural networks for real-world execution.
Grasp Quality Computation and Scoring
Model-based systems score grasps analytically, using the ε-metrics and convex-hull volume metrics over 6D contact wrenches described above: physically interpretable, but dependent on complete contact geometry from a registered model. Learning-based systems instead predict grasp quality as a network output score. Ambi Robotics’ grasp quality convolutional neural network scores candidate grasp plans directly from image data, enabling real-time evaluation without analytic model access. Siemens’ high-level sensor fusion architecture combines multiple AI module outputs through a multi-criteria decision making (MCDM) module to rank grasping alternatives, a hybrid scoring approach that fuses learned predictions with structured decision logic.
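A weighted-sum ranking is the simplest MCDM variant; the sketch below is illustrative only, with assumed criteria, and not Siemens' actual module:

```python
import numpy as np

def rank_grasps_weighted_sum(scores, weights):
    """Rank grasp candidates by a weighted sum of criteria.

    `scores` is a (candidates, criteria) array, e.g. learned grasp
    quality, reachability, and cycle-time margin, oriented so higher
    is better. Each criterion is min-max normalised to [0, 1] before
    weighting. Returns candidate indices, best first.
    """
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    norm = (scores - lo) / np.where(hi > lo, hi - lo, 1.0)
    utility = norm @ np.asarray(weights, dtype=float)
    return np.argsort(-utility)
```

Arbitrating between competing learned proposals with an explicit, inspectable weighting is what lets such a layer sit above either paradigm rather than inside one.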
Handling of Unseen Objects and the Sim-to-Real Gap
Model-based systems degrade when object models are unavailable — they may fall back to geometry primitives or require operator intervention. Learning-based systems trained on diverse synthetic data can generalize to novel geometries, but their generalization quality depends critically on domain coverage in the training data.
The sim-to-real gap is a challenge specific to learning-based approaches. Northeastern University’s 2025 patent addresses this through visual domain randomization — varying camera pose, image brightness, saturation, contrast, and Gaussian noise during training — so that the deployed model handles real-world perceptual variation not present during simulation training. This entire challenge class does not exist in model-based systems, which use deterministic pose estimation from known geometry. ABB’s hybrid training approach represents the most direct synthesis: grasp locations are assigned from object physical properties, simulated grasp quality is evaluated for each assigned location, and then actual robot experiments validate real grasp quality — combining the interpretability of model-based evaluation with the scalability of learning.
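Visual domain randomization itself is straightforward to sketch as a data-augmentation step. The parameter ranges below are assumptions for illustration, not Northeastern's published values:

```python
import numpy as np

def randomize_image(img, rng):
    """Apply visual domain randomisation to one float image in [0, 1].

    Jitters brightness (additive shift) and contrast (gain about the
    mean), then adds Gaussian pixel noise, so a network trained on
    simulated renders also sees the perceptual variation a real
    camera introduces.
    """
    img = img.astype(float)
    brightness = rng.uniform(-0.1, 0.1)
    contrast = rng.uniform(0.8, 1.2)
    img = (img - img.mean()) * contrast + img.mean() + brightness
    img += rng.normal(0.0, 0.02, size=img.shape)  # sensor noise
    return np.clip(img, 0.0, 1.0)
```

Randomizing camera pose works the same way but happens in the renderer rather than on the finished image.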
Computational Profile and Deployment Constraints
Model-based systems require real-time pose estimation — computationally intensive but architecturally predictable. Once poses are computed, grasp selection is often a graph search or lookup in a pre-computed grasp set. Learning-based systems require neural network inference at deployment, which benefits from GPU acceleration but introduces latency variability. FANUC’s modular network decomposition directly addresses this: prior single-network 6-DOF learning approaches were “not fast enough due to time-consuming candidate grasp calculation requirements or not accurate enough because they attempt to predict too many dimensions.” Honda Motor Co.’s online iterative re-planning system demonstrates that learning-based systems can support dynamic replanning — generating new sets of candidate object trajectories at each time step and calculating contact points for associated grasps — offering adaptability that static model-based planners cannot easily provide, a capability increasingly relevant according to WIPO’s technology trend reports on industrial robotics.
Map the competitive patent landscape for bin picking across FANUC, ABB, Siemens, and Chinese universities with PatSnap Eureka.
Analyse Bin Picking IP in PatSnap Eureka →

Patent Landscape: Who Is Filing and Where Innovation Clusters
The patent data reveals distinct innovation clusters by assignee type, with different organisations staking out different segments of the grasp planning stack. Understanding these clusters is as important for IP strategy as the technical distinctions themselves.
FANUC Corporation dominates the industrial model/learning hybrid space, with multiple patent families covering simulation-based training data generation for grasp learning, modular neural networks for 6-DOF bin picking, adaptive model-based grasp planning with intermediate pose search, human demonstration-guided grasp teaching, and automated gripper fingertip design. FANUC’s breadth across both paradigms makes it the most comprehensive single assignee in the dataset.
TATA Consultancy Services Limited has filed a globally coordinated patent family across EP, IN, US, AU, and JP jurisdictions covering their point cloud-based GQS framework — a rare example of a service company pursuing strong IP in the grasp planning space. Robert Bosch GmbH focuses on learning-based control model training with ensemble and mixture model approaches and descriptor-based pose estimation for unknown pose situations. ABB Schweiz AG pursues both CAD-model-based 6D pose pipelines and hybrid sim-real training, representing a dual strategy covering both paradigms simultaneously.
Siemens Corporation focuses on high-level sensor fusion and AI module orchestration for bin picking decision-making, with the MCDM-based architecture as their distinctive contribution — a system-level architecture that sits above either single paradigm and arbitrates between competing learning-based grasp proposals.
Chinese academic assignees — including Shanghai Jiao Tong University, Zhejiang University, Henan University, Huazhong University of Science and Technology, and Northeastern University — collectively represent the largest volume of learning-based grasp innovation across the dataset, covering sparse CNNs, reinforcement learning policies, sim-to-real transfer, two-stage prediction-evaluation pipelines, and multi-modal perception integration. Dexterity Inc. focuses on robotic singulation — picking individual items from cluttered workspaces and placing them singly — with grasp strategy probability computation as a core component.
Implications for R&D and IP Strategy in Robotic Bin Picking
The patent landscape described above has direct implications for teams selecting or designing robotic picking systems — and for IP professionals mapping freedom-to-operate or building patent portfolios in this space. Several structural conclusions follow from the evidence.
The hybrid paradigm is now the industrial standard. Simulation-based grasp data generation feeding neural network training is the approach adopted by FANUC, ABB, and multiple academic groups. Teams building new systems should expect to invest in simulation infrastructure — not as an alternative to learning, but as a prerequisite for it. This is consistent with guidance from OECD on AI adoption in advanced manufacturing, which identifies simulation-to-deployment pipelines as a key capability bottleneck.
Point cloud processing is converging as the primary sensor modality for both paradigms. Model-based systems use it for pose estimation matching; learning-based systems use it as direct network input. Any new system architecture should be designed around point cloud ingestion as a first-class capability, with depth image processing as a secondary modality.
6-DOF grasp learning requires explicit architectural mitigation. The decomposition of position and rotation into separate networks — as FANUC demonstrates — is not merely an optimisation choice but a structural requirement for real-time bin picking performance. Single-network approaches are documented as insufficiently fast or accurate for cluttered environments.
Ensemble and fusion approaches reduce single-model failure risk. Robert Bosch’s mixture model approach and Siemens’ MCDM architecture both address the same underlying problem: individual learned models fail in predictable ways when the input distribution shifts. Multi-model fusion is a robust engineering pattern for production deployment, particularly when object variety is high.
IP white space exists in sim-to-real transfer for constrained environments. Northeastern University’s domain randomization approach specifically targets space-constrained conditions — a narrower problem than general bin picking. R&D teams working on confined workspace applications (automotive assembly, pharmaceutical packaging) may find meaningful freedom-to-operate in this sub-domain. Teams can explore this landscape directly using PatSnap’s innovation intelligence platform to identify filing gaps and technology adjacencies.
Real-robot validation remains the gold standard for bridging paradigms. ABB’s hybrid training approach uses both simulated and real grasp performance data to train models that perform reliably in physical deployment. Teams that rely exclusively on simulation-validated performance metrics should expect a meaningful performance gap at deployment — one that only real-robot data can close. The PatSnap Insights blog covers additional case studies on sim-to-real validation in industrial robotics.