
Multimodal Vision-Vibration Fusion Diagnosis 2026

Technology Landscape 2026

Multimodal Vision-Vibration Fusion Diagnosis

Combining optical imaging with vibration sensing through deep learning architectures is reshaping fault diagnosis, structural health monitoring, and autonomous perception. This dataset spans 2013–2026 across 8 named assignees and 6 jurisdictions.

8 named patent assignees in this dataset
6 jurisdictions represented in retrieved records
2013–2026 dataset coverage span
99.85% average accuracy reported for ViT bearing fault diagnosis in literature
Published by PatSnap Insights Team · 9 min read · Verified by PatSnap Eureka Data
Technology Overview

Vision-Vibration Fusion: From Lab Concept to Deployable Systems

Multimodal vision-vibration fusion diagnosis systems jointly process optical or video data alongside vibration signals — acoustic, inertial, radar, or structural — using neural network fusion architectures. The core challenge is that single-modality sensing yields incomplete or ambiguous information about the health state of mechanical, structural, or biological systems.

Three principal sub-domains appear across the dataset: vibration-to-image transformation for deep classification, direct multi-sensor fusion with joint visual-vibration encoding, and video-based monitoring that extracts frequency and displacement information directly from camera frames without contact sensors.
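The first sub-domain, vibration-to-image transformation, can be illustrated with a minimal sketch. It uses a plain short-time DFT spectrogram rather than the CWT, DWT, or SDP maps named in the retrieved records, and the sampling rate, window length, and test tone are hypothetical values chosen only for illustration.

```python
import cmath
import math


def short_time_spectrogram(signal, window=64, hop=32):
    """Return a 2-D list (time frames x frequency bins) of DFT magnitudes."""
    image = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = signal[start:start + window]
        # A Hann taper reduces spectral leakage at the frame edges.
        tapered = [
            x * 0.5 * (1.0 - math.cos(2.0 * math.pi * n / (window - 1)))
            for n, x in enumerate(frame)
        ]
        row = []
        for k in range(window // 2 + 1):  # non-negative frequencies only
            acc = sum(
                v * cmath.exp(-2j * math.pi * k * n / window)
                for n, v in enumerate(tapered)
            )
            row.append(abs(acc))
        image.append(row)
    return image


# Synthetic vibration signal (assumed values): 1 kHz sampling, 125 Hz tone.
fs = 1000.0
signal = [math.sin(2.0 * math.pi * 125.0 * n / fs) for n in range(512)]
image = short_time_spectrogram(signal)

# The brightest column of the first frame sits at the tone frequency.
peak_bin = max(range(len(image[0])), key=lambda k: image[0][k])
peak_hz = peak_bin * fs / 64
```

The resulting 2-D array is the kind of input an image classifier such as a CNN or ViT would consume. A production pipeline would more likely use an FFT library and a wavelet transform, which this sketch leaves out.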

Top Assignees by Patent Filing Count (Dataset Snapshot)
[Figure: Filing Count by Assignee (Dataset Snapshot). Horizontal bar chart of patent filing counts per named assignee in the retrieved dataset, spanning 2013–2026: Toku Eyes Ltd 5 (given as 5+ in the chart summary), SRI International 3, Momenta (Suzhou) 3, GM Global Tech Ops 2, Apple Inc 2. Source: PatSnap Eureka retrieved records.]

The field spans roughly 2013–2026, with three discernible phases: a foundational phase (2013–2018) establishing pixel-intensity FFT and early multi-sensor fusion patents; a development phase (2019–2022) where deep learning architectures became dominant; and an acceleration phase (2023–2026) featuring system-level deployment and hardware integration.

In this dataset, 8 distinct assignees are represented across 6 jurisdictions. Innovation is moderately distributed: no single assignee spans all application areas, but SRI International and Momenta (Suzhou) Technology each hold 3 active patent records, reflecting sustained portfolio building.

Source: PatSnap Eureka. Filing counts are derived from retrieved patent records; this snapshot does not represent total global filings for each assignee.
Patent Data Analysis

Technology Clusters and Filing Timeline in the Dataset

The retrieved records distribute across four technology clusters — vibration-to-image classification, spatiotemporal video monitoring, physics-guided embedding fusion, and BEV-based multi-modal fusion — with filing activity concentrated in 2019–2026.

Patent Records by Technology Cluster (Dataset Snapshot)

In this dataset, BEV-based multi-modal fusion and vibration-to-image classification are the two largest clusters of patent and literature records, with 5 records each. Physics-guided embedding and spatiotemporal video monitoring are smaller but sustained clusters, with 3 records each.

[Figure: Records by Technology Cluster (Dataset Snapshot). Horizontal bar chart of record counts per technology cluster in the retrieved dataset: BEV Multi-Modal Fusion 5, Vibration-to-Image Classification 5, Physics-Guided Embedding 3, Spatiotemporal Video Monitoring 3. Source: PatSnap Eureka retrieved records.]

Patent Filing Activity by Phase, 2013–2026 (Retrieved Records)

In this dataset, the acceleration phase (2023–2026) contains the highest concentration of commercial patent filings, with Momenta, Jiangsu University, Tongji University, and Qualcomm all filing between 2024 and 2026 — compared to 2 foundational records before 2019.

[Figure: Filing Activity by Phase (Retrieved Records). Vertical bar chart of retrieved patent and literature records per filing phase: Foundational 2013–2018, 3 records; Development 2019–2022, 9 records; Acceleration 2023–2026, 14 records. Source: PatSnap Eureka retrieved records.]

Source: PatSnap Eureka. Record counts represent documents retrieved in this dataset snapshot and do not reflect total global publication volumes.
Application Domains

Key Application Domains in Vision-Vibration Fusion Diagnosis

The retrieved records span five distinct application domains — from industrial machinery fault diagnosis and structural health monitoring to autonomous driving perception, smart building occupant sensing, and multimodal medical diagnosis — each drawing on overlapping deep learning fusion architectures.

DWT · CWT · ViT · VGG16

Industrial Machinery Fault Diagnosis

The most technically concentrated domain in this dataset, covering bearing and gearbox diagnostics. A Multi-Information Fusion ViT Model (2023) achieved 99.85% average accuracy on small-sample bearing datasets using multi-scale DWT decomposition and CWT maps. Multi-Source SDP and VGG16 fusion for gearboxes (2022) demonstrated pre-fusion accuracy of 93–96% per sensor, with post-fusion performance exceeding individual sensor results via Dempster-Shafer evidence theory.
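To show why decision-level fusion can outperform each sensor alone, here is a minimal sketch of Dempster's rule of combination. The fault hypotheses and mass values are made up for illustration and are not taken from the 2022 gearbox study.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions whose keys are frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory evidence
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical per-sensor classifier outputs expressed as belief masses.
WEAR, CRACK, NORMAL = frozenset({"wear"}), frozenset({"crack"}), frozenset({"normal"})
ANY = WEAR | CRACK | NORMAL  # ignorance: mass not committed to any single fault

sensor_1 = {WEAR: 0.6, CRACK: 0.1, ANY: 0.3}
sensor_2 = {WEAR: 0.7, NORMAL: 0.1, ANY: 0.2}
fused = combine_dempster(sensor_1, sensor_2)
```

With these assumed inputs, the fused belief in the wear hypothesis is higher than either sensor's individual belief, because agreeing evidence reinforces itself while conflicting mass is discarded and renormalised.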

Technology cluster: Vibration-to-Image Classification

3DCNN · ConvLSTM · RGB-D · FFT

Structural Health Monitoring

Non-contact video-based vibration sensing was established in the 2013 foundational paper using pixel-intensity FFT from fixed digital camera frames for frequency extraction. The 2020 3DCNN-ConvLSTM system extended this using a Microsoft Kinect v2 RGB-D camera for low-frequency environmental vibration monitoring under unstable ambient light conditions, combining short-term 3D CNN and long-term ConvLSTM spatiotemporal features.
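The pixel-intensity idea can be sketched as follows: track the brightness of one pixel (or a small region) across frames, then look for the dominant spectral peak. This is an illustrative sketch, not the 2013 paper's implementation. It uses a naive DFT instead of an FFT library, and the frame rate and vibration frequency are assumed values.

```python
import cmath
import math


def dominant_frequency(intensity, fps):
    """Return the strongest non-DC frequency (Hz) in a pixel-intensity series."""
    n = len(intensity)
    mean = sum(intensity) / n
    centered = [v - mean for v in intensity]  # remove the static brightness offset
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):  # frequencies up to the Nyquist limit fps / 2
        acc = sum(
            v * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(centered)
        )
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * fps / n


# Synthetic series: a 60 fps camera watching a point vibrating at 7.5 Hz.
fps = 60.0
intensity = [
    128.0 + 10.0 * math.sin(2.0 * math.pi * 7.5 * i / fps) for i in range(240)
]
estimated_hz = dominant_frequency(intensity, fps)
```

The camera frame rate caps the measurable frequency at half the frame rate, which is one reason video-based methods tend to target low-frequency structural vibration.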

Technology cluster: Spatiotemporal Video Monitoring

BEV Fusion · LiDAR · Camera · Radar

Autonomous Driving Perception

The most patent-intensive application domain in the dataset, with Jiangsu University (2026, US pending) deploying a loosely coupled BEV fusion system for LiDAR, camera, and radar with cascade coupling for trajectory tracking. Momenta (Suzhou) Technology filed two active US patents (2023, 2026) covering radar-vision feature fusion and target matching. GM Global Technology Operations holds active US patents for multi-mode heterogeneous sensor fusion (2020) and radar-vision fusion (2019).

Technology cluster: BEV Multi-Modal Fusion

AD-TCN · Wearable · Floor Vibration

Smart Buildings & Occupant Sensing

University of California filed a pending US patent in 2024 for cross-modal association between wearable biometric signals and structural floor-vibration sensor data using the AD-TCN (Association Distance Temporal Convolutional Network) architecture. The patent targets smart home, eldercare facility, and retail sensing environments, representing a direct extension of industrial fusion paradigms into inhabited built environments.

Technology cluster: Cross-Modal Embedding Fusion

Source: PatSnap Eureka. Application domain assignments are based on retrieved records and reflect the technology coverage of this dataset snapshot only.
Assignee Landscape

Key Patent Assignees in Vision-Vibration Fusion (Retrieved Records)

In this dataset, SRI International (US) and Momenta (Suzhou) Technology Co., Ltd. (CN/US) each hold 3 active patent records, representing the highest filing volumes among the 8 named assignees. Commercial entities concentrate on autonomous driving perception fusion, while research institutions focus on cross-modal embedding frameworks.

Top Assignees by Filing Count in Retrieved Records (Dataset Snapshot)

[Figure: Top Assignees by Filing Count in Retrieved Records. Horizontal bar chart of filing counts per named assignee: SRI International 3, Momenta (Suzhou) Technology Co., Ltd. 3, GM Global Technology Operations LLC 2, Apple Inc. 2. Source: PatSnap Eureka.]
Physics-Guided Embedding · Cross-Modal Fusion

SRI International

SRI International holds 3 active patent records in this dataset (WO 2021, US 2023, and US 2025), all covering physics-guided deep multimodal embeddings for task-specific data exploitation. The core invention constructs a common embedding space: sensor-data-specific neural networks produce modality vectors, and vectors that are related across modalities are mapped closer together. Together the three records represent a sustained and expanding IP portfolio in cross-modal feature alignment, applicable to any multi-sensor diagnostic system that incorporates physics priors.
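One common way to pull related cross-modal vectors together in a shared space is a triplet-style margin loss. The sketch below illustrates that general idea only. The loss form, margin, and toy embeddings are assumptions, not the method claimed in the SRI International records.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the related pair is closer than the unrelated pair by `margin`."""
    return max(
        0.0,
        cosine_distance(anchor, positive) - cosine_distance(anchor, negative) + margin,
    )


# Toy 2-D embeddings (hypothetical): one video clip, two vibration recordings.
video_vec = [1.0, 0.0]
matching_vibration_vec = [0.9, 0.1]   # recorded on the same machine
unrelated_vibration_vec = [0.0, 1.0]  # recorded on a different machine

aligned = triplet_loss(video_vec, matching_vibration_vec, unrelated_vibration_vec)
misaligned = triplet_loss(video_vec, unrelated_vibration_vec, matching_vibration_vec)
```

In training, the gradient of such a loss would update the modality-specific encoders until related pairs incur no penalty, as in the `aligned` case here.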

Jurisdiction: United States

Camera-Radar Vision Fusion · Autonomous Driving

Momenta (Suzhou) Technology

Momenta (Suzhou) Technology Co., Ltd. holds 3 active patent records in this dataset: EP 2023, US 2023, and US 2026, all covering camera-radar vision fusion architectures for autonomous driving perception. The 2026 US patent covers fusing vision perception and radar perception features through a radar-vision feature fusion model with cross-modality target matching, and is currently active. The EP 2023 and US 2023 filings address multi-sensor data fusion apparatus and method, reflecting a parallel international filing strategy.
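Cross-modality target matching can be illustrated with a simple gated nearest-neighbour association between camera and radar detections. This is a generic sketch under assumed inputs (detections already projected into a shared ground-plane frame, a hypothetical 2 m gate), not the matching procedure claimed in the Momenta patents.

```python
import math


def match_targets(camera_targets, radar_targets, gate=2.0):
    """Greedily pair camera and radar detections that lie within `gate` metres.

    Each target is a tuple (target_id, x, y) in a shared ground-plane frame.
    """
    candidates = []
    for cam_id, cx, cy in camera_targets:
        for rad_id, rx, ry in radar_targets:
            distance = math.hypot(cx - rx, cy - ry)
            if distance <= gate:
                candidates.append((distance, cam_id, rad_id))

    matches = {}
    used_radar = set()
    for distance, cam_id, rad_id in sorted(candidates):  # closest pairs first
        if cam_id in matches or rad_id in used_radar:
            continue  # each detection may be matched at most once
        matches[cam_id] = rad_id
        used_radar.add(rad_id)
    return matches


# Hypothetical detections: c3 and r3 have no counterpart within the gate.
camera = [("c1", 10.0, 1.0), ("c2", 25.0, -3.0), ("c3", 40.0, 0.0)]
radar = [("r1", 10.4, 1.2), ("r2", 24.5, -2.6), ("r3", 60.0, 5.0)]
matches = match_targets(camera, radar)
```

Greedy matching is easy to follow but not globally optimal; an assignment solver such as the Hungarian algorithm is a common alternative when detections are dense.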

Jurisdictions: China (CN) / United States
Additional assignees in retrieved records include Qualcomm Incorporated (adaptive BEV fusion, US 2024), Robert Bosch GmbH (modality-spanning measurement fusion, US 2024), and Tongji University (radar-vision UAV tracking, US 2026 active). Full filing timelines, patent status, and freedom-to-operate signals are available in PatSnap Eureka.
Source: PatSnap Eureka. Assignee data is sourced from retrieved patent records; counts reflect this dataset snapshot only.
Emerging Directions

Forward-Looking Technology Signals (2024–2026)

The most recent filings in the dataset (2024–2026) reveal five forward-looking directions: BEV unification of heterogeneous sensor streams, cross-modal association networks for built-environment sensing, adaptive online fusion with distribution-shift compensation, multimodal digital twins with visual-haptic-vibration fusion, and intelligent temporal multimodal encoding.

BEV Unification of Heterogeneous Sensor Streams

Converting LiDAR, camera, and millimeter-wave radar into Bird’s-Eye View feature spaces is becoming a standard architectural pattern in the 2025–2026 filings. Jiangsu University (2026, US pending) and Momenta (Suzhou) Technology (2026, US active) both deploy cascade coupling for trajectory-level diagnosis within unified BEV feature representations. This pattern is architecturally transferable from autonomous driving to industrial vibration-scene monitoring.
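The BEV pattern can be sketched in a few lines: rasterise each sensor's detections onto the same top-down grid, then stack the grids so every cell carries one feature per sensor. Grid extent, cell size, and the point lists below are assumptions for illustration. Real systems fuse learned feature maps rather than raw point counts, and the cascade coupling described in the filings is not modelled here.

```python
def to_bev_grid(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=10.0):
    """Count points per BEV cell; points are (x, y) in metres, vehicle frame."""
    cols = int((x_range[1] - x_range[0]) / cell)
    rows = int((y_range[1] - y_range[0]) / cell)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            col = int((x - x_range[0]) // cell)
            row = int((y - y_range[0]) // cell)
            grid[row][col] += 1
    return grid


def fuse_bev(*grids):
    """Stack per-sensor grids into one feature tuple per cell (channel concat)."""
    rows, cols = len(grids[0]), len(grids[0][0])
    return [
        [tuple(grid[r][c] for grid in grids) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical detections from two sensors in the same vehicle frame.
lidar_points = [(5.0, -15.0), (6.0, -12.0), (35.0, 15.0)]
radar_points = [(5.5, -14.0), (25.0, 0.0)]

fused = fuse_bev(to_bev_grid(lidar_points), to_bev_grid(radar_points))
```

Because both sensors land in the same grid, downstream layers can reason about a location without knowing which sensor observed it. That property is what would make the pattern portable to other multi-sensor scenes.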

Cross-Modal Association Networks for Built-Environment Sensing

University of California’s 2024 pending US patent introduces the AD-TCN (Association Distance Temporal Convolutional Network) architecture for aligning wearable biometric signals with structural floor-vibration sensor data. This extends the fusion paradigm from industrial machinery to inhabited buildings, targeting smart home, eldercare facility, and retail sensing environments. This is the only retrieved patent directly claiming cross-modal alignment between structural vibration and wearable signals.
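The retrieved record does not detail the AD-TCN internals, so the sketch below shows only the generic building block of any temporal convolutional network: a dilated causal 1-D convolution. The kernel weights and input series are hypothetical, and the association-distance component is not represented.

```python
def causal_dilated_conv(series, kernel, dilation=1):
    """Compute y[t] = sum_j kernel[j] * series[t - j * dilation].

    Samples before the start of the series are treated as zero, so each
    output depends only on the current and past inputs (causality).
    """
    out = []
    for t in range(len(series)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += weight * series[idx]
        out.append(acc)
    return out


# Hypothetical floor-vibration amplitude samples and a 2-tap averaging kernel.
vibration = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = causal_dilated_conv(vibration, kernel=[0.5, 0.5], dilation=2)

# Changing a future sample must not alter earlier outputs.
perturbed = causal_dilated_conv(
    [1.0, 2.0, 3.0, 4.0, 99.0], kernel=[0.5, 0.5], dilation=2
)
```

Stacking such layers with growing dilation widens the receptive field quickly. That matters when slow wearable signals have to be aligned with faster structural vibration data.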

The dataset also surfaces multimodal digital twin fusion combining visual and haptic modalities for surface material recognition (2024 literature), and a CN-jurisdiction pending patent targeting intelligent temporal multimodal encoding via dynamic sliding windows — both with direct applicability to continuous machine health tracking.
Source: PatSnap Eureka. Emerging direction signals are derived from 2024–2026 patent filings and literature in retrieved records.
Architecture Comparison

Vibration-to-Image Classification vs. BEV Multi-Modal Fusion


| Vibration-to-Image Classification | BEV Multi-Modal Fusion |
| --- | --- |
| Vibration / acoustic signals converted to TFR images | LiDAR point clouds, camera images, millimeter-wave radar |
| CWT / DWT time-frequency maps, SDP patterns, PSD energy maps | Unified Bird’s-Eye View (BEV) feature space |
| ViT (Vision Transformer), VGG16 convolutional networks | Cascade coupling BEV encoder, radar-vision feature fusion model |
| 99.85% average on small-sample bearing fault datasets (2023 ViT paper) | System-level deployment metrics; trajectory tracking validated in 2026 patents |
| Industrial machinery: bearings, gearboxes (aerospace, wind energy) | Autonomous driving perception, UAV cluster surveillance |
| Academic literature (unassigned); SRI International for embedding layer | Momenta (Suzhou) Technology, Jiangsu University, GM Global Tech Ops, Qualcomm |
| Primarily literature-stage; patent protection sparse on TFR+ViT pipeline | Multiple active US patents filed 2023–2026 by commercial entities |
| Approaching commodity accuracy; differentiation lies in small-sample and variable-condition robustness | Distribution shift between training BEV features and real-time sensor data (addressed by Qualcomm 2024) |

Source: PatSnap Eureka. Comparison is based on patent claims and literature findings retrieved in this dataset snapshot; it is not a comprehensive market benchmarking study.

Data and insights on this page are based on a limited patent and literature dataset and are for reference only. Figures may not represent the complete technology landscape.
