From Academic Tool to Industrial Workflow: The 2026 Inflection Point
AI-accelerated drug discovery has crossed a threshold: in silico approaches that spent the 2010s as research curiosities are now production-scale components of pharmaceutical R&D pipelines. The core economic rationale, articulated in a 2022 University of Southern California literature record included in this dataset, is that computational screening can narrow experimental compound libraries before synthesis, compressing cost and timelines at the earliest, most uncertain stages of the pipeline.
This landscape is derived from a targeted set of patent and literature records retrieved across searches spanning 2017–2026. It represents a snapshot of innovation signals within this dataset only and should not be interpreted as a comprehensive view of the full industry. What it does reveal, however, is a clear maturation arc: from precursor computational chemistry filings in 2013–2016, through a knowledge-graph and network-based repurposing phase in 2017–2020, to a peak filing cluster in 2021–2023, and into a frontier wave of LLM-native and generative AI architectures in 2024–2026.
The earliest relevant filings date to 2013–2016 and establish conventional computational chemistry foundations—Novartis AG’s pyrimidine derivative patents (ES, 2013) and Pfizer Inc.’s pyrrolopyrimidine work (ES, 2016). Korea Institute of Science & Technology Information (KISTI) filed technology roadmap and future technology valuation methods as early as 2012–2013. The 2021–2023 cluster is the densest in this dataset, with Peptilogics (US) filing a four-patent family on its AI engine architecture and multiple academic institutions filing multi-modal drug-target interaction systems. The most recent filings—from Jiangnan University (CN, 2026), SoftBank Group (JP, 2025), and NYU (JP, 2025)—signal a shift toward large language model-native architectures and generative AI applied not just to molecules, but to regulatory documentation.
AI-accelerated drug discovery spans six technically distinct sub-domains: generative molecular design, drug-target interaction (DTI) and binding affinity prediction, drug repurposing via knowledge graphs and network pharmacology, genomic and multi-omics response prediction, clinical trial success prediction and design, and AI-assisted pharmaceutical development protocols.
According to WIPO, AI-related patent filings in the life sciences have grown substantially over the past decade, with drug discovery representing one of the highest-concentration application domains. The filing patterns in this dataset align with that broader trajectory—peak activity in 2021–2023, followed by a frontier wave that reflects the mainstreaming of transformer architectures and large language models across scientific disciplines.
Six Technical Clusters Defining the AI Drug Discovery Stack
The AI drug discovery patent landscape organises into six technically distinct clusters, each addressing a different computational bottleneck in the pipeline from target identification to clinical candidate selection. Understanding where each cluster sits—and how they interact—is essential for R&D teams assessing white space and IP strategists evaluating freedom to operate.
The field relies on transformer-based language models for molecular SMILES encoding, graph neural networks (GNNs) for molecular feature extraction, variational autoencoders (VAEs) for latent-space molecule generation, biomedical knowledge graphs linking drug-gene-disease entities, and reinforcement learning for compound optimisation.
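To make the "SMILES encoding" step concrete, the sketch below shows one common way a SMILES string can be split into tokens and mapped to integer ids before it reaches a language model. It is an illustration only: the regular expression is a simplified pattern that does not cover the full SMILES grammar, and the names `tokenize_smiles`, `build_vocab`, and `encode` are hypothetical rather than taken from any filing in this dataset.

```python
import re
from typing import Dict, List

# Simplified pattern (assumption): bracket atoms, two-letter halogens,
# two-digit ring closures, then any other single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens a sequence model could embed."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(corpus: List[str]) -> Dict[str, int]:
    """Map each distinct token to an integer id; id 0 is reserved for padding."""
    tokens = sorted({tok for s in corpus for tok in tokenize_smiles(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a SMILES string to a fixed-length list of token ids."""
    ids = [vocab[tok] for tok in tokenize_smiles(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab(["CC(=O)Cl", "c1ccccc1Br"])
    print(tokenize_smiles("CC(=O)Cl"))
    print(encode("CCO", vocab, max_len=6))
```

Production models typically use learned subword vocabularies rather than a hand-written pattern; the point here is only that the molecule enters the model as a token sequence.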
Cluster 1: Generative Molecular Design
Generative architectures—including VAEs, GANs, and knowledge graph-augmented generators—create novel molecular structures optimised for specific biological targets. Peptilogics’ multi-module AI engine (US, 2021–2025) is the most concentrated family in this dataset: a “creator” module for sequence generation, a “descriptor” module for knowledge-graph-based structural and activity representation, and a “scientist” module for benchmark-driven parameter optimisation. NYU’s BioMolAI system (JP, 2025) integrates a ProfileVAE that encodes gene expression in latent space and a MolVAE that generates molecular structures from that latent encoding, with iterative Tanimoto similarity scoring. Insilico Medicine’s GENTRL approach (CN, 2022) uses tensor reinforcement learning with Sammon mapping for chemical space visualisation.
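The Tanimoto similarity scoring mentioned above can be illustrated with a short sketch. It assumes molecular fingerprints are represented as sets of "on" bit indices, which is a simplification; the fingerprint values and candidate names are invented for illustration and do not come from the BioMolAI filing.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(
    reference: Set[int], candidates: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Score each candidate against a reference and sort from most to least similar."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    reference_fp = {1, 2, 3}          # hypothetical fingerprint bits
    generated = {
        "candidate_1": {2, 3, 4},     # shares 2 of 4 distinct bits
        "candidate_2": {1, 2, 3},     # identical to the reference
        "candidate_3": {7, 8},        # no overlap
    }
    for name, score in rank_by_similarity(reference_fp, generated):
        print(name, round(score, 2))
```

In an iterative generation loop, a score like this could be used to keep or discard generated structures at each round, though the exact criterion used in the filing is not described in this dataset.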
Cluster 2: Drug-Target Interaction (DTI) and Binding Affinity Prediction
DTI prediction models are the core computational screen replacing high-throughput wet-lab assays. Deargen’s three-network architecture (KR, 2023; JP, 2024) applies cross-attention between independently extracted drug and target feature representations before predicting affinity. Chongqing University’s approach (CN, 2024) employs multi-kernel convolutional feature extraction at three scales with cross-scale and cross-modal attention modules to capture key binding sites. The frontier as of early 2026 is Jiangnan University’s LLM-native DTI architecture (CN, 2026), the most recent of its kind in this dataset, which integrates K-BERT for drug SMILES deep feature extraction, ProstT5 for protein sequence analysis, and Kolmogorov-Arnold Networks (KAN) for classification.
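As a rough illustration of the cross-attention idea described above, the sketch below lets each drug feature vector attend over a set of target (protein) feature vectors using scaled dot-product attention. It is a generic textbook formulation, not the architecture claimed in any filing: learned projection matrices are omitted, and the toy vectors are invented.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    drug_feats: List[Vector], target_feats: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Each drug vector (query) attends over target vectors (keys and values).

    Returns the attended drug representations and the attention weights.
    Assumption: queries, keys, and values are used without learned projections.
    """
    dim = len(target_feats[0])
    attended, weights = [], []
    for query in drug_feats:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in target_feats
        ]
        row = softmax(scores)
        weights.append(row)
        attended.append(
            [sum(w * value[c] for w, value in zip(row, target_feats)) for c in range(dim)]
        )
    return attended, weights


if __name__ == "__main__":
    drug = [[1.0, 0.0]]                   # one hypothetical drug token
    target = [[1.0, 0.0], [0.0, 1.0]]     # two hypothetical residue vectors
    out, attn = cross_attention(drug, target)
    print(attn[0])  # higher weight on the residue aligned with the drug token
```

The attention weights are what make such models attractive for binding-site analysis: a high weight between a drug substructure and a residue is a candidate interaction to inspect.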
“DTI prediction is approaching commoditization in standard architectures; the technical frontier as of early 2026 is at LLM/KAN integration and cross-scale/cross-modal attention—CNN/RNN-based DTI systems may no longer represent patentable advances.”
Cluster 3: Drug Repurposing via Knowledge Graphs
Knowledge graph-based repurposing systems mine biomedical networks linking drugs, genes, proteins, and diseases to identify new therapeutic uses for approved or late-stage compounds—bypassing early safety work and accelerating time-to-clinical candidate. Wipro’s system (EP/US, 2023) extracts protein-protein interaction data via NLP, generates a semantic knowledge graph, and integrates clinical trial data for final ranking. Seoul National University’s approach (KR, 2024) performs teleportation-induced random walks in a drug-gene-disease heteroentity knowledge graph to generate node embeddings for drug-disease association prediction.
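The random-walk idea can be sketched on a toy graph. The example below implements a walk that, with some probability at each step, teleports back to its starting node; the resulting walks are the kind of sequences that could then be fed to an embedding model. The teleportation rule, the graph contents, and the function names are assumptions for illustration, since the dataset does not describe the exact procedure in the Seoul National University filing.

```python
import random
from collections import Counter
from typing import Dict, List

# Hypothetical drug-gene-disease graph stored as adjacency lists.
TOY_GRAPH: Dict[str, List[str]] = {
    "drug:A": ["gene:G1", "gene:G2"],
    "drug:B": ["gene:G2"],
    "gene:G1": ["drug:A", "disease:D1"],
    "gene:G2": ["drug:A", "drug:B", "disease:D2"],
    "disease:D1": ["gene:G1"],
    "disease:D2": ["gene:G2"],
}


def teleport_walk(
    graph: Dict[str, List[str]],
    start: str,
    length: int,
    teleport_prob: float,
    rng: random.Random,
) -> List[str]:
    """Generate one walk of `length` nodes that may jump back to `start`."""
    walk = [start]
    current = start
    while len(walk) < length:
        neighbours = graph.get(current, [])
        if not neighbours or rng.random() < teleport_prob:
            current = start  # teleport back to the seed node
        else:
            current = rng.choice(neighbours)
        walk.append(current)
    return walk


def visit_counts(graph, start, n_walks, length, teleport_prob, seed=0) -> Counter:
    """Count node visits over many walks as a crude proximity signal."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_walks):
        counts.update(teleport_walk(graph, start, length, teleport_prob, rng))
    return counts


if __name__ == "__main__":
    counts = visit_counts(TOY_GRAPH, "drug:A", n_walks=200, length=10, teleport_prob=0.2)
    for node, n in counts.most_common():
        print(node, n)
```

Teleportation keeps walks anchored near the seed node, so visit frequencies (or embeddings trained on the walks) reflect proximity to that drug rather than the global structure of the graph.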
Multiple assignees, including Wipro Limited, Seoul National University, Medirita, KAIST, and Kalisi, have filed on biomedical knowledge graph construction and traversal methods, making this a crowded area of prior art as of 2026.
Clusters 4–6: Omics, Clinical AI, and Protocol Generation
SYNTEKABIO’s CDRscan system (KR/US, 2019) predicts cancer drug response by fusing genetic variation fingerprints with molecular drug profiles via deep learning—an early commercial entrant in multi-omics integration. Korea University’s approach (KR, 2021) generates drug and cell-line embedding vectors from structural and genomic information respectively, placing both in a shared vector space to predict genomic expression response. At the clinical layer, Immunobiome’s target gene-based clinical trial success rate prediction model (JP, 2024) directly addresses clinical attrition by predicting trial failure probability from target gene profiles. SoftBank Group’s generative AI system (JP, 2025) applies a dual-LLM architecture to automate pharmaceutical development test protocols and reports—extending AI from discovery into regulatory affairs documentation.
Geographic and Assignee Landscape: Korea Leads, China Accelerates
South Korea is the dominant filing jurisdiction in this dataset with approximately 35 records, spanning academic institutions (Seoul National University, Korea University, KAIST, POSTECH), government bodies (KISTI), and commercial entities (SYNTEKABIO, Deargen, Medirita, BNJ Biopharma). Japan follows with approximately 12 records, many of which are foreign filings by Korean and Chinese institutions seeking JP jurisdiction coverage. China and the United States each contribute approximately 8 records, with Chinese filings accelerating notably in the 2024–2026 window.
Around 10 additional records come from other jurisdictions, including India, Europe, Canada, and Brazil.
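For readers who want the jurisdiction split as shares rather than counts, the short calculation below converts the approximate record counts quoted in this section (KR 35, JP 12, CN 8, US 8, other 10) into percentages. Because the underlying counts are approximate, the resulting shares are indicative only.

```python
# Approximate record counts by jurisdiction, as quoted in this section.
records = {"KR": 35, "JP": 12, "CN": 8, "US": 8, "Other": 10}

total = sum(records.values())

# Share of retrieved records per jurisdiction, in percent, to one decimal place.
shares = {code: round(100 * count / total, 1) for code, count in records.items()}

if __name__ == "__main__":
    print("total records:", total)
    for code, pct in shares.items():
        print(f"{code}: {pct}%")
```

On these figures South Korea accounts for a little under half of the retrieved records, which is the basis for the "Korea leads" framing.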
Among individual assignees, Peptilogics, Inc. (US) has the most concentrated portfolio with 5 filings (2021–2025), all focused on its knowledge-graph AI engine architecture for candidate drug generation. SYNTEKABIO Co., Ltd. (KR/US) has 3 filings (2019–2022) covering CDRscan drug response prediction and neoantigen immunotherapy prediction. Medirita Co., Ltd. (KR) also has 3 filings (2020–2021) on knowledge network and multi-omics candidate derivation. US-based commercial entities including Peptilogics and ABSCI show the deepest family portfolios, while Korean academic and government institutions contribute the highest volume of distinct technical approaches across the dataset.
The geographic distribution has direct implications for IP strategy. As noted by EPO in its annual patent index, Asia-Pacific jurisdictions have become increasingly important prosecution targets for life sciences innovation. IP strategists should prioritise KR, JP, and CN prosecution for any platform claiming DTI prediction, drug repurposing, or molecular generation—given the density of prior art and the concentration of competitive assignees in those jurisdictions.
Where the Technology Is Being Applied: Oncology, Immunology, and Beyond
Oncology is the largest application cluster in this dataset by a clear margin. POSTECH’s co-essentiality network for anticancer drug derivation (JP, 2024–2025), SYNTEKABIO’s CDRscan for cancer drug response (KR/US, 2019), and SYNTEKABIO’s neoantigen immunotherapy prediction system (KR, 2022) all address oncology specifically. Fujian Province Tumor Hospital’s network pharmacology method (CN, 2024) addresses the dual-indication challenge of treating breast cancer patients with co-morbid COVID-19—an example of AI repurposing applied to complex comorbidity scenarios.
Clinical trial success prediction (Immunobiome, JP 2024), AI-assisted trial design (Darwin Group, KR 2022), prescription verification (Innoverry, KR 2025), and protocol generation (SoftBank Group, JP 2025; Shanghai Zhihui Biopharma, CN 2025) represent a nascent but rapidly filling application layer extending AI from discovery into GCP and regulatory workflows. This layer currently has fewer competing filings than the molecular design layer.
In infectious disease and immunology, B.G. Negev Technologies’ efficacy prediction system using drug-drug interaction (DDI) embeddings (CA, 2023; US, 2025) was developed with anti-cancer drug identification as a primary use case. CureVac’s RNA-encoding-antibody patent family (US, 2020/2023) targets infectious disease and autoimmune applications. ABSCI Corporation’s generative AI for antibody design (BR, 2024) explicitly addresses novel antibody project initiation—an application domain where generative AI is beginning to challenge conventional hybridoma and phage display workflows.
In precision medicine, Immunobiome’s target gene-based clinical trial success rate prediction model (JP, 2024) directly addresses clinical attrition by predicting trial failure probability based on target gene profiles. Darwin Group’s clinical trial design support system (KR, 2022) provides AI-guided recruitment standards and protocol templates based on similar approved drugs. Shanghai Zhihui Biopharma’s AI-based clinical research protocol optimisation system (CN, 2025) uses stacked ensemble models—logistic regression, random forest, and linear regression as base models with a meta-model for integration—to optimise recruitment efficiency and reduce dropout rates.
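The stacked-ensemble structure described above can be sketched in miniature. The example below shows only the meta-model step: it assumes the three base models have already produced scores for each training case (ideally out-of-fold, so the meta-model does not see leaked labels) and fits a small logistic-regression meta-model on those scores. All numbers, the choice of logistic regression as meta-model, and the function names are hypothetical; the filing's actual meta-model is not specified in this dataset.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_model(
    base_scores: List[List[float]], labels: List[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights over base-model scores by gradient descent."""
    n_models = len(base_scores[0])
    weights, bias = [0.0] * n_models, 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_models, 0.0
        for row, y in zip(base_scores, labels):
            err = sigmoid(sum(w * s for w, s in zip(weights, row)) + bias) - y
            for j in range(n_models):
                grad_w[j] += err * row[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def stack_predict(row: List[float], weights: List[float], bias: float) -> float:
    """Combine one case's base-model scores into a single probability."""
    return sigmoid(sum(w * s for w, s in zip(weights, row)) + bias)


if __name__ == "__main__":
    # Columns: hypothetical scores from logistic regression, random forest,
    # and linear regression base models; label 1 = participant dropped out.
    scores = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.2, 0.1, 0.3], [0.1, 0.3, 0.2]]
    labels = [1, 1, 0, 0]
    w, b = fit_meta_model(scores, labels)
    print([round(stack_predict(r, w, b), 2) for r in scores])
```

The design point of stacking is that the meta-model learns how much to trust each base model, rather than averaging them with fixed weights.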
Research published by Nature has documented the high attrition rates in clinical drug development, with failure rates exceeding 90% from Phase I through approval. The clinical AI applications in this dataset—particularly trial success prediction and protocol optimisation—directly target this structural inefficiency, representing a logical extension of the computational approaches that have already demonstrated value in earlier pipeline stages.
Frontier Signals: Five Emerging Directions from 2024–2026 Filings
The most recent filings in this dataset—spanning 2024–2026—reveal five directional signals that indicate where the technical frontier is moving. These are not incremental improvements on existing approaches; they represent architectural shifts that R&D and IP teams should monitor for both competitive intelligence and white-space opportunity.
1. LLM-Native DTI Architecture
Jiangnan University’s DTI prediction method (CN, filed February 2026) is the first retrieved filing explicitly applying Kolmogorov-Arnold Networks (KAN) alongside protein language models (ProstT5) and drug language models (K-BERT) within a unified DTI framework. This signals LLM-first architectures displacing earlier CNN/RNN approaches—a transition consistent with the broader pattern of transformer architectures superseding convolutional approaches across scientific machine learning, as documented by IEEE in recent computational biology surveys.
2. Disease-Conditioned Molecular Generation
NYU’s BioMolAI system (JP, 2025) integrates gene expression profiles and cell structure data to condition molecule generation on specific cellular environments—moving beyond target-based to context-aware generation. The system’s ProfileVAE encodes gene expression in latent space while MolVAE generates molecular structures from that encoding, with iterative Tanimoto similarity scoring and interpretability analysis built into the workflow.
3. Generative AI for Regulatory Documentation
SoftBank Group’s filing (JP, 2025) applies a dual-LLM architecture—a user-dedicated model combined with a general-purpose public-data-trained model—to automate pharmaceutical development test protocols and reports. This represents a novel application layer: using large language models not for molecule discovery but for regulatory affairs documentation, a workflow that has historically required extensive manual expert effort.
4. AI-Verified New Drug Prescription Systems
Innoverry’s system (KR, 2025) for AI-based new drug prescription verification using concomitant drug information from clinical trial subjects closes the loop between computational candidate generation and clinical prescribing safety—a translational application not present in earlier filings and one that bridges the discovery-to-prescribing pipeline in a single computational framework.
5. Synthetic Data Augmentation for DTI Model Training
Cyclica’s ghost-ligand synthetic data approach (JP, 2024) generates synthetic “ghost ligands” to augment training data for DTI models, directly addressing the data scarcity problem that limits deep learning-based drug discovery. This is a structural bottleneck across the field: proprietary synthetic augmentation methods may constitute high-value, hard-to-replicate IP precisely because they address a limitation that affects all competitors equally.
Strategic Implications for IP Teams and R&D Leaders
The patent landscape in this dataset points to four actionable strategic conclusions for IP professionals and R&D leaders assessing competitive positioning in AI drug discovery.
Knowledge graph infrastructure is a defensible moat—but claims must be differentiated. Multiple assignees—Wipro, Seoul National University, Medirita, KAIST, Kalisi—are filing on biomedical knowledge graph construction and traversal methods. Teams entering this space should assess whether their graph architecture, including entity types, edge weights, and traversal algorithms, is sufficiently differentiated. Generic knowledge graph claims face crowded prior art as of 2026.
Standard DTI architectures are approaching commoditisation. Cross-attention and transformer-based DTI systems from Deargen, Chongqing University, and Jiangnan University are now common. The technical frontier as of early 2026 is at LLM/KAN integration and cross-scale/cross-modal attention. R&D teams should evaluate whether CNN/RNN-based DTI systems still represent patentable advances in light of the prior art density documented in this dataset.
KR, JP, and CN are the priority prosecution jurisdictions. With South Korea accounting for approximately 35 of the retrieved records and Japan and China each contributing a growing share, IP strategists should prioritise prosecution in these jurisdictions for any platform claiming DTI prediction, drug repurposing, or molecular generation. The PatSnap IP management platform provides jurisdiction-specific prosecution analytics to support these decisions.
The clinical translation layer has the most white space. Clinical trial success prediction, AI-assisted trial design, prescription verification, and protocol generation represent a nascent but rapidly filling application layer. This layer currently has fewer competing filings than the molecular design layer—making it the most strategically attractive area for teams looking to establish defensible IP positions in AI drug discovery as of 2026. The PatSnap R&D intelligence tools can help teams identify white space and monitor competitor filings in this emerging space.
“The clinical translation layer—trial success prediction, protocol generation, prescription verification—currently has fewer competing filings than the molecular design layer, making it the most strategically attractive area for new IP positions in AI drug discovery.”
Across all clusters, the data scarcity bottleneck identified by Cyclica’s synthetic augmentation approach is a structural challenge that affects every deep learning-based system in this landscape. Organisations that develop proprietary synthetic data generation methods—particularly for protein-ligand interaction data—may be building IP that is both high-value and structurally difficult for competitors to replicate, regardless of which specific DTI or generative architecture they ultimately adopt.