Why AI Drug Repurposing Has Become Unavoidable
AI-powered drug repurposing is now a strategic necessity rather than an experimental curiosity, driven by a simple economic reality: de novo drug discovery costs an estimated $1–2.6 billion and takes 10–15 years per approved compound. Repurposing approved or investigational drugs for new indications reduces regulatory risk, shortens the path to clinical trials, and leverages pharmacological data that already exists, without requiring new synthesis.
The technology landscape dataset spans publications from 2010 to 2023, with over 80% of records published between 2019 and 2023 — a signal of a field in rapid, sustained expansion. The COVID-19 pandemic acted as a forcing function: the urgent need to identify existing drugs active against SARS-CoV-2 validated computational repurposing as a credible, fast-cycle methodology at scale, and the tooling produced during that period — including SAveRUNNER, CoREx, CATNIP, and GDRnet — has since become disease-agnostic infrastructure.
The foundational infrastructure underpinning all AI repurposing methods consists of curated biomedical databases. Among the retrieved records, at least six databases are specifically designed to support repurposing workflows — including DrugBank, DrugCentral, PROMISCUOUS, DTC, repoDB, and DrugSig — alongside at least 12 distinct computational platforms or web servers. The predictive performance of every AI model in this landscape is directly dependent on the completeness and curation quality of these underlying datasets. As noted by WHO and bodies such as NIH, data quality and standardisation remain primary barriers to translating computational predictions into clinical practice.
Drug repurposing — also called drug repositioning — is the explicit reuse of approved or investigational compounds for new disease indications. It is distinguished from classical drug discovery by bypassing the need for de novo synthesis, enabling faster and lower-cost routes to clinical trials. AI-powered repurposing applies machine learning, network biology, and knowledge graph methods to predict novel drug-disease associations computationally.
The Four AI Method Clusters Driving Discovery
Four distinct methodological clusters define the current AI drug repurposing landscape, each exploiting different data modalities and computational architectures. Network-based methods represent the most populated cluster in the dataset, but deep learning, knowledge graph completion, and transcriptomic signature matching each address complementary aspects of the drug-disease prediction problem.
Cluster 1: Deep Learning for Drug-Target Interaction Prediction
Deep learning architectures — including convolutional neural networks, recurrent neural networks, and transformer-based models — encode molecular structures (SMILES strings) and protein sequences to predict binding interactions. Harvard University’s DeepPurpose (2020) is the landmark platform in this cluster, implementing 15 compound and protein encoders with over 50 neural architectures and demonstrating state-of-the-art performance on drug-target interaction benchmarks. PharmaNet (Universidad de los Andes, 2021) applies recurrent neural networks to active molecule prediction across 102 targets in the DUD-E database, while Hunan University’s 2022 systematic guideline covers both sequence-based and graph-based deep learning representations for repurposing.
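Sequence-based deep learning DTI models typically begin by turning SMILES strings and amino-acid sequences into fixed-length numeric inputs, before any neural encoder runs. The sketch below covers only that preprocessing step, as a character-level integer encoding with padding. The vocabularies, maximum lengths, and the protein fragment are illustrative assumptions. They are not DeepPurpose's actual configuration.

```python
# Sketch: character-level integer encoding of drug and target sequences,
# the typical input to CNN/RNN drug-target interaction (DTI) encoders.
# Vocabularies and maximum lengths below are illustrative assumptions.

SMILES_CHARS = "#%()+-./0123456789=@ABCDEFGHIKLMNOPRSTUVWXYZ[\\]abcdefgilmnoprstuy"
PROTEIN_CHARS = "ACDEFGHIKLMNPQRSTVWXY"  # 20 standard amino acids plus X


def build_vocab(chars: str) -> dict[str, int]:
    """Assign each character an id starting at 1; id 0 means padding/unknown."""
    return {ch: i + 1 for i, ch in enumerate(chars)}


SMILES_VOCAB = build_vocab(SMILES_CHARS)
PROTEIN_VOCAB = build_vocab(PROTEIN_CHARS)


def encode(seq: str, vocab: dict[str, int], max_len: int) -> list[int]:
    """Map characters to ids, truncate to max_len, then right-pad with zeros."""
    ids = [vocab.get(ch, 0) for ch in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Usage: aspirin SMILES and a short, hypothetical protein fragment.
drug_ids = encode("CC(=O)OC1=CC=CC=C1C(=O)O", SMILES_VOCAB, max_len=100)
target_ids = encode("MKTAYIAKQR", PROTEIN_VOCAB, max_len=1000)
print(drug_ids[:6], len(drug_ids), len(target_ids))
```

Character-level encoding splits two-letter atoms such as Cl or Br into separate tokens. Many models use a regex-based SMILES tokenizer instead, which this sketch omits.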
Cluster 2: Network Medicine and Graph-Based Inference
Network-based inference is the most represented cluster in the dataset. These methods construct heterogeneous biological networks — with drug, gene, disease, and pathway nodes — and apply graph algorithms, matrix factorisation, or graph neural networks to identify drug-disease associations via network proximity or community structure. The landmark example is Project Rephetio / Hetionet (University of Pennsylvania, 2017), which integrates 29 data sources into a network of 47,031 nodes and 2,250,197 relationships, predicting treatment probability for 209,168 compound-disease pairs. GDRnet (Indian Institute of Science, Bangalore, 2022) frames repurposing as a link prediction problem in a multi-layered heterogeneous network of approximately 1.4 million edges and 42,000 nodes representing drugs, diseases, genes, and anatomies.
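One common formulation of network proximity is the "closest" distance. For each drug target, take the shortest-path distance to the nearest disease-associated gene, then average over all targets. The sketch below assumes an unweighted protein interaction network stored as an adjacency map. Node names are hypothetical, and the code is a generic illustration rather than the method of any platform named above.

```python
from collections import deque


def undirected(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an adjacency map for an undirected interaction network."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph: dict[str, set[str]], source: str) -> dict[str, int]:
    """Shortest-path length in hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_proximity(graph: dict[str, set[str]], drug_targets: set[str],
                      disease_genes: set[str]) -> float:
    """Mean over drug targets of the hop distance to the nearest disease gene."""
    total = 0
    for target in drug_targets:
        dist = bfs_distances(graph, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if not reachable:
            return float("inf")  # this target cannot reach the disease module
        total += min(reachable)
    return total / len(drug_targets)


# Usage on a toy network with hypothetical protein names.
net = undirected([("T1", "G1"), ("G1", "D1"), ("T2", "G2"), ("G2", "G3"), ("G3", "D2")])
print(closest_proximity(net, {"T1", "T2"}, {"D1", "D2"}))  # 2.5
```

Network-medicine pipelines usually convert this raw distance into a z-score. They do this by comparing it against randomly sampled, degree-matched gene sets, a step omitted here for brevity.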
“No individual algorithm — whether network proximity, diffusion, or machine learning — reliably outperforms across all validation datasets. Product strategies should favour ensemble and fusion architectures rather than single-method platforms.”
Cluster 3: Knowledge Graph Embedding and Completion
Knowledge graph completion methods apply neural embedding models (including TransE, RotatE, DistMult, and ComplEx) to predict missing drug-disease links in structured biomedical knowledge graphs. The University of Minnesota’s 2021 study on COVID-19 drug repurposing extracts semantic triples from PubMed via SemRep/SemMedDB, applies BERT-based accuracy filtering, and uses five knowledge graph completion algorithms in a time-sliced validation approach. Siemens AG (2021) contributes a task-driven filtering approach that addresses underfitting in standard embedding models when optimising specifically for drug-disease relation types. Georgetown University’s StarGazer platform (2022) combines human domain expertise with multi-omics data mining, spanning genomics, phenomics, and proteomics, into a single numerical scoring system.
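TransE is the simplest of the four embedding models named above, and it shows how knowledge graph completion scores a missing drug-disease link. Entities and relations become vectors. A triple (drug, treats, disease) counts as plausible when the drug vector plus the relation vector lands close to the disease vector. The sketch hand-sets hypothetical two-dimensional embeddings instead of training them. RotatE, DistMult, and ComplEx each replace this distance with a different scoring function.

```python
import math

Vector = list[float]


def transe_score(head: Vector, relation: Vector, tail: Vector) -> float:
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head: Vector, relation: Vector,
               candidates: dict[str, Vector]) -> list[tuple[str, float]]:
    """Score every candidate tail entity and sort best-first."""
    scored = [(name, transe_score(head, relation, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-D embeddings; real models learn higher-dimensional
# vectors from many training triples with negative sampling.
drug = [0.1, 0.2]
treats = [0.5, 0.0]
diseases = {"disease_a": [0.6, 0.2], "disease_b": [0.0, 1.0]}
print(rank_tails(drug, treats, diseases))
```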
Cluster 4: Transcriptomic Signature Matching and NLP-Based Discovery
Transcriptomic signature matching uses gene expression perturbation profiles from databases such as the Connectivity Map (CMap) and LINCS to identify drugs whose transcriptional signatures reverse disease-associated gene expression patterns. DrugSig (Shanghai High-Tech United Bio-Technological R&D Co., Ltd., 2017) curates drug response microarray data for over 1,300 drugs and 7,000 microarrays. Tongji University’s Dr. Sim (2021) introduces supervised similarity learning for CMap/LINCS transcriptional profiles, replacing unsupervised similarity metrics with trained models to improve robustness. Hokkaido University’s two-stage approach (2022) clusters 262 disease cases by UMAP-based gene expression dimensionality reduction, then assesses drug efficacy through expression reversibility.
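The core idea of signature reversal fits in a few lines. CMap and LINCS analyses typically use Kolmogorov-Smirnov-style enrichment statistics over ranked gene lists. The sketch below substitutes a simpler mean-difference score. It assumes a disease signature given as sets of up- and down-regulated genes, and a drug profile given as per-gene log fold changes. Gene and drug names are hypothetical.

```python
from statistics import mean


def reversal_score(disease_up: set[str], disease_down: set[str],
                   drug_lfc: dict[str, float]) -> float:
    """Mean drug log fold change on disease-up genes minus that on disease-down genes.

    Negative values indicate the drug pushes the disease signature in reverse.
    """
    up = [drug_lfc[g] for g in disease_up if g in drug_lfc]
    down = [drug_lfc[g] for g in disease_down if g in drug_lfc]
    if not up or not down:
        raise ValueError("disease signature genes are missing from the drug profile")
    return mean(up) - mean(down)


# Hypothetical disease signature and drug perturbation profiles.
disease_up = {"GENE1", "GENE2"}
disease_down = {"GENE3"}
profiles = {
    "drug_x": {"GENE1": -1.5, "GENE2": -0.5, "GENE3": 1.0},  # reverses the disease
    "drug_y": {"GENE1": 1.0, "GENE2": 0.5, "GENE3": -0.5},   # mimics the disease
}
ranked = sorted(profiles, key=lambda d: reversal_score(disease_up, disease_down, profiles[d]))
print(ranked)  # ['drug_x', 'drug_y']
```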
Application Domains: From COVID-19 to Oncology and Beyond
COVID-19 is the single largest application domain in the dataset, represented by at least 18 of the retrieved records — a concentration that reflects how the pandemic transformed AI drug repurposing from a methodological research area into an urgent applied discipline. Oncology is the second most prominent domain, with rare diseases, neurological conditions, and metabolic diseases representing underserved but high-value opportunities.
Infectious Disease: COVID-19 as the Proving Ground
The Harvard Medical School / Brigham and Women’s Hospital network medicine framework (2021) is the most comprehensive COVID-19 repurposing study in the dataset: it deploys a multimodal fusion of AI, network diffusion, and network proximity algorithms to rank 6,340 drugs against SARS-CoV-2, validating predictions against 918 experimentally screened compounds. The University of Cincinnati’s signature-based approach (2021) uses the LINCS library to identify a shortlist of 20 candidate drugs. The Medical University of Silesia (2021) applies supervised machine learning trained on in vitro antiviral data encoded in chemical fingerprints, identifying zafirlukast as a repurposing candidate.
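Network diffusion is one of the three signals fused in that framework, and it can be sketched as a random walk with restart. Probability mass starts on disease-associated proteins and repeatedly spreads to neighbouring nodes. At each step, with probability `restart`, it jumps back to the seeds. Drugs whose targets accumulate more mass rank higher. This is a generic illustration on a hypothetical toy network, not the Harvard/BWH implementation. The restart probability of 0.3 is an arbitrary assumption.

```python
def undirected(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an adjacency map; every node created here has at least one neighbour."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def random_walk_with_restart(graph: dict[str, set[str]], seeds: set[str],
                             restart: float = 0.3, tol: float = 1e-10,
                             max_iter: int = 1000) -> dict[str, float]:
    """Iterate p = restart * p0 + (1 - restart) * W p until the L1 change is below tol."""
    seeds = {s for s in seeds if s in graph}
    if not seeds:
        raise ValueError("no seed node is present in the network")
    nodes = list(graph)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            # Spread this node's non-restart mass evenly over its neighbours.
            share = (1.0 - restart) * p[n] / len(graph[n])
            for nbr in graph[n]:
                nxt[nbr] += share
        if sum(abs(nxt[n] - p[n]) for n in nodes) < tol:
            return nxt
        p = nxt
    return p


def drug_scores(probs: dict[str, float],
                drug_targets: dict[str, set[str]]) -> dict[str, float]:
    """Average diffusion score over each drug's targets (0 for targets outside the network)."""
    return {drug: sum(probs.get(t, 0.0) for t in targets) / len(targets)
            for drug, targets in drug_targets.items()}


# Toy chain: virus-interacting host protein V1 - A - B - C (all names hypothetical).
net = undirected([("V1", "A"), ("A", "B"), ("B", "C")])
probs = random_walk_with_restart(net, seeds={"V1"})
scores = drug_scores(probs, {"drug_near": {"A"}, "drug_far": {"C"}})
print(sorted(scores, key=scores.get, reverse=True))  # ['drug_near', 'drug_far']
```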
Oncology: The Next Primary Focus
With COVID-19 urgency receding, the 2022–2023 records show a clear pivot back to cancer. University of Helsinki’s DrugRepo algorithm (2022) scores repurposing candidates for 669 diseases and 674 cancer types, integrating chemical structures, drug-target interactions, pathways, and disease-gene associations. Changsha University (2022) integrates gene expression, copy number variation, and DNA methylation to design cancer-specific pathway-based drug similarity metrics. Drug combination synergy, exemplified by DrugComb (University of Helsinki, 2021), is emerging as a key sub-domain within oncology repurposing, according to research tracked by Nature.
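Approaches such as DrugRepo integrate several evidence types into one score. A simple generic way to combine such evidence is a weighted sum of per-channel similarities. The sketch below assumes each drug is represented by a set of "on" fingerprint bits and a set of protein targets, and uses a hypothetical equal weighting. It does not reproduce DrugRepo's or Changsha University's actual scoring scheme.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; on fingerprint bit sets this is the Tanimoto coefficient."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drug_similarity(fp_a: set[int], fp_b: set[int],
                    targets_a: set[str], targets_b: set[str],
                    w_chem: float = 0.5) -> float:
    """Weighted blend of chemical (fingerprint) and target-profile similarity."""
    if not 0.0 <= w_chem <= 1.0:
        raise ValueError("w_chem must lie in [0, 1]")
    return w_chem * jaccard(fp_a, fp_b) + (1.0 - w_chem) * jaccard(targets_a, targets_b)


# Hypothetical on-bit indices and target sets for two drugs.
sim = drug_similarity({1, 4, 7, 9}, {1, 4, 8, 9}, {"TARGET_A", "TARGET_B"}, {"TARGET_A"})
print(round(sim, 3))  # 0.55
```

In practice the fingerprint bits would come from a cheminformatics toolkit, and further channels (pathways, disease genes) would be added with their own weights.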
Rare Diseases, Neurology, and Metabolic Conditions
Despite strong COVID-19 and oncology representation, only a small number of records target rare and orphan diseases or metabolic conditions — making these areas both underserved and strategically attractive. Tennessee Tech University (2022) develops machine learning models using drug-drug, drug-gene, drug-enzyme, and drug-target interactions to identify repositioning candidates for diseases with limited approved treatments. The University of Miami (2022) derives a clinical insulin resistance signature from over 1,700 human biopsies to identify more than 130 repositioning compounds targeting diabetes, dementia, and cardiovascular disease. Medical University of Warsaw’s Adera2.0 (2022) targets Alzheimer’s, Parkinson’s, multiple sclerosis, and depression through neural-network-augmented text mining of PubMed.
Despite the high unmet medical need, rare/orphan diseases and metabolic conditions (insulin resistance, neurodegeneration) have lower competitive density in the AI repurposing space than COVID-19 or oncology. The University of Miami study identified over 130 repositioning compounds from a single insulin resistance signature derived from more than 1,700 human biopsies — illustrating the scale of opportunity in this domain.
Geographic and Institutional Landscape
Innovation in AI drug repurposing is broadly distributed across academic and research institutions rather than concentrated in major pharmaceutical companies — a structural characteristic that has direct implications for how commercial value is captured and how partnerships are formed.
The United States leads with the highest representation in the dataset: Harvard University/Medical School (two records), University of Pennsylvania, University of Minnesota, University of Texas Health Science Center, Georgetown University, Weill Cornell Medicine, Vanderbilt University Medical Center, and Nationwide Children’s Hospital all appear. One record from Zhejiang University of Technology (2022) explicitly notes that “the United States leads in this area of research.” China is represented by Hunan University, Central South University (two records), Northwestern Polytechnical University, Shanghai Jiao Tong University, Tongji University, and Changsha University — reflecting institutional breadth rather than single-assignee concentration. Europe contributes meaningfully across Germany (Charité Berlin, Siemens AG), Italy (Sapienza University, University of Catania), Spain, Finland (University of Helsinki, three records), Austria (IMP Vienna), and Poland. India appears across IIT Roorkee, Indian Institute of Science Bangalore, AIIMS New Delhi, and DRDO.
Industrial presence is sparse across all geographies. Only Novartis Pharma AG (Basel), Siemens AG, GeneNet Pharmaceuticals (Tianjin), Pharnext (France), Interprotein Corporation (Japan), and ThinTek LLC (Palo Alto) represent non-academic entities in the dataset. This indicates that the AI repurposing literature is predominantly academic, with industrial application occurring mainly through structured partnerships. The Novartis-MIT challenge, which involved more than 50 cross-functional teams, improved drug approval prediction AUC from 0.78 to 0.88, demonstrating the value of this partnership model. Standards bodies including ISO and regulatory frameworks from agencies such as the European Medicines Agency are increasingly relevant as AI repurposing predictions advance toward clinical validation.
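ROC AUC has a direct probabilistic reading, which helps when weighing the 0.78 to 0.88 improvement. It is the chance that a randomly chosen positive example is scored above a randomly chosen negative one. In an approval-prediction setting, a positive would be, for instance, a drug programme that was approved. The sketch computes AUC by that pairwise definition, counting ties as half. Labels and scores are hypothetical, and the quadratic loop favours clarity over speed.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Pairwise ROC AUC: P(score of random positive > score of random negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = approved) and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

On large datasets, a sort-based rank-sum (Mann-Whitney) computation gives the same value in O(n log n) time.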
Emerging Directions and Strategic Implications
Five directional signals emerge from the most recent records in the dataset (2021–2023). Each points toward where AI drug repurposing capability will concentrate over the next several years, and where strategic positioning will matter most.
Multi-Modal Fusion Is Replacing Single-Algorithm Approaches
The trend toward fusing multiple AI methodologies — network proximity, diffusion, and machine learning ensemble — is the clearest structural shift in the field. The Harvard/BWH COVID-19 framework and the Drug Repurposing Encyclopedia’s 198 million drug-signature associations across 20 organisms (IMP Vienna, 2023) both exemplify this consensus-based prediction approach. No individual algorithm reliably outperforms across all validation datasets; IP and product strategies should favour ensemble and fusion architectures.
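Consensus ranking can be as simple as averaging each drug's rank across methods. The sketch below shows one such mean-rank (Borda-style) fusion. It is not the specific aggregation used by the Harvard/BWH framework or the Drug Repurposing Encyclopedia, and the method and drug names are hypothetical. A drug that a method did not rank is placed just after that method's last entry, which is one of several reasonable conventions.

```python
def mean_rank_fusion(rankings: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Fuse ranked drug lists by mean rank (1 = best); the result is sorted best-first."""
    all_drugs = set().union(*rankings.values())
    fused: dict[str, float] = {}
    for drug in all_drugs:
        ranks = [
            ranked.index(drug) + 1 if drug in ranked else len(ranked) + 1
            for ranked in rankings.values()
        ]
        fused[drug] = sum(ranks) / len(ranks)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(fused.items(), key=lambda item: (item[1], item[0]))


# Hypothetical rankings from three independent predictors.
rankings = {
    "proximity": ["drugA", "drugB", "drugC"],
    "diffusion": ["drugB", "drugA", "drugC"],
    "ml_model": ["drugA", "drugC", "drugB"],
}
for drug, score in mean_rank_fusion(rankings):
    print(drug, round(score, 2))
```

Mean rank is insensitive to the very different score scales that proximity, diffusion, and machine learning models produce. That is one reason rank-based fusion is a common first choice over averaging raw scores.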
Generative Design Is Converging with Repurposing
ReMODE (Zhejiang University, 2022), a deep learning-based web server for target-specific drug design, represents an emerging convergence of generative molecular design with repurposing workflows — enabling optimisation of drug-likeness of repositioned scaffolds rather than simply predicting new indications for existing structures. This convergence blurs the boundary between repurposing and de novo design.
Real-World Data Integration Is Accelerating Validation
Yonsei University’s 2021 study on real-world data use in drug repurposing, set in the context of the 21st Century Cures Act, signals that electronic health records and claims data are increasingly being integrated with AI repurposing pipelines to validate computational predictions against real patient outcomes — bridging the gap between algorithmic prediction and clinical evidence. This integration is consistent with trends tracked by organisations such as OECD in health data governance.
Transcriptomic Methods Are Shifting from Unsupervised to Supervised Learning
Dr. Sim (Tongji University, 2021) and Hokkaido University’s two-stage approach (2022) both signal a move from unsupervised to supervised similarity learning for transcriptional data. This directly addresses the high-dimensionality noise that has historically limited CMap/LINCS approaches, improving the robustness of signature-based repurposing predictions.
Network Tools Are Maturing into Deployable Software
SAveRUNNER’s evolution from a 2021 algorithm paper to an R-language package signals that network medicine repurposing methods are transitioning from proof-of-concept to reproducible, deployable tools. This maturation lowers the barrier to entry for new organisations and shifts competitive advantage from method novelty to data quality and clinical validation capability.
Strategic Implications for R&D Teams
- Database quality is the binding constraint. The predictive performance of all AI methods is directly dependent on the completeness and curation quality of underlying databases (DrugBank, LINCS, PROMISCUOUS, DTC). Curated interaction datasets should be treated as core IP assets, not just infrastructure.
- Rare disease and metabolic disease represent underserved, high-value opportunities. Despite strong COVID-19 and oncology representation, only a small number of records target rare/orphan diseases or metabolic conditions. These areas have high unmet medical need and lower competitive density in the AI repurposing space.
- Industrial-academic partnerships are the dominant commercialisation model. Pharmaceutical companies are sourcing AI repurposing innovation through structured external collaboration rather than fully internal development, creating partnership and licensing opportunities for academic groups with validated platforms.
- COVID-19 tooling is now disease-agnostic infrastructure. Platforms including CoREx, SAveRUNNER, CATNIP, and GDRnet, produced under pandemic urgency, are available as open-source tools, allowing new entrants to reduce build costs significantly.
This technology landscape is derived from a limited set of patent and literature records retrieved across targeted searches. It represents a snapshot of innovation signals within this dataset only and should not be interpreted as a comprehensive view of the full industry. All statistics and claims are sourced from the retrieved records as described.
For R&D teams and IP strategists tracking this space, PatSnap’s life sciences intelligence platform provides access to the full patent and literature landscape across AI drug repurposing, enabling teams to map competitor activity, identify white spaces, and monitor emerging method clusters in real time. The PatSnap Insights blog publishes regular technology landscape analyses across pharma, biotech, and computational biology.