Searching Patents for Chemically Modified Sequences

Patsnap Team

Searching patents for chemically modified sequences, such as oligonucleotides with phosphorothioate linkages, GalNAc conjugates, locked nucleic acids (LNA), or methylated bases, is among the most technically demanding tasks in biopharma IP. Unlike small molecules with defined structures or native biologics with linear sequences, chemically modified sequences combine structural complexity with inconsistent representation across patent documents. This creates significant freedom-to-operate (FTO) risk and makes prior art searches time-consuming and error-prone.

These complexities matter most in oligonucleotide therapeutics, gene therapies, and mRNA platforms, where they complicate both intellectual property protection and freedom-to-operate (FTO) analysis.

An effective search combines several extraction methods, including Optical Chemical Structure Recognition (OCSR) and Named Entity Recognition (NER), to extract, normalize, and analyze modification patterns across patent documents. AI-native platforms that offer this kind of multi-modal search aim to turn a manual, error-prone process into a comprehensive and traceable analysis.

For IP professionals supporting oligonucleotide therapeutics, gene therapies, or mRNA platforms, the stakes are high: missing a single relevant patent claim covering a modified sequence can derail licensing negotiations, trigger costly litigation, or force program pivots late in development. This guide walks through the technical challenges of searching patent databases for chemically modified sequences and shows how AI-native intelligence platforms, like Patsnap Eureka Life Science, are turning what was once a manual, weeks-long process into traceable, comprehensive analysis.

Why Is Searching Chemically Modified Sequences So Difficult?

Chemically modified sequences pose three core challenges that traditional patent search tools struggle to address:

  • Representational inconsistency: Modified sequences are depicted using HELM notation, ASCII strings, image-based structures, or plain-language descriptions—often within the same patent. There is no universal standard for representing backbone modifications, sugar modifications, or conjugates across patent offices.
  • Positional specificity: A phosphorothioate linkage at position 3 versus position 5 can alter patentability and FTO risk, but most text-based search engines cannot distinguish positional modifications with precision.
  • Combinatorial claim structures: Patent claims often describe chemically modified sequences as Markush-style variants with modular components (e.g., “wherein the sequence comprises 1–5 phosphorothioate linkages at positions selected from…”). Evaluating novelty requires reconstructing all plausible embodiments.

Manual approaches—downloading PDFs, extracting sequences by hand, comparing modifications across claims—are slow, inconsistent, and nearly impossible to scale when evaluating hundreds of patents during FTO or prior art searches.
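The combinatorial challenge above can be made concrete with a minimal sketch. It assumes a hypothetical claim clause of the form "1 to 5 phosphorothioate linkages at positions selected from a listed set", enumerates every embodiment that clause covers, and checks whether a candidate pattern falls inside the genus. The position list and the candidate patterns are invented for illustration; real claims need legal interpretation that no enumeration replaces.

```python
from itertools import combinations
from math import comb

def enumerate_embodiments(allowed_positions, min_count, max_count):
    """Yield every set of positions covered by a 'min..max selected from' clause."""
    positions = sorted(set(allowed_positions))
    for k in range(min_count, max_count + 1):
        for combo in combinations(positions, k):
            yield frozenset(combo)

def falls_within_genus(candidate_positions, allowed_positions, min_count, max_count):
    """True if the candidate's linkage positions form one of the claimed embodiments."""
    candidate = set(candidate_positions)
    return (min_count <= len(candidate) <= max_count
            and candidate <= set(allowed_positions))

# Hypothetical clause: 1-5 phosphorothioate linkages chosen from these eight positions.
ALLOWED = [1, 2, 3, 4, 17, 18, 19, 20]
embodiments = list(enumerate_embodiments(ALLOWED, 1, 5))
expected_total = sum(comb(len(ALLOWED), k) for k in range(1, 6))

print(len(embodiments), expected_total)
print(falls_within_genus({1, 2, 19}, ALLOWED, 1, 5))  # every position is listed
print(falls_within_genus({1, 2, 10}, ALLOWED, 1, 5))  # position 10 is not listed
```

Even this small clause covers 218 distinct position sets (8 + 28 + 56 + 70 + 56), which is why reviewing embodiments by hand scales poorly.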

Step 1: Define the Scope of Your Modified Sequence Search

Before querying any database, clarify the search parameters for your chemically modified sequence:

  • Base sequence identity: Are you searching for exact sequence matches, or do you need to account for mutations, truncations, or motif conservation?
  • Modification type: Backbone modifications (phosphorothioate, phosphorodiamidate), sugar modifications (2′-O-methyl, 2′-fluoro, LNA), base modifications (5-methylcytosine), or conjugates (GalNAc, cholesterol, peptides)?
  • Positional constraints: Does the modification pattern matter? Are specific positions functionally critical?
  • Claim scope: Are you assessing exact infringement, or evaluating broader patent families that might cover structural analogs or alternative chemistries?

Document these parameters clearly. They will determine whether you need sequence homology searches, substructure searches, or full-text mining of patent claims and experimental examples.
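One way to document these parameters is as a small structured record, sketched below. The class name, field names, and the rule that maps a scope to search types are all assumptions made for illustration, not part of any platform's API; the mapping is a rule of thumb that a team would adjust to its own practice.

```python
from dataclasses import dataclass, field
from enum import Enum

class MatchMode(Enum):
    EXACT = "exact"
    HOMOLOGY = "homology"  # tolerate mutations or truncations
    MOTIF = "motif"        # conserved motif only

@dataclass
class SearchScope:
    base_sequence: str
    match_mode: MatchMode
    modification_types: set = field(default_factory=set)  # e.g. {"phosphorothioate"}
    critical_positions: set = field(default_factory=set)  # 1-based positions that matter
    include_analogs: bool = False                          # broader families and analogs

    def required_searches(self):
        """Illustrative mapping from the documented scope to search types."""
        searches = []
        if self.match_mode is MatchMode.EXACT:
            searches.append("exact sequence search")
        else:
            searches.append("sequence homology search")
        if self.modification_types:
            searches.append("substructure search")
        if self.include_analogs or self.critical_positions:
            searches.append("full-text mining of claims and examples")
        return searches

# Hypothetical scope for a 7-mer with two functionally critical positions.
scope = SearchScope(
    base_sequence="ACGTACG",
    match_mode=MatchMode.HOMOLOGY,
    modification_types={"phosphorothioate", "2'-O-methyl"},
    critical_positions={3, 5},
)
print(scope.required_searches())
```

Writing the scope down in this form makes it easy to review with colleagues and to rerun the same search later with unchanged parameters.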

Step 2: Query Patent Databases with Multi-Modal Search Capabilities

Traditional keyword searches and BLAST-based sequence searches are insufficient for chemically modified sequences. You need a platform capable of:

  • Optical Chemical Structure Recognition (OCSR): Converting structure images—including modified nucleotides depicted as chemical diagrams—into machine-readable formats for comparison
  • Named Entity Recognition (NER) for biosequences: Extracting sequences and their associated modifications from unstructured patent text, tables, and example sections
  • Multi-modal data integration: Linking sequence data with modification annotations, experimental activity (IC50, Kd), and claim language across the same document

Patsnap Eureka Life Science’s Lead Compound Analyzer is purpose-built for this challenge. With OCSR precision at 95.5% and NER accuracy exceeding 88%, it processes patents up to 1,000 pages in length, extracting chemically modified sequences alongside SAR data, biological activity, and claim scope analysis. The platform’s 1.44 billion biosequence dataset and 18.2 million patents enable comprehensive prior art searches across siRNA, ASO, mRNA, and other nucleic acid modalities—without requiring manual extraction or normalization.
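To illustrate what extracting sequences and their associated modifications from unstructured text means in practice, here is a deliberately crude sketch that uses regular expressions as a stand-in for NER. It is not how any commercial platform works; the pattern list, the function name, and the sample text are assumptions, and real patent language is far more varied than these patterns can handle.

```python
import re

SEQ_ID = re.compile(r"SEQ ID NOs?[:.]?\s*(\d+)", re.IGNORECASE)
MOD_TERMS = {
    "phosphorothioate": re.compile(r"phosphorothioate", re.IGNORECASE),
    "2'-O-methyl": re.compile(r"2['′’]-O-methyl", re.IGNORECASE),
    "2'-fluoro": re.compile(r"2['′’]-fluoro", re.IGNORECASE),
    "LNA": re.compile(r"\bLNA\b|locked nucleic acid", re.IGNORECASE),
    "GalNAc": re.compile(r"GalNAc", re.IGNORECASE),
}

def tag_sentences(text):
    """Return sentences that mention both a SEQ ID NO and a known modification term."""
    hits = []
    for sentence in re.split(r"(?<=[.;])\s+", text):
        seq_ids = [int(n) for n in SEQ_ID.findall(sentence)]
        mods = sorted(name for name, pat in MOD_TERMS.items() if pat.search(sentence))
        if seq_ids and mods:
            hits.append({"seq_ids": seq_ids, "modifications": mods, "sentence": sentence})
    return hits

# Invented sample text, not taken from any patent.
sample = ("The oligonucleotide of SEQ ID NO: 12 comprises phosphorothioate linkages "
          "at positions 1-3. SEQ ID NO: 7 is an unmodified control. "
          "A GalNAc conjugate of SEQ ID NO: 12 with 2'-O-methyl wings was also tested.")

for hit in tag_sentences(sample):
    print(hit["seq_ids"], hit["modifications"])
```

A keyword approach like this misses modifications shown only as structure images and cannot resolve which position a term applies to, which is the gap that OCSR and trained NER models are meant to close.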


Step 3: Extract and Normalize Modification Data Across Patent Families

Once you’ve identified relevant patents, the next step is extracting modification-specific data and normalizing it for comparison. This includes:

  • Cataloging all disclosed modified sequences and their associated experimental data (binding affinity, knockdown efficiency, in vivo stability)
  • Mapping positional modifications to biological activity to understand structure-activity relationships
  • Identifying Markush claim structures and enumerating plausible embodiments that overlap with your candidate
  • Flagging first-public disclosures and tracking compound evolution across continuations and divisionals

Manually, this process can take days per patent. Patsnap Eureka Life Science’s Document Analyzer automates scenario-based extraction across multiple patents in parallel, saving approximately 80% of document reading time. Its SAR batch extraction feature is especially valuable for oligonucleotide patents: it extracts structure-activity relationship data, performs scaffold analysis and R-group decomposition, and generates activity cliff visualizations—all while maintaining full source traceability back to the original patent text.
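The normalization idea in this step can be sketched as a parser that turns a modified-sequence string into one record per nucleotide, so that patterns from different documents become comparable. The notation here is a toy convention assumed for the example (prefix "m" for 2'-O-methyl, "f" for 2'-fluoro, "+" for LNA, and a trailing "*" for a phosphorothioate linkage to the next nucleotide). Patents use many other conventions, and each would need its own parser.

```python
import re
from dataclasses import dataclass

SUGAR_PREFIXES = {"m": "2'-O-methyl", "f": "2'-fluoro", "+": "LNA", "": "unmodified"}
TOKEN = re.compile(r"([mf+]?)([ACGTU])(\*?)")

@dataclass(frozen=True)
class Nucleotide:
    position: int            # 1-based, 5' to 3'
    base: str
    sugar: str
    ps_linkage_after: bool   # phosphorothioate linkage to the next nucleotide

def parse_modified(seq):
    """Parse the toy notation into one Nucleotide record per position."""
    compact = seq.replace(" ", "")
    records, cursor = [], 0
    for pos, match in enumerate(TOKEN.finditer(compact), start=1):
        if match.start() != cursor:
            raise ValueError(f"unrecognized text at offset {cursor}")
        prefix, base, star = match.groups()
        records.append(Nucleotide(pos, base, SUGAR_PREFIXES[prefix], star == "*"))
        cursor = match.end()
    if cursor != len(compact):
        raise ValueError(f"unrecognized text at offset {cursor}")
    return records

def base_sequence(records):
    return "".join(n.base for n in records)

def ps_positions(records):
    return [n.position for n in records if n.ps_linkage_after]

# Two invented oligos with the same bases but different linkage positions.
a = parse_modified("mA*mC*GTAfCmG")
b = parse_modified("mAmCG*TA*fCmG")
print(base_sequence(a) == base_sequence(b))
print(ps_positions(a), ps_positions(b))
```

The two strings share the base sequence ACGTACG, so a base-only comparison treats them as identical, while the normalized records show phosphorothioate linkages after positions 1 and 2 in one and after positions 3 and 5 in the other.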

Step 4: Assess FTO Risk and Patent Scope for Modified Sequences

With structured modification data in hand, the final step is evaluating freedom-to-operate risk and understanding the enforceable scope of relevant claims. Key questions include:

  • Do any active patents claim your exact modified sequence or a genus that encompasses it?
  • Are there patents claiming the modification chemistry itself (e.g., phosphorothioate linkages at specific positions) independent of the base sequence?
  • What is the geographic coverage of these patents, and when do they expire?
  • Are there ongoing oppositions, invalidations, or litigation that affect enforceability?
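These questions can be turned into a simple triage filter once the underlying facts are recorded per patent. The sketch below assumes a hypothetical record layout; the publication IDs, dates, and statuses are placeholders, and the `claims_candidate` flag stands for a conclusion an analyst has already reached from claim review, not something the code can decide.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class PatentRecord:
    publication_id: str
    jurisdiction: str
    expiry: date
    legal_status: str       # e.g. "active", "lapsed", "revoked"
    under_challenge: bool   # opposition, invalidation, or litigation pending
    claims_candidate: bool  # analyst's conclusion from claim review

def triage(records, markets, as_of):
    """Return in-force patents that read on the candidate in the target markets."""
    flagged = [
        r for r in records
        if r.claims_candidate
        and r.legal_status == "active"
        and r.jurisdiction in markets
        and r.expiry > as_of
    ]
    # Unchallenged patents first, then the longest remaining term.
    return sorted(flagged, key=lambda r: (r.under_challenge, -r.expiry.toordinal()))

# Placeholder records for illustration only.
records = [
    PatentRecord("EXAMPLE-US-1", "US", date(2031, 6, 30), "active", False, True),
    PatentRecord("EXAMPLE-EP-1", "EP", date(2029, 1, 15), "active", True, True),
    PatentRecord("EXAMPLE-JP-1", "JP", date(2035, 3, 1), "active", False, True),
    PatentRecord("EXAMPLE-US-2", "US", date(2024, 2, 1), "active", False, True),
    PatentRecord("EXAMPLE-US-3", "US", date(2033, 5, 5), "lapsed", False, True),
    PatentRecord("EXAMPLE-EP-2", "EP", date(2034, 9, 9), "active", False, False),
]

for r in triage(records, markets={"US", "EP"}, as_of=date(2025, 1, 1)):
    print(r.publication_id, r.jurisdiction, r.expiry.isoformat(), r.under_challenge)
```

The output is a review queue for counsel, not an FTO opinion: expiry dates and legal status must come from a verified source, and the ordering rule is only one reasonable choice.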

Lead Compound Analyzer’s patent scope and claim analysis capabilities support inventiveness assessment and FTO insights by connecting extracted sequences and modifications directly to claim language. This enables IP professionals to quickly assess whether a candidate falls within prior art, requires licensing, or offers genuine patentable novelty.

How Patsnap Accelerates Modified Sequence Patent Search

For IP professionals evaluating oligonucleotide, gene therapy, or mRNA pipelines, Patsnap’s Eureka Life Science platform transforms chemically modified sequence searches from a bottleneck into a scalable, traceable workflow:

  • Full-patent AI mining: Reads and extracts data from patents up to ~1,000 pages, capturing sequences, modifications, and experimental data from examples, claims, and tables
  • Multi-modal extraction: OCSR + NER + LLM pipeline extracts sequences depicted as images, text, or structured notation
  • Cross-document comparison: Analyze modification patterns across dozens of patents in parallel, identifying consensus structures and FTO gaps
  • Traceable outputs: Every extracted sequence, modification, or activity value is linked back to its source patent and paragraph

These capabilities are backed by 1.44 billion biosequences, 270 million chemical structures, and 18.2 million patents—covering the full spectrum of nucleic acid therapeutics and their chemical modifications.
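Traceability, as described in the list above, amounts to never storing an extracted value without its source location. A minimal sketch of such a record follows; the class and field names are assumptions for illustration and do not reflect any platform's actual data model, and the sample entries are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Provenance:
    publication_id: str
    section: str     # e.g. "claims", "examples", "table"
    paragraph: str   # paragraph or claim label as printed in the document

@dataclass(frozen=True)
class ExtractedValue:
    kind: str        # "sequence", "modification", or "activity"
    value: str
    source: Provenance

    def citation(self):
        s = self.source
        return f"{s.publication_id}, {s.section}, {s.paragraph}"

def untraceable(values):
    """Return extracted values whose provenance is incomplete."""
    return [
        v for v in values
        if not (v.source.publication_id and v.source.section and v.source.paragraph)
    ]

# Placeholder entries for illustration only.
values = [
    ExtractedValue("modification", "phosphorothioate after positions 1-3",
                   Provenance("EXAMPLE-US-1", "claims", "claim 4")),
    ExtractedValue("activity", "IC50 value from a results table",
                   Provenance("EXAMPLE-US-1", "table", "")),
]

print(values[0].citation())
print(len(untraceable(values)))
```

Rejecting or flagging records with incomplete provenance before they reach a report keeps every statement in an FTO or prior art analysis checkable against the original document.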

Make Chemically Modified Sequence Searches Faster and More Defensible

Searching patents for chemically modified sequences no longer requires weeks of manual PDF review and spreadsheet wrangling. AI-native platforms like Patsnap Eureka Life Science enable IP professionals to conduct comprehensive, traceable prior art and FTO searches in a fraction of the time—while reducing the risk of missing critical modifications buried in dense claim language or experimental tables.

Whether you’re supporting siRNA lead optimization, conducting FTO for an mRNA vaccine candidate, or evaluating licensing opportunities in gene therapy, Patsnap’s Lead Compound Analyzer and Document Analyzer deliver the precision, speed, and traceability your team needs to make defensible IP decisions with confidence.

Ready to see how Patsnap handles your toughest modified sequence search challenges? Book a demo with our team and get a live walkthrough of Lead Compound Analyzer’s multi-modal extraction, claim analysis, and FTO capabilities tailored to nucleic acid therapeutics.

Frequently Asked Questions

Can Patsnap search for specific modification patterns like phosphorothioate linkages at defined positions?

Yes. Lead Compound Analyzer’s NER and OCSR engines extract positional modification data from patent text, tables, and structure images. This enables precise searches for modification patterns—such as phosphorothioate linkages at positions 1–3 or 2′-O-methyl modifications on specific nucleotides—across Patsnap’s 1.44 billion biosequence database.

How does Patsnap handle patents that describe modified sequences using Markush structures?

Document Analyzer and Lead Compound Analyzer extract Markush claim language and associated sequence embodiments, enabling you to enumerate plausible variants and assess overlap with your candidate. Patent scope analysis tools help evaluate the enforceable breadth of genus claims covering chemically modified sequences.

What types of chemical modifications are covered in Patsnap’s biosequence database?

Patsnap covers backbone modifications (phosphorothioate, phosphorodiamidate), sugar modifications (2′-O-methyl, 2′-fluoro, LNA), base modifications (5-methylcytosine, pseudouridine), and conjugates (GalNAc, cholesterol, peptides). The platform supports siRNA, ASO, mRNA, aptamers, and other nucleic acid modalities with chemical modifications.

How long does it take to conduct an FTO search for a chemically modified oligonucleotide using Patsnap?

Lead Compound Analyzer processes patents up to 1,000 pages and extracts sequences, modifications, and claim data in minutes—not days. Document Analyzer’s batch extraction saves approximately 80% of document reading time, enabling comprehensive FTO searches across dozens of patents in hours rather than weeks.

Does Patsnap support searching for modified sequences in patents from all major patent offices?

Yes. Patsnap’s database of 18.2 million patents includes records from USPTO, EPO, WIPO, JPO, and other major patent offices. The platform’s multi-modal extraction works across jurisdictions, handling the varying sequence representation formats and claim structures used in different filing regions.

Can I trace extracted modification data back to the original patent source?

Absolutely. Every extracted sequence, modification annotation, and experimental data point is linked back to its source patent, paragraph, and claim. This source traceability is essential for defensible FTO analysis, prior art citations, and internal IP review processes.
