Patent Drafting Analysis of OpenAI OpCo, LLC’s Multimodal Machine Learning Model Interaction System | US 12,039,431 B1

IP Drafting Analysis · US 12,039,431 B1

A structural and strategic analysis of OpenAI's granted patent covering GUI-based contextual prompt interaction with multimodal LLMs, examining claim architecture, drafting quality, critical gaps, and prosecution positioning across method and system claim types.

US 12,039,431 B1 · Filed: Sep 27, 2023 · Granted: Jul 16, 2024 · G06N 3/0455 · G06N 3/08
Spec Words: 9,200 (across 6 sections)
Total Claims: 20 (2 independent · 18 dependent)
Figure Sheets: 12 (system architecture, UI flows, ML platform)
Published by PatSnap Insights Team · 12 min read · Verified by PatSnap Eureka Data
Overview

Structural Overview

The detailed description dominates at approximately 63% of total words (~5,800 words), reflecting thorough operational scenario coverage across all six figure groups (FIGs. 3A–3F, 4, 5A–5B, 6A–6B, 7, 8). The claim set comprises exactly 20 claims — 2 independent (method Claim 1, system Claim 13) and 18 dependent — yielding a 9:1 dependent-to-independent ratio that provides layered fallback but concentrates risk on just two independent claims. Figure coverage spans 12 drawing sheets addressing UI states, process flows, hardware environments, and ML platform architecture, though no figure explicitly depicts the tokenization concatenation process that is central to Claims 2–5 and 14–16.
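The headline figures in this paragraph can be re-derived from the numbers reported on this page. The following is a minimal sketch, assuming the per-section word counts shown in the Section Word Distribution chart and the stated 9,200-word total; the variable names are illustrative only.

```python
# Word counts per section, copied from the Section Word Distribution chart.
section_words = {
    "Detailed Desc.": 5800,
    "Claims": 2300,
    "Summary": 1160,
    "Background": 700,
    "Brief Desc.": 700,
    "Abstract": 240,
}
STATED_SPEC_WORDS = 9200  # headline "Spec Words" figure reported on this page

independent, dependent = 2, 18  # claim counts reported on this page

detailed_share = section_words["Detailed Desc."] / STATED_SPEC_WORDS
ratio = dependent / independent
chart_total = sum(section_words.values())

print(f"Detailed description share: {detailed_share:.0%}")  # 63%
print(f"Dependent-to-independent ratio: {ratio:.1f}:1")     # 9.0:1
print(f"Sum of chart bars: {chart_total:,} words")          # 10,900 words
```

Note that the chart bars sum to 10,900 words, so the 63% share only follows when the detailed description is measured against the stated 9,200-word figure.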

Section Word Distribution

Detailed Desc. 5,800 w · Claims 2,300 w · Summary 1,160 w · Background 700 w · Brief Desc. 700 w · Abstract 240 w

Figure Inventory — 12 Sheets

Figure | Description | Role
FIG. 1 | System architecture showing Image Processing System (120), Machine Learning System (130), Network (110), and User Device (140) interconnected. | System architecture
FIG. 2 | Flow diagram of Method 200 showing five sequential steps: Provide GUI (210), Receive Contextual Prompt (220), Generate Input Data (230), Generate Textual Response (240), Provide Textual Response (250). | Flow diagram
FIG. 3A | UI view (300a) showing Image 310 (a garden of herbs) displayed on a user device GUI before any contextual prompt is applied. | UI/interface
FIG. 3B | UI view (300b) showing Image 310 with a small Loupe annotation (320a) applied as a contextual prompt indicating a two-leaf area of emphasis. | UI/interface
FIG. 3C | UI view (300c) showing Image 310 with a Resized Loupe (320b) that now encompasses the whole plant, demonstrating the annotation resize feature. | Claim support
FIG. 3D | UI view (300d) showing Image 310 with a crudely drawn cross Mark (330a) as a contextual prompt indicating an area of emphasis. | UI/interface
FIG. 3E | UI view (300e) showing Image 310 with an unclosed loop Mark (330b) as a differently shaped contextual prompt annotation. | UI/interface
FIG. 3F | UI view (300f) showing Image 310 with a segmented object (340) highlighted as a contextual prompt, demonstrating segmentation-tool annotation. | Claim support
FIG. 4 | Flow diagram of Method 400 showing the prompt suggestion generation loop: Obtain Contextual Prompt (410), Generate Input Data (420), Generate Prompt Suggestion (430), Display GUI with suggestions (440), branch to Receive Updated Prompt (450a) or Receive Selection (450b), Generate Response (460), Provide Response (470). | Flow diagram
FIG. 5A | UI view (500a) showing Image 310 displayed with two prompt suggestions (510a "What Plants are These?", 510b "What Greens Are Good for Salads?") and Textual Prompt Input Interface (520) when no contextual prompt has been applied. | UI/interface
FIG. 5B | UI view (500b) showing Image 310 with Resized Loupe (320b) applied and updated prompt suggestions (510c "What kind of basil is this?", 510d "Basil uses in cooking") conditioned on the loupe annotation. | Claim support
FIG. 6A | Split-screen UI view (600a) showing Image 310 with Mark 610 at one location alongside prompt suggestions (620a) "What kind of basil is this?" and "Basil uses in cooking", with Textual Prompt Input Interface 520. | UI/interface
FIG. 6B | Split-screen UI view (600b) showing Image 310 with Mark 610 moved to a different location generating new prompt suggestions (620b) "Why is cilantro soapy?" and "Recipes using cilantro", illustrating location-dependent prompt suggestions. | Claim support
FIG. 7 | Block diagram of computing environment 700 showing Computing Device 702 with Memory 704, Processor 706, Data Storage 708, Other Hardware 710, User Interface 712, Network Interface 714, connected to I/O Devices 718 and Networks 716, with Configured Medium 720. | System architecture
FIG. 8 | Block diagram of ML platform 800 showing Data Input Engine 810, Featurization Engine 820, ML Modeling Engine 830 (with Model Selector 832, Parameter Engine 834, Model Generation 836), Predictive Output Generation Engine 840, Output Validation Engine 850, Model Refinement Engine 860, Feedback Engine 870, Outcome Metric 880, and ML Algorithms Database 890. | System architecture
Analysis powered by PatSnap Eureka. Patent text and figures publicly available from USPTO.
Claims

Claim Architecture Analysis

The patent contains exactly 2 independent claims (method Claim 1 and system Claim 13), providing dual claim-type coverage but no standalone computer-readable medium (CRM) claim, which is a notable structural gap. The 18 dependent claims yield a 9.0:1 dependent-to-independent ratio, above the software/AI industry norm of 4–8:1, creating extensive fallback positions. The system dependents (Claims 14–20) largely mirror the method dependents (Claims 2–12), consolidating several method limitations into fewer claims; this deliberate prosecution strategy maximises enforcement options across both claim types while relying on a single inventive concept.

Core inventive concept: The claims solve the problem of tedious text-only descriptions of image regions by enabling a user to provide a GUI-based contextual prompt — such as a click, loupe, marker, or segmentation annotation — that "indicates an area of emphasis in the image," whereupon the multimodal machine learning model is "configured to condition the textual response to the image on the contextual prompt," producing a targeted response without requiring the user to textually describe the image region.

Independent Claim Dissection

Claim | Preamble | Transition | Key Body Elements
Claim 1 | A method of interacting with a pre-trained multimodal machine learning model, the method | comprising | providing a GUI configured to enable user interaction with an image to generate a contextual prompt indicating area of emphasis; receiving the contextual prompt; generating input data using image and contextual prompt; generating a textual response by applying input data to multimodal ML model configured to condition textual response to image on contextual prompt; providing textual response to user; wherein textual response comprises a prompt suggestion and providing response comprises displaying a selectable control in GUI
Claim 13 | A system for interacting with a pre-trained multimodal machine learning model, the system | comprising | at least one processor; at least one non-transitory computer readable medium containing instructions that cause system to: provide GUI configured to enable user interaction with image to generate contextual prompt indicating area of emphasis; receive contextual prompt; generate input data; generate textual response by applying input data to multimodal ML model configured to condition response on contextual prompt; provide textual response; wherein textual response comprises prompt suggestion and providing response comprises displaying selectable control programmed to enable user to select prompt suggestion
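To make the operative steps of Claim 1 easier to follow, here is a minimal sketch of the generate-input-data, generate-response, and provide-response sequence (steps 230 to 250 of FIG. 2). Every name in it (`ContextualPrompt`, `SelectableControl`, `interact`, `stub_model`) is hypothetical and chosen for illustration; the stub stands in for the multimodal model and does not reflect how OpenAI implements the conditioning.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ContextualPrompt:
    """Hypothetical record of a GUI interaction marking an area of emphasis."""
    kind: str                # e.g. "click", "loupe", "marker"
    center: Tuple[int, int]  # pixel coordinates (x, y)
    radius: int = 0


@dataclass
class SelectableControl:
    """Hypothetical stand-in for a GUI control that shows a prompt suggestion."""
    label: str


def generate_input_data(image_id: str, prompt: ContextualPrompt) -> dict:
    # Step 230: combine the image with the contextual prompt.
    return {"image": image_id, "emphasis": prompt}


def interact(image_id: str, prompt: ContextualPrompt,
             model: Callable[[dict], str]) -> SelectableControl:
    input_data = generate_input_data(image_id, prompt)  # step 230
    suggestion = model(input_data)                      # step 240
    return SelectableControl(label=suggestion)          # step 250


def stub_model(input_data: dict) -> str:
    # Placeholder only: a real multimodal model would inspect the pixels.
    x, y = input_data["emphasis"].center
    return f"What is at ({x}, {y})?"


control = interact("garden.png", ContextualPrompt("loupe", (120, 80), 25), stub_model)
print(control.label)  # What is at (120, 80)?
```

Steps 210 and 220 (providing the GUI and receiving the prompt) sit outside this sketch; the `ContextualPrompt` instance represents their result.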

Claim Dependency Tree

1. Method: GUI-based contextual prompt interaction with pre-trained multimodal ML model; response conditioned on image area of emphasis
2. Adds: generating input data comprises generating updated image based on contextual prompt and generating input data using updated image
3. Adds: generating input data comprises generating segmentation mask by providing image and contextual prompt to segmentation model; input data generated using image and segmentation mask
4. Adds: generating input data comprises generating a textual prompt or token using the contextual prompt; and generating input data using image and textual prompt or token
5. Further: (dep. on 4) textual prompt or token indicates coordinates of a location in the image
6. Adds: receiving contextual prompt comprises detecting a user human interface device interaction
7. Adds: GUI includes an annotation tool; contextual prompt comprises an annotation generated using annotation tool
8. Further: (dep. on 7) annotation tool includes a loupe, a marker, or a segmentation tool
9. Further: (dep. on 7) GUI enables user to resize an area of effect of annotation tool
10. Adds: further includes receiving a textual prompt from user; input data further generated using textual prompt; ML model further conditions response on textual prompt
11. Adds: in response to selection of control by user, generating second input data using prompt suggestion and image; generating second response; providing second response
12. Adds: contextual prompt indicates object depicted in image; textual response provides information about depicted object; textual response displayed as virtual button in GUI
13. System: processor + non-transitory CRM; same operative steps as Claim 1 including GUI, contextual prompt, input data generation, ML model response conditioning, selectable prompt suggestion control
14. Adds: (dep. on 13) generating input data comprises generating updated image based on contextual prompt
15. Adds: (dep. on 13) generating input data comprises generating segmentation mask via segmentation model
16. Adds: (dep. on 13) generating input data comprises generating textual prompt or token with coordinates of a location in image
17. Adds: (dep. on 13) GUI includes annotation tool; contextual prompt comprises annotation generated using annotation tool
18. Further: (dep. on 17) annotation tool includes loupe/marker/segmentation; GUI enables resize; further operations include receiving textual prompt and further conditioning response
19. Adds: (dep. on 13) in response to selection of control, generating second input data and second response using prompt suggestion
20. Further: (dep. on 19) contextual prompt indicates object; textual response provides information about object; displayed as virtual button
Metric | This Application | Software / AI Industry Norm
Total claims | 20 | 15 – 25
Independent claim count | 2 | 2 – 4
Dependent : Independent ratio | 9.0 : 1 | 4 – 8 : 1
Method claims present? | Yes (Claim 1) | Common
System / apparatus claims? | Yes (Claim 13) | Common
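The metrics in this table follow mechanically from the dependency tree. The sketch below encodes the tree as a parent map and recomputes them. It assumes that the method claims listed with a bare "Adds:" depend directly on Claim 1, which the tree implies but does not state; the helper names `root` and `depth` are illustrative.

```python
# parent[claim] is the claim it depends on; None marks an independent claim.
# ASSUMPTION: method claims listed with a bare "Adds:" depend on Claim 1.
parent = {
    1: None, 2: 1, 3: 1, 4: 1, 5: 4, 6: 1, 7: 1, 8: 7, 9: 7,
    10: 1, 11: 1, 12: 1,
    13: None, 14: 13, 15: 13, 16: 13, 17: 13, 18: 17, 19: 13, 20: 19,
}


def root(claim: int) -> int:
    """Follow the dependency chain up to the independent claim."""
    while parent[claim] is not None:
        claim = parent[claim]
    return claim


def depth(claim: int) -> int:
    """Number of dependency hops between a claim and its independent claim."""
    hops = 0
    while parent[claim] is not None:
        claim = parent[claim]
        hops += 1
    return hops


independent = [c for c, p in parent.items() if p is None]
dependent = [c for c, p in parent.items() if p is not None]
ratio = len(dependent) / len(independent)
per_root = {r: sum(1 for c in dependent if root(c) == r) for r in independent}
deepest = max(depth(c) for c in parent)

print(independent)  # [1, 13]
print(ratio)        # 9.0
print(per_root)     # {1: 11, 13: 7}
print(deepest)      # 2
```

The per-root counts show that the 18 dependents split 11 to 7 between the method and system families, and that no chain runs deeper than two hops.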
Drafting Quality

Drafting Quality Signals

The patent demonstrates strong spec–claim consistency through extensive GUI scenario walkthrough in the detailed description, and uses strategically broad "comprising" transitions throughout all independent and dependent claims. The most significant weakness is the absence of a computer-readable medium (CRM) claim, leaving an entire enforcement vector uncovered that competitors could exploit with software-only implementations.

Antecedent Basis
The antecedent basis is clean throughout all 20 claims. Claim 1 introduces the graphical user interface, the contextual prompt, the image, and the multimodal machine learning model in proper sequence, and all subsequent references in Claims 2–12 correctly use definite articles pointing back to these introduced elements. Claim 13 independently re-introduces the user, the image, and the contextual prompt in the correct order, with dependent Claims 14–20 properly back-referencing. No orphaned "the" references were found across the claim set.
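As an illustration of what an antecedent-basis check looks for, the sketch below flags any term whose first definite reference ("the X" or "said X") appears before an indefinite introduction ("a X" or "an X"). It is a deliberately naive, regex-based heuristic, not the method used for this analysis, and `toy_claim` is a paraphrased example with a planted defect, not the patent's claim text.

```python
import re


def orphaned_terms(claim_text: str, terms: list) -> list:
    """Return terms whose first "the/said X" precedes any "a/an X"."""
    text = claim_text.lower()
    orphans = []
    for term in terms:
        escaped = re.escape(term.lower())
        intro = re.search(rf"\b(?:a|an)\s+{escaped}\b", text)
        ref = re.search(rf"\b(?:the|said)\s+{escaped}\b", text)
        if ref and (intro is None or ref.start() < intro.start()):
            orphans.append(term)
    return orphans


# Toy claim text (hypothetical): "the textual response" is never introduced.
toy_claim = (
    "providing a graphical user interface configured to enable a user to "
    "interact with an image to generate a contextual prompt; "
    "receiving the contextual prompt; "
    "generating input data using the image and the contextual prompt; "
    "providing the textual response to the user."
)
terms = ["graphical user interface", "image", "contextual prompt",
         "textual response", "user"]

print(orphaned_terms(toy_claim, terms))  # ['textual response']
```

A real check would also need to handle plural forms, "at least one" introductions, and terms introduced in a parent claim.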
Spec–Claim Consistency
All major claim limitations map to specific figures and paragraphs. The GUI providing an area-of-emphasis contextual prompt (Claim 1, element 1) maps to FIGs. 3A–3F and the detailed description at col. 5–6. The segmentation mask limitation of Claim 3 maps to FIG. 3F and col. 7–8. The annotation tool resize of Claim 9 maps to FIGs. 3B–3C and col. 9. The prompt suggestion selectable control of Claims 1 and 11 maps directly to FIGs. 5A, 5B, 6A, 6B and col. 13–15. The tokenization process underlying Claims 4–5 is described at col. 7–8 but lacks a dedicated figure, which is the single notable consistency gap.
Transition Word Usage
All independent and dependent claims use "comprising" consistently, which is the correct open-ended transition for this technology domain, ensuring that additional system components (e.g., additional pre-processing modules or secondary models) do not break infringement. The use of "comprising" in the system Claim 13 preamble and in each operative step is particularly important given OpenAI's multimodal architecture, where implementations may include components beyond those explicitly claimed. No missed opportunity for "consisting essentially of" narrowing was identified, and no inadvertent limiting transitions were used.
§112(f) Means-Plus-Function Risk
No "means for" or "step for" language appears anywhere in the 20 claims, substantially eliminating §112(f) MPF exposure. The system Claim 13 uses structural recitation ("at least one processor" and "at least one non-transitory computer readable medium containing instructions") rather than functional "means" language, which is the appropriate drafting approach for software-implemented systems post-Williamson v. Citrix. The term "configured to" is used throughout Claim 13 and is well-established as structural language avoiding §112(f) invocation. No latent MPF risks were identified.
⚠️ §101 Eligibility Risk
The claims carry moderate §101 Alice/Mayo exposure because the core inventive concept — applying a multimodal ML model conditioned on a GUI contextual prompt — could be characterized by an examiner as an abstract idea of "processing information based on user input." The hardware tie-in in Claim 13 ("at least one processor" and "non-transitory computer readable medium") provides some §101 defense, and the GUI annotation tool limitations of Claims 7–9 and 17–18 add specificity. However, the method Claim 1 contains no explicit hardware recitation and relies entirely on functional steps, making it the most vulnerable claim in a §101 challenge — a stronger filing would have included a hardware-tied preamble or referenced the image processing system 120 architecture of FIG. 1.
Dependent Claim Fallback Quality
The dependent claims add genuinely distinct technical limitations: Claim 3 adds segmentation mask generation (a specific implementation pathway), Claims 4–5 add tokenization with coordinate indication (a data-processing mechanism), Claims 7–9 add the annotation tool hierarchy (loupe/marker/segmentation with resize capability), Claim 10 adds secondary textual prompt conditioning, and Claims 11–12 add the second-response-generation loop and virtual button display. These represent meaningfully different fallback positions that would each require separate invalidation arguments. The primary weakness is that Claims 14–20 structurally mirror Claims 2–12 for the system claim, which adds prosecution breadth but does not introduce new technical concepts.
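The three input-data pathways named above (updated image in Claim 2, segmentation mask in Claim 3, coordinate token in Claims 4–5) can be contrasted in a few lines. This is a toy sketch on a nested-list "image"; the token format, the function names, and the rectangular mask that stands in for a segmentation model are all assumptions made for illustration and do not come from the patent.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel intensities


def coordinate_token(x: int, y: int, width: int, height: int) -> str:
    """Claims 4-5 style: encode a click as a token with normalized coordinates.

    The "<point ...>" format is hypothetical.
    """
    return f"<point x={x / width:.2f} y={y / height:.2f}>"


def mark_image(image: Image, x: int, y: int, value: int = 255) -> Image:
    """Claim 2 style: return an updated copy with the clicked pixel marked."""
    updated = [row[:] for row in image]
    updated[y][x] = value
    return updated


def box_mask(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Claim 3 style stand-in: a rectangular mask, not a real segmentation model."""
    x0, y0, x1, y1 = box
    return [
        [1 if x0 <= x <= x1 and y0 <= y <= y1 else 0 for x in range(len(row))]
        for y, row in enumerate(image)
    ]


image = [[0] * 4 for _ in range(3)]  # 4 pixels wide, 3 pixels tall
token = coordinate_token(1, 2, width=4, height=3)
updated = mark_image(image, 1, 2)
mask = box_mask(image, (1, 1, 2, 2))

print(token)          # <point x=0.25 y=0.67>
print(updated[2][1])  # 255
print(mask)           # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
```

Each pathway hands the model a different artifact (text, pixels, or a mask), which is why the claims work as separate fallback positions.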
⚠️ Abstract Quality
The abstract accurately describes the method embodiment but omits the system claim. It states that the model is "configured using prompt engineering to identify a location in the image conditioned on the image and the textual prompt", which reasonably captures the key technical mechanism. However, the abstract does not mention the selectable control / prompt suggestion feature that both independent claims recite as a required limitation, so a reviewer searching only the abstract could mischaracterize the scope and miss the interactive prompt suggestion loop as a core feature.
⚠️ Figure Support Quality
Figure support is strong for UI-level limitations: FIGs. 3A–3F cover all annotation tool types recited in Claims 7–9, FIGs. 5A–5B and 6A–6B cover the prompt suggestion selectable control of Claims 1 and 11–12, and FIG. 4 covers the second-response generation loop. However, the tokenization process underlying Claims 4–5 (generating textual prompts or tokens with coordinate indicators from contextual prompts) lacks a dedicated figure; only the general flow of FIG. 2 (step 230) provides indirect support. Additionally, the ML model conditioning mechanism recited in Claims 1 and 13 maps only to the high-level FIG. 8 platform diagram rather than to a figure showing how the multimodal model architecture implements the conditioning on the contextual prompt.
Scorecard

Strategic Intent Scorecard

Multi-dimensional assessment of this application's patent strategy quality, based on claim structure, specification depth, and prosecution positioning.

Claim Breadth: 3.5 / 5
Prosecution Defensibility: 3.8 / 5
Spec–Claim Consistency: 4.2 / 5
Dependent Claim Coverage: 4.0 / 5
Claim Type Diversity: 2.5 / 5
Figure Support Quality: 3.8 / 5
Key observation: Spec–Claim Consistency scores highest (4.2/5) because the detailed description maps every operative UI state to a named figure — FIGs. 3A–3F, 5A–5B, and 6A–6B provide granular visual support for the annotation tool and prompt suggestion limitations of Claims 7–12 and 17–20. Claim Type Diversity scores lowest (2.5/5) because the patent covers only method and system claim types, entirely omitting the CRM/computer-program-product claim type that would capture pure software implementations and close the most obvious design-around pathway for competitors distributing multimodal AI applications. A practitioner reviewing this patent should prioritise filing a continuation with at least one CRM independent claim directed to the non-transitory medium storing instructions for the GUI contextual prompt interaction workflow.
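The highest and lowest dimensions named in this observation can be read straight off the six scores. The short sketch below does that and also computes an unweighted mean; the mean is an illustrative aggregate only, since this page reports no overall score or weighting.

```python
# Scores out of 5, copied from the scorecard above.
scores = {
    "Claim Breadth": 3.5,
    "Prosecution Defensibility": 3.8,
    "Spec-Claim Consistency": 4.2,
    "Dependent Claim Coverage": 4.0,
    "Claim Type Diversity": 2.5,
    "Figure Support Quality": 3.8,
}

highest = max(scores, key=scores.get)
lowest = min(scores, key=scores.get)
# ASSUMPTION: equal weighting, which the scorecard does not specify.
mean = sum(scores.values()) / len(scores)

print(highest)         # Spec-Claim Consistency
print(lowest)          # Claim Type Diversity
print(round(mean, 2))  # 3.63
```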
Critical Gaps

3 Critical Gaps in This Claim Set

A senior-attorney lens on the three highest-priority structural weaknesses — what each exposes in prosecution and litigation, and what a stronger filing would have done differently.

No CRM claim for software distribution
Prompt suggestion limits independent claims
No video or real-time stream input claims

Disclaimer: This analysis is generated by PatSnap Eureka AI based on publicly available patent data from the USPTO. It does not constitute legal advice and should not be relied upon as such. Patent data may be subject to change as prosecution progresses. Scores and assessments reflect automated analysis and may not capture all relevant legal or technical nuances. Always consult a qualified patent attorney for formal legal opinions on patentability, freedom to operate, or infringement.
