Un-integrated merchants
Completing a checkout the agent has never seen — the Agnic Checkout Engine, Explore then Pay.
Visa and Mastercard have made the first part of agentic commerce real: an AI agent can now carry a scoped payment credential — an agentic token.
But that only solves half the problem.
A payment credential still has to be used somewhere. And most merchants do not have an "agent checkout."
There are millions of merchants on the open web. The majority are not going to rebuild their checkout just because AI agents can now pay. They did not rebuild for every previous shift in commerce infrastructure, and they will not do it here at the speed or scale agent commerce needs.
So the real problem is simple:
If an agent is going to buy from ordinary merchants, something has to complete the checkout those merchants already have.
The obvious approaches all break down.
A script for each merchant works until the merchant moves a field, renames a button, changes a validation rule, or reorders the checkout. Then it fails mid-order. Someone has to notice, understand what changed, and fix it. That might work for ten merchants. It does not work for the open web.
A scraper has a different problem. It can read parts of a page, but it does not really understand the checkout. It does not know whether the order actually completed. That is not good enough when money is moving.
A general-purpose browser agent is more flexible, but it brings a different kind of risk. It has to reason through the checkout live, every time. That makes it slower, more expensive, and less predictable. It may take a slightly different path on each run. It may keep going when it should stop. And it still needs a dependable way to know whether the order actually went through.
That is fine for a demo. It is not enough for payment.
What is needed is a checkout engine that can use the merchant's existing checkout as it is — without asking the merchant to integrate, without writing a fragile script for every store, and without asking a model to improvise through every order.
It needs to learn a checkout once, complete it repeatedly, stop when it is not safe to continue, and leave evidence of what happened.
That is the problem the Agnic Checkout Engine solves for un-integrated merchants.
A complete order, start to finish
Here is the Agnic Checkout Engine doing exactly that — completing an order at a café it has never seen, on the agent's behalf.

A customer tells their assistant:
"Order a sourdough loaf and an oat latte from Maple Grounds for pickup."
Maple Grounds is a neighbourhood café. It has never heard of Agnic — no integration, no API, no agent support. Just the checkout it already has.
First contact — Explore learns the café
Maple Grounds isn't in Agnic's network yet, so before anything is put in front of the customer, Explore runs: a model drives the café's live checkout the way a careful person would — read-only, stopping at the payment step, charging nothing — and measures what an order here actually takes.

One pass, and an unknown checkout has become structured facts — the real price, the details the café requires, the card form it uses — captured as a Recipe: the deterministic path to payment.
One confirmation
The agent shows the customer a single line. Every fact in it came from Explore moments earlier — none of it is guessed:
"Maple Grounds — sourdough × 1, oat latte × 1 — $12.45, pickup. Sharing your name and email. Pay with Visa ending 4242. Confirm?"
The order — Pay completes it
They say yes. Pay completes the order by replaying the Recipe Explore just recorded.
The card is the only sensitive thing in the flow, and it stays sealed: the real number never touches the agent, the model, or the page. It is substituted inside the PCI vault as the request leaves the browser. The customer gets a receipt; Agnic keeps a verifiable record of what was asked, what was approved, and what was shared.
| Party | What they see |
|---|---|
| The customer | One line to confirm, then a receipt. |
| The merchant | An ordinary checkout, completed. No integration, no agent-specific code. |
| Agnic | Reads the checkout, fills it under the mandate, supplies the vaulted card, records the evidence. |
The café did nothing. It saw a normal customer complete a normal checkout. And the learning is already done — the next order at Maple Grounds, anyone's, skips Explore entirely and goes straight to Pay, replaying the Recipe. Replace the café with any merchant on the open web: the order works the same way.
Inside the engine: Explore and Pay
The Agnic Checkout Engine splits the job in two. Explore learns a merchant once, slowly and carefully, using a reasoning model. Pay completes every later order by replaying what Explore learned; it is fast, deterministic, and self-healing. The model's job is to learn the checkout, not to run it. The difference from a general-purpose browser agent is architectural:
| | A generic browser agent | The Agnic Checkout Engine |
|---|---|---|
| The model's role | Drives every step of every order | Explore learns the checkout once; Pay heals it when it changes |
| Run-to-run behaviour | Non-deterministic | Deterministic replay of a recorded Recipe |
| When the page changes | Improvises, or fails | Heals the one affected step; the fix persists |
| Card data | Risks entering the model's context | Never in model context — alias only, swapped inside the vault |
| Success rate | Assumed from benchmarks | Measured per merchant, from real orders |
| Knowing it worked | Inferred from the page | Confirmed from an authoritative payment signal |
| When uncertain | Keeps going | Stops — step-up, handoff, or fail-safe with evidence |
Explore — learning a merchant
On a merchant Agnic has never seen, Explore learns the checkout: a reasoning model drives it end to end, once. The model runs inside a tight harness — it never sees a raw page and never names a selector; it may only choose from the interactive elements the engine has already found on the live DOM, and say what to do with one. Its output is grounded in the page, not guessed at. That harness is what turns a probabilistic model into a dependable, reusable result:
- Structured perception, not screenshots-and-hope. Explore reduces each page to its genuinely interactive elements — role, accessible name, the section and nearby text that give it meaning, current state, and any inline validation. It hit-tests each candidate, so only elements a real click would land on are considered. Decoy fields planted to trap naive automation (hidden, off-screen, or invisible to assistive technology) are recognised and excluded.
- The model proposes; the engine disposes. The model never emits a selector or a coordinate. It may only reference an element from that list and pick an action from a fixed, four-verb vocabulary — and the engine, not the model, decides whether that means a click, a typed value, or a selection, and refuses outright to touch the pay button. A field the model might imagine is simply not on the list, so it cannot be filled. This is how a probabilistic model produces a deterministic artifact.
- The model is paid once. The reasoning happens at learning time and is recorded. Every later order replays the result with no model in the loop — fast, cheap, identical run to run. The expensive step happens exactly once per merchant, not once per order.
- Act, then verify. After every action it confirms the page actually advanced — comparing the page's new state against the state that step was expected to produce — and checks the cart against the user's intent (right item, right quantity, right price) before going further. Nothing is assumed to have worked.
- Durable signatures, not selectors. Each step is recorded as a semantic signature of what the element is and means — not a CSS path or a pixel position — so the Recipe survives redesigns.
- Value provenance on every step. Each recorded value is tagged by where it came from: the goal (the product and quantity), a preference (a pickup time — re-decided fresh on every order), or the customer's profile (stored as a claim, never a value).
- A manifest, not just a path. Explore also measures what the merchant requires — the price, the currency, the payment processor, and the exact identity claims its checkout asks for. That manifest is what the agent shows the user before they confirm.
- It never charges. Explore stops at payment-ready. Learning a checkout and spending money are never the same operation.
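The propose/dispose split above can be sketched in a few lines of Python. This is a minimal illustration, not Agnic's implementation. The source says the vocabulary has four verbs but does not name them. The verbs `ACT`, `SET`, `DONE` and `ABORT`, the `Element`/`Proposal`/`Action` types, and the role-to-operation tables are therefore all hypothetical. The sketch shows one thing: the model returns only an index and a verb. The engine alone turns that into a click, a typed value, or a selection. It refuses anything that is not on the list, and it refuses the pay button.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verb(Enum):
    # Hypothetical names for the fixed four-verb vocabulary.
    ACT = "act"      # use the element (press, tick, open)
    SET = "set"      # give the element a value
    DONE = "done"    # the page is payment-ready
    ABORT = "abort"  # the model cannot continue safely


@dataclass(frozen=True)
class Element:
    index: int
    role: str
    name: str        # accessible name
    section: str
    is_pay_button: bool = False


@dataclass(frozen=True)
class Proposal:
    element_index: Optional[int]
    verb: Verb
    value: Optional[str] = None


@dataclass(frozen=True)
class Action:
    kind: str        # "click", "type", "select", "stop" or "refuse"
    element: Optional[Element] = None
    value: Optional[str] = None
    reason: str = ""


VALUE_ROLES = {"textbox": "type", "searchbox": "type", "combobox": "select", "listbox": "select"}
ACT_ROLES = {"button", "link", "checkbox", "radio"}


def dispose(proposal: Proposal, elements: list[Element]) -> Action:
    """Turn a model proposal into a concrete engine action, or refuse it."""
    if proposal.verb in (Verb.DONE, Verb.ABORT):
        return Action("stop", reason=proposal.verb.value)
    # Only elements the engine itself found on the live DOM can be referenced.
    element = next((e for e in elements if e.index == proposal.element_index), None)
    if element is None:
        return Action("refuse", reason="element not on the list")
    if element.is_pay_button:
        return Action("refuse", element, reason="Explore never presses pay")
    # The engine, not the model, picks the concrete operation from the role.
    if proposal.verb is Verb.SET and element.role in VALUE_ROLES:
        return Action(VALUE_ROLES[element.role], element, proposal.value)
    if proposal.verb is Verb.ACT and element.role in ACT_ROLES:
        return Action("click", element)
    return Action("refuse", element, reason="verb does not fit this element")
```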
A Recipe is that merchant's checkout as a deterministic path. An illustrative step:
```json
{
  "action": "type",
  "target": { "kind": "field", "name": "Email", "section": "Contact" },
  "valueSource": "profile",
  "claim": "email",
  "value": null
}
```

A profile step stores the claim, never the value. The Recipe is therefore PII-free at rest, and each buyer's own email is injected at fill time, from their own mandate. Explore runs once per merchant; after that, an order skips straight to Pay.
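The following sketch shows how a replayer might resolve a step's value by provenance at fill time. It assumes the step shape shown above. The `goalKey` and `preference` fields, the `decide_preference` callback, and the return tuples are hypothetical additions for illustration. Only `valueSource`, `claim` and `value` come from the example step.

```python
from typing import Any, Callable, Mapping


def resolve_value(
    step: Mapping[str, Any],
    goal: Mapping[str, Any],
    decide_preference: Callable[[str], Any],
    profile: Mapping[str, str],
    granted_claims: set[str],
) -> tuple[str, Any]:
    """Return ("fill", value) or ("needs_claim", claim) for one recorded step."""
    source = step["valueSource"]
    if source == "goal":
        # Hypothetical field: which part of this order's goal the step needs.
        return ("fill", goal[step["goalKey"]])
    if source == "preference":
        # Re-decided for every order; a value recorded at learning time is ignored.
        return ("fill", decide_preference(step["preference"]))
    if source == "profile":
        claim = step["claim"]
        # Only this buyer's own value, and only if their mandate grants the claim.
        if claim not in granted_claims or claim not in profile:
            return ("needs_claim", claim)
        return ("fill", profile[claim])
    raise ValueError(f"unknown valueSource: {source!r}")


def is_pii_free(recipe: list[Mapping[str, Any]]) -> bool:
    """True if no profile step in a stored Recipe carries a concrete value."""
    return all(s.get("value") is None for s in recipe if s.get("valueSource") == "profile")
```

A missing or ungranted claim comes back as `needs_claim` rather than as a guessed value. The engine then stops and asks for it.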
Pay — completing the order
On a learned merchant, Pay replays the Recipe with no model in the hot path — and the same discipline:
- Deterministic re-find. Each step re-locates its element by scoring the live page against the recorded signature. No model latency, the same behaviour every run.
- Every step is gated. A step isn't done when the click lands — it's done when the page moves to the state the Recipe expects. Pay always knows whether it is still on-script.
- Two kinds of drift, one response. An element that can't be found, or an action that has no effect, triggers a heal: one scoped model call re-locates that step, and the fix is written back to the Recipe. The Recipe improves every time the merchant changes something — drift is absorbed, not accumulated.
- Per-order decisions stay live. Preferences like a pickup slot are re-decided on every order; yesterday's 2:00 pm is not an answer today.
- PII is injected, never replayed. Profile steps carry a claim; the buyer's own disclosed value is filled at order time, from their mandate.
- The card surface is handled, not guessed. Hosted card fields live in isolated frames with defences of their own. Pay uses a hardened procedure per processor family and verifies every field by reading it back after the fill — a card form is either filled exactly, or the order stops.
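Deterministic re-find with a single heal can be sketched as a scoring pass over the live elements. The weights, the `MATCH_THRESHOLD` of 7, the dict shapes, and the `heal` callback are assumptions chosen for illustration; `heal` stands in for the one scoped model call. The sketch covers only the can't-find kind of drift. An action with no effect would be caught by the step gate and routed to the same heal.

```python
from typing import Callable, Optional

MATCH_THRESHOLD = 7  # hypothetical cut-off on the 0-10 scale below


def _norm(text: str) -> str:
    return " ".join(text.lower().split())


def score(signature: dict, element: dict) -> int:
    """Score a live element against a recorded semantic signature (max 10)."""
    points = 0
    name, live_name = _norm(signature["name"]), _norm(element["name"])
    if live_name == name:
        points += 6
    elif name and name in live_name:
        points += 3
    if _norm(element["section"]) == _norm(signature["section"]):
        points += 3
    if element["kind"] == signature["kind"]:
        points += 1
    return points


def refind(step: dict, live: list[dict],
           heal: Callable[[dict, list[dict]], Optional[int]]) -> Optional[dict]:
    """Re-locate a step's element with no model call, healing once on drift."""
    ranked = sorted(live, key=lambda e: score(step["target"], e), reverse=True)
    if ranked:
        best = score(step["target"], ranked[0])
        runner_up = score(step["target"], ranked[1]) if len(ranked) > 1 else -1
        # Accept only a clear, unambiguous match.
        if best >= MATCH_THRESHOLD and best > runner_up:
            return ranked[0]
    # Drift: one scoped model call picks the element, or None to fail safe.
    index = heal(step, live)
    if index is None:
        return None
    chosen = live[index]
    # Write the fix back so the next order matches without a model call.
    step["target"] = {k: chosen[k] for k in ("kind", "name", "section")}
    return chosen
```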
Knowing when to stop
On the open web, the most important property of an autonomous engine is knowing when it must not continue. Every stop is structured, evidenced, and safe:
| Situation | What the engine does |
|---|---|
| The order falls outside the mandate | Stops before Pay. The user approves that specific order with their passkey, or it doesn't happen. |
| The merchant needs a claim the user hasn't shared | Stops and asks for that claim. It never guesses, and never reuses another buyer's data. |
| A captcha or 3-D Secure appears | Hands the user a live view; the same session resumes after the check. |
| The merchant requires a sign-in | Hands the user a live view to sign in themselves; the session resumes once they're in. Agnic never sees the password. |
| Drift it cannot heal | Fails safe with the full step-by-step evidence — and no charge. |
It never improvises around a safeguard.
How a merchant earns autonomy
Those same outcomes govern how much autonomy a merchant earns: trust is measured, not asserted.
No merchant starts trusted. A merchant Explore has just learned is a `candidate`. It must prove itself on real orders before the engine will run it unattended. Merchants that prove out graduate to `auto`; checkouts that genuinely need a person stay under human supervision.
The agent reads the merchant's standing before it commits, in every Explore response:
| Field | Meaning |
|---|---|
| `recipe_known` | Whether the engine has a learned Recipe for this merchant. `false` means a cold first run, which explores live. |
| `status` | `auto`, `candidate`, or `always_handoff`: the merchant's earned standing. |
| `pass_k` | The merchant's measured success rate across recent real orders. |
| `runs` | How many real orders inform the figure. |
So the cardholder's expectation is set by data, not optimism — and anyone reviewing the system can ask, per merchant, for the exact numbers the agent saw.
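An agent might turn the `reliability` block into an expectation for the user along the lines sketched below. The thresholds (`MIN_RUNS`, `MIN_PASS_K`) and the returned labels are hypothetical policy choices, not values Agnic publishes.

```python
from typing import Any, Mapping

MIN_RUNS = 20      # hypothetical: minimum evidence before trusting the rate
MIN_PASS_K = 0.9   # hypothetical: minimum measured success rate


def plan_for(reliability: Mapping[str, Any]) -> str:
    """Map a merchant's standing to what the agent tells the user to expect."""
    if not reliability.get("recipe_known"):
        return "cold_explore"      # first run: Explore learns the checkout live
    status = reliability.get("status")
    if status == "always_handoff":
        return "expect_handoff"    # a person will be needed during checkout
    if (status == "auto"
            and reliability.get("runs", 0) >= MIN_RUNS
            and reliability.get("pass_k", 0.0) >= MIN_PASS_K):
        return "unattended"
    return "offer_watch"           # candidate, or thin evidence: suggest a live view
```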
The order protocol
The engine's two capabilities are the two calls your agent makes on the Agnic MCP server and REST API: Explore (`preview_order_via_autofill`) and Pay (`place_order_via_autofill`). Confirm-then-execute: nothing is charged without an explicit, price-bound confirmation.
Explore is read-only. It resolves the merchant, validates every item, and recomputes the authoritative total (the merchant decides the price, never the agent). It summarises the identity claims that will be shared and reports the merchant's standing. Finally, it issues a one-use confirmation token bound to the merchant, items, amount, and currency. If an item, price, or card is wrong, it returns structured blockers, with suggestions, instead of a token. An illustrative response for the Maple Grounds order:
```json
{
  "success": true,
  "ready_to_place": true,
  "merchant_name": "Maple Grounds",
  "canonical_items": [
    { "sku": "sourdough-loaf", "name": "Sourdough Loaf", "quantity": 1, "unit_price_cents": 650 },
    { "sku": "oat-latte", "name": "Oat Latte", "quantity": 1, "unit_price_cents": 595 }
  ],
  "expected_amount_minor": 1245,
  "currency": "USD",
  "card_preview": { "brand": "visa", "last_four": "4242" },
  "reliability": { "recipe_known": true, "status": "auto", "runs": 41, "pass_k": 0.97 },
  "pii_disclosure_summary": [
    { "claim": "given_name", "value_redacted": "Alex", "will_share": true },
    { "claim": "email", "value_redacted": "al…@…om", "will_share": true }
  ],
  "preview_summary_for_user": "Maple Grounds — sourdough × 1, oat latte × 1 — $12.45, pickup. Sharing your name and email. Pay with Visa ending 4242. Confirm?",
  "confirmation_token": "cft_…",
  "expires_at": 1781280000000
}
```

`preview_summary_for_user` is the exact sentence the agent shows the customer, the same one used in the walkthrough above. The merchant's standing (`reliability`) and the disclosure summary travel in the same response, so the decision to proceed is made with everything on the table.
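On the agent side, handling this response reduces to three cases, sketched below. The sketch makes three assumptions. The response is a parsed JSON object. `expires_at` is in milliseconds since the epoch, as the example value suggests. The structured blockers arrive under a field named `blockers`; that name is a guess.

```python
from typing import Any, Mapping


def next_step(preview: Mapping[str, Any], now_ms: int) -> tuple[str, Any]:
    """Decide what the agent does with an Explore (preview) response."""
    if not (preview.get("success") and preview.get("ready_to_place")):
        # Hypothetical field name for the structured blockers and suggestions.
        return ("fix", preview.get("blockers", []))
    if now_ms >= preview["expires_at"]:
        return ("re_preview", None)  # an expired token cannot be redeemed
    # Show the response's own sentence verbatim; never rebuild the total locally.
    return ("ask_user", preview["preview_summary_for_user"])
```

The agent shows `preview_summary_for_user` verbatim rather than recomputing a total, because the merchant decides the price.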
Pay executes against the token plus the user's verbatim confirmation, completes the checkout with the vaulted card, and returns a terminal status with its evidence:
```json
{
  "success": true,
  "order_id": "af_ord_…",
  "status": "succeeded",
  "merchant_id": "merchant_maple_grounds",
  "evidence": {
    "screenshots_count": 9,
    "screenshots_url": "https://app.agnic.ai/orders/af_ord_…"
  }
}
```

An order moves through a fixed set of states, and only forward. A paid order can never be paid twice:
| Status | Meaning |
|---|---|
| `succeeded` | The merchant accepted the charge and the order completed. |
| `merchant_error` | The merchant received the card and declined the charge. |
| `worker_error` | The checkout could not be completed, for example because the page changed mid-order. |
| `approval_required` | A step-up is needed: a passkey approval or a live human check. |
| `processing` | Still in flight. The caller polls `get_order_status`; it never re-dispatches. |
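A caller's polling loop might look like the sketch below. It assumes a `get_status` callable that wraps `get_order_status` and returns an object with a `status` field; the interval and poll budget are arbitrary. The loop never calls Pay again. That is what keeps a slow order from being charged twice.

```python
import time
from typing import Any, Callable, Mapping

TERMINAL = {"succeeded", "merchant_error", "worker_error"}


def wait_for_order(get_status: Callable[[str], Mapping[str, Any]], order_id: str,
                   interval_s: float = 2.0, max_polls: int = 90,
                   sleep: Callable[[float], None] = time.sleep) -> Mapping[str, Any]:
    """Poll until the order leaves "processing"; never re-dispatch it."""
    for _ in range(max_polls):
        result = get_status(order_id)
        if result["status"] in TERMINAL or result["status"] == "approval_required":
            return result  # a final outcome, or a step-up the user must act on
        sleep(interval_s)  # still "processing": wait and ask again
    raise TimeoutError(f"order {order_id} still processing; poll again later, do not re-place")
```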
Two guarantees ride on every order:
- Price binding. A Pay call charges exactly the amount its token was issued for. Changing the cart or the amount after Explore invalidates the token.
- No double charge. A slow order returns `processing` with an order id; callers poll for the outcome and never re-dispatch. The order is idempotent at the dispatch boundary.
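The source does not say how the confirmation token is built. One plausible construction, shown here only as an illustration, is an HMAC over the order the token was issued for, plus a nonce and expiry. The issuer also remembers spent nonces. Every name below is hypothetical, and the in-memory `_redeemed` set stands in for a durable store.

```python
import hashlib
import hmac
import json
import secrets

_KEY = secrets.token_bytes(32)   # issuer-side secret (hypothetical)
_redeemed: set[str] = set()      # spent nonces; a durable store in practice

Items = list[tuple[str, int]]    # (sku, quantity) pairs


def _mac(nonce: str, expires_at: int, merchant: str, items: Items,
         amount_minor: int, currency: str) -> str:
    # Sort items so the same cart always serialises the same way.
    payload = json.dumps([nonce, expires_at, merchant, sorted(items), amount_minor, currency],
                         separators=(",", ":"))
    return hmac.new(_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(merchant: str, items: Items, amount_minor: int, currency: str,
                expires_at: int) -> dict:
    nonce = secrets.token_hex(16)
    return {"nonce": nonce, "expires_at": expires_at,
            "mac": _mac(nonce, expires_at, merchant, items, amount_minor, currency)}


def redeem_token(token: dict, merchant: str, items: Items, amount_minor: int,
                 currency: str, now_ms: int) -> bool:
    """Accept a Pay call only for the exact order the token was issued for, once."""
    if token["nonce"] in _redeemed or now_ms >= token["expires_at"]:
        return False
    expected = _mac(token["nonce"], token["expires_at"], merchant, items, amount_minor, currency)
    if not hmac.compare_digest(expected, token["mac"]):
        return False             # merchant, cart, amount or currency changed
    _redeemed.add(token["nonce"])
    return True
```

The MAC covers the sorted items, the amount and the currency, so any change after Explore makes verification fail. The spent-nonce check makes a second redemption fail even for an identical order.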
Consent and selective disclosure
Every order is governed by the user's standing mandate — their delegation to the agent — which declares the limits the agent may spend within and the identity attributes it may share.
- Before the user confirms, the Explore response shows the exact claims that will be shared — for example email, name, and postal address — each drawn from their profile and shown in plain terms.
- The agent shares only what the merchant requires for that order, never the whole profile.
- Anything outside the mandate triggers an explicit step-up: the user approves that specific order with their passkey, and only then does it proceed.
The Recipe itself carries no customer data — it records the claim a field needs, as in the example above. Each buyer's own value is injected at order time, from their mandate.
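At its core, selective disclosure intersects what the merchant requires with what the mandate grants. The minimal sketch below uses hypothetical function names. `redact_email` only mimics the `al…@…om` style of the illustrative preview; it is not a documented format.

```python
def plan_disclosure(required: list[str], granted: set[str],
                    profile: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Share only claims the merchant requires AND the mandate grants."""
    share = {c: profile[c] for c in required if c in granted and c in profile}
    # Anything required but not shareable becomes a stop-and-ask, never a guess.
    missing = [c for c in required if c not in share]
    return share, missing


def redact_email(value: str) -> str:
    """Display form in the style of the illustrative preview, e.g. 'al…@…om'."""
    local, _, domain = value.partition("@")
    return f"{local[:2]}…@…{domain[-2:]}"
```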
Live view and human handoff
Most of the open web lets an order complete unattended. A few checkouts ask for something only the shopper can give — a captcha, a 3-D Secure step-up, a one-time code, or a sign-in to the merchant. The engine is built to detect these and bring in a person, not to work around them.
Two modes share one mechanism — a live, frame-by-frame stream of the in-flight session:
- Watch, read-only. Any order can be watched live: the user, or a reviewer, sees the same session the engine is driving, as it happens. Useful for the first orders at a new merchant, and for oversight.
- Handoff, interactive. When a human-only step is detected — a check, or a merchant sign-in — the engine pauses with the browser session held open and issues a secure, single-use, time-limited link. The user opens it, sees the live page, and acts; their input is relayed back into that same session. For a sign-in they enter their own credentials and tap "I've signed in — continue" — Agnic never sees the password. The moment the step clears, the engine resumes the recorded path and finishes the order.
The session is never re-dispatched and the card is never re-entered — the order continues from exactly where it paused. The link is bound to one session, expires on a short timer, and is void the instant the order resolves. The engine runs no captcha-solving service and stores no merchant passwords: a challenge — or a sign-in — is a reason to involve a person, by design.
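The link rules above map onto a small registry: single use, short expiry, bound to one session, void on resolution. This is a hypothetical sketch. The class, the 300-second default lifetime, and the in-memory storage are assumptions.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class HandoffLinks:
    """Single-use, time-limited links, each bound to one held browser session."""
    ttl_s: float = 300.0  # hypothetical lifetime
    _live: dict[str, tuple[str, float]] = field(default_factory=dict, init=False, repr=False)

    def issue(self, session_id: str, now: float) -> str:
        token = secrets.token_urlsafe(32)
        self._live[token] = (session_id, now + self.ttl_s)
        return token

    def redeem(self, token: str, session_id: str, now: float) -> bool:
        entry = self._live.pop(token, None)  # pop: a link works at most once
        return entry is not None and entry[0] == session_id and now < entry[1]

    def void_session(self, session_id: str) -> None:
        """Called the instant the order resolves."""
        for token in [t for t, (sid, _) in self._live.items() if sid == session_id]:
            del self._live[token]
```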
Observability and evidence
Every run — a learning pass and every order — emits a structured, step-by-step trace, so no part of a checkout is a black box.
- A per-step trace. For each step the engine records what it matched, where the value came from — the goal, a freshly decided preference, or a profile claim — how the page responded, and the timing. Profile values are redacted to their claim: the trace names what was shared, never the customer's data.
- A terminal status, with proof. Every order ends in one explicit state backed by evidence — a screenshot sequence of the checkout and, when enabled, a recording of the session — reachable from the order record.
- Success is judged, not assumed. Whether an order actually completed is decided from an authoritative payment signal — the processor's confirmation and the amount that settled — never from a screenshot or the model's say-so. A page that looks finished is not a success until the charge is confirmed.
- Per-merchant reliability, from real outcomes. The engine keeps running counts per merchant (orders, successes, heal events, and the success rate they imply) and exposes them. The `reliability` block an agent reads before every order is a live view of those numbers, not a static badge.
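The sketch below judges success from the payment signal alone and keeps per-merchant counts. `PaymentSignal` is a hypothetical shape for the processor's confirmation, and `MerchantStats` is an illustrative counter, not Agnic's schema. Nothing about the page's appearance is an input to `is_success`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PaymentSignal:
    """Hypothetical shape of the processor's confirmation for one charge."""
    confirmed: bool
    settled_amount_minor: int
    currency: str


def is_success(expected_minor: int, currency: str, signal: Optional[PaymentSignal]) -> bool:
    # Only a confirmed charge for the approved amount and currency counts.
    return (signal is not None and signal.confirmed
            and signal.settled_amount_minor == expected_minor
            and signal.currency == currency)


@dataclass
class MerchantStats:
    orders: int = 0
    successes: int = 0
    heals: int = 0

    def record(self, succeeded: bool, heals: int = 0) -> None:
        self.orders += 1
        self.successes += int(succeeded)
        self.heals += heals

    @property
    def success_rate(self) -> float:
        return self.successes / self.orders if self.orders else 0.0
```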
For a reviewer, any order reconstructs end to end: the instruction, the confirmation, the exact amount, the claims disclosed, the step-by-step path, and the signal that says it settled.
Security and data handling
The defining property of the design is what never enters it: the card. The agent and the model operate on a non-sensitive alias from start to finish; the real number exists only inside the PCI vault boundary, where it is substituted into the payment request in flight.
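To make the boundary concrete, here is a toy model of alias substitution. It assumes a made-up alias format (`card_alias_` plus 16 hex characters) and a plain-text request body. A real PCI vault performs this step inside its own scoped infrastructure, not in application code like this.

```python
import re
import secrets

_ALIAS = re.compile(r"card_alias_[0-9a-f]{16}")  # hypothetical alias format


class Vault:
    """Toy stand-in for the PCI vault: maps non-sensitive aliases to card numbers."""

    def __init__(self) -> None:
        self._cards: dict[str, str] = {}

    def tokenize(self, pan: str) -> str:
        alias = "card_alias_" + secrets.token_hex(8)  # 16 lowercase hex chars
        self._cards[alias] = pan
        return alias

    def substitute(self, outbound_body: str) -> str:
        """Swap aliases for real numbers in a request leaving for the processor."""
        def swap(match: re.Match) -> str:
            alias = match.group(0)
            if alias not in self._cards:
                raise ValueError("unknown card alias")
            return self._cards[alias]
        return _ALIAS.sub(swap, outbound_body)
```

Everything upstream of `substitute`, including the agent, the model and orchestration, only ever holds the alias.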
Who sees what, exactly:
| Data | Agent + model | Agnic orchestration | PCI vault | Merchant |
|---|---|---|---|---|
| Real card number | Never | Never | Holds it; substitutes it in flight | Receives a normal charge through its own processor |
| Card alias — non-sensitive | Yes | Yes | Maps it to the card | — |
| Identity claims | Mandate-granted only | Order-required only | — | Only what its checkout asks for |
| Order — items, amount, currency | Yes | Yes | — | Yes |
| Mandate — limits and grants | Operates within it | Enforces it | — | Never |
And the controls behind the table:
| Control | How it works |
|---|---|
| Card data | Tokenized and held in a PCI-DSS-scoped vault. The agent, the model, and Agnic's orchestration layer see only the alias; the real number never appears in agent runtime, logs, or prompt context. |
| Identity | Shared by selective disclosure — only the claims the merchant requires for the order, authorised by the user's mandate. |
| Authorisation | Two-step confirm-then-execute with a one-use, price-bound token; out-of-mandate requests require an explicit per-order passkey approval. |
| Audit | Every order persists the user's original instruction, their confirmation, the exact items and amount approved, the claims disclosed, and step-by-step evidence — including screenshots and, when enabled, a session recording. |
Supported checkouts
The engine targets standards-based card entry: hosted card fields and embedded payment elements as used by major processors, including Stripe-based and Shopify-based checkouts, on desktop web. Coverage expands per merchant as Recipes are learned and verified — the merchant-standing fields above tell the agent, per merchant, what to expect today.