Un-integrated merchants

Visa and Mastercard have made the first part of agentic commerce real: an AI agent can now carry a scoped payment credential — an agentic token.

But that only solves half the problem.

A payment credential still has to be used somewhere. And most merchants do not have an "agent checkout."

There are millions of merchants on the open web. The majority are not going to rebuild their checkout just because AI agents can now pay. They did not rebuild for every previous shift in commerce infrastructure, and they will not do it here at the speed or scale agent commerce needs.

So the real problem is simple:

If an agent is going to buy from ordinary merchants, something has to complete the checkout those merchants already have.

The obvious approaches all break down.

A script for each merchant works until the merchant moves a field, renames a button, changes a validation rule, or reorders the checkout. Then it fails mid-order. Someone has to notice, understand what changed, and fix it. That might work for ten merchants. It does not work for the open web.

A scraper has a different problem. It can read parts of a page, but it does not really understand the checkout. It does not know whether the order actually completed. That is not good enough when money is moving.

A general-purpose browser agent is more flexible, but it brings a different kind of risk. It has to reason through the checkout live, every time. That makes it slower, more expensive, and less predictable. It may take a slightly different path on each run. It may keep going when it should stop. And it still needs a dependable way to know whether the order actually went through.

That is fine for a demo. It is not enough for payment.

What is needed is a checkout engine that can use the merchant's existing checkout as it is — without asking the merchant to integrate, without writing a fragile script for every store, and without asking a model to improvise through every order.

It needs to learn a checkout once, complete it repeatedly, stop when it is not safe to continue, and leave evidence of what happened.

That is the problem the Agnic Checkout Engine solves for un-integrated merchants.

A complete order, start to finish

Here is the Agnic Checkout Engine doing exactly that — completing an order at a café it has never seen, on the agent's behalf.

Figure: How Agnic completes a checkout it has never seen (explore once, pay reliably after), in five steps.

1) Customer asks: order a sourdough loaf and an oat latte from Maple Grounds.
2) Explore learns the checkout: a reasoning model explores the live checkout read-only, stops before payment with no charge, and records a reusable Recipe of the merchant's requirements.
3) One confirmation: the agent shows one clear summary (items, total $12.45, pickup, the identity claims that will be shared, and the card preview), then asks to confirm.
4) Pay replays the Recipe: after approval, Pay fills the checkout deterministically and completes the order, self-healing small page changes.
5) Receipt and evidence: the customer gets a receipt and Agnic stores verifiable evidence of what was asked, approved, and shared.

No merchant integration is needed; the vaulted card is never exposed to the agent or model; the next order skips Explore and goes straight to Pay.

A customer tells their assistant:

"Order a sourdough loaf and an oat latte from Maple Grounds for pickup."

Maple Grounds is a neighbourhood café. It has never heard of Agnic — no integration, no API, no agent support. Just the checkout it already has.

First contact — Explore learns the café

Maple Grounds isn't in Agnic's network yet, so before anything is put in front of the customer, Explore runs: a model drives the café's live checkout the way a careful person would — read-only, stopping at the payment step, charging nothing — and measures what an order here actually takes.

Figure: Explore, learning a checkout once. An unknown merchant (Maple Grounds) goes through a read-only Explore pass that learns the checkout (authoritative price: $12.45; required details: name, email, pickup time; fulfilment: pickup; payment form: card number, expiry, CVC, ZIP), then records a Recipe and manifest as a deterministic path to payment.

One pass, and an unknown checkout has become structured facts — the real price, the details the café requires, the card form it uses — captured as a Recipe: the deterministic path to payment.

One confirmation

The agent shows the customer a single line. Every fact in it came from Explore moments earlier — none of it is guessed:

"Maple Grounds — sourdough × 1, oat latte × 1 — $12.45, pickup. Sharing your name and email. Pay with Visa ending 4242. Confirm?"

The order — Pay completes it

They say yes. Pay completes the order by replaying the Recipe that Explore just recorded.

The card is the only sensitive thing in the flow, and it stays sealed: the real number never touches the agent, the model, or the page. It is substituted inside the PCI vault as the request leaves the browser. The customer gets a receipt; Agnic keeps a verifiable record of what was asked, what was approved, and what was shared.

Party	What they see
The customer	One line to confirm, then a receipt.
The merchant	An ordinary checkout, completed. No integration, no agent-specific code.
Agnic	Reads the checkout, fills it under the mandate, supplies the vaulted card, records the evidence.

The café did nothing. It saw a normal customer complete a normal checkout. And the learning is already done — the next order at Maple Grounds, anyone's, skips Explore entirely and goes straight to Pay, replaying the Recipe. Replace the café with any merchant on the open web: the order works the same way.

Inside the engine: Explore and Pay

The Agnic Checkout Engine splits the job in two. Explore learns a merchant once: slowly, carefully, with a reasoning model. Pay completes every later order by replaying what Explore learned: fast, deterministic, and self-healing. The model's job is to learn the checkout, not to run it. The difference from a general-purpose browser agent is architectural:

	A generic browser agent	The Agnic Checkout Engine
The model's role	Drives every step of every order	Explore learns the checkout once; Pay heals it when it changes
Run-to-run behaviour	Non-deterministic	Deterministic replay of a recorded Recipe
When the page changes	Improvises, or fails	Heals the one affected step; the fix persists
Card data	Risks entering the model's context	Never in model context — alias only, swapped inside the vault
Success rate	Assumed from benchmarks	Measured per merchant, from real orders
Knowing it worked	Inferred from the page	Confirmed from an authoritative payment signal
When uncertain	Keeps going	Stops — step-up, handoff, or fail-safe with evidence

Explore — learning a merchant

On a merchant Agnic has never seen, Explore learns the checkout: a reasoning model drives it end to end, once. The model runs inside a tight harness — it never sees a raw page and never names a selector; it may only choose from the interactive elements the engine has already found on the live DOM, and say what to do with one. Its output is grounded in the page, not guessed at. That harness is what turns a probabilistic model into a dependable, reusable result:

Structured perception, not screenshots-and-hope. Explore reduces each page to its genuinely interactive elements — role, accessible name, the section and nearby text that give it meaning, current state, and any inline validation. It hit-tests each candidate, so only elements a real click would land on are considered. Decoy fields planted to trap naive automation (hidden, off-screen, or invisible to assistive technology) are recognised and excluded.
The model proposes; the engine disposes. The model never emits a selector or a coordinate. It may only reference an element from that list and pick an action from a fixed, four-verb vocabulary — and the engine, not the model, decides whether that means a click, a typed value, or a selection, and refuses outright to touch the pay button. A field the model might imagine is simply not on the list, so it cannot be filled. This is how a probabilistic model produces a deterministic artifact.
The model is paid once. The reasoning happens at learning time and is recorded. Every later order replays the result with no model in the loop — fast, cheap, identical run to run. The expensive step happens exactly once per merchant, not once per order.
Act, then verify. After every action, Explore confirms the page actually advanced, comparing the page's new state against the state that step was expected to produce, and checks the cart against the user's intent (right item, right quantity, right price) before going further. Nothing is assumed to have worked.
Durable signatures, not selectors. Each step is recorded as a semantic signature of what the element is and means — not a CSS path or a pixel position — so the Recipe survives redesigns.
Value provenance on every step. Each recorded value is tagged by where it came from: the goal (the product and quantity), a preference (a pickup time — re-decided fresh on every order), or the customer's profile (stored as a claim, never a value).
A manifest, not just a path. Explore also measures what the merchant requires — the price, the currency, the payment processor, and the exact identity claims its checkout asks for. That manifest is what the agent shows the user before they confirm.
It never charges. Explore stops at payment-ready. Learning a checkout and spending money are never the same operation.
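
The "model proposes; the engine disposes" rule above can be sketched as a small validation step. This is a minimal illustration, not Agnic's implementation: the verb names, the `Element` and `Proposal` types, and the role-to-operation mapping are all assumptions made for the example (the text says only that the vocabulary has four verbs).

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical verb names: the text says only that there are four.
VERBS = {"fill", "choose", "press", "stop"}


@dataclass(frozen=True)
class Element:
    ref: int                     # index assigned by the engine during perception
    role: str                    # e.g. "textbox", "combobox", "button"
    name: str                    # accessible name
    is_pay_button: bool = False  # flagged by the engine, never by the model


@dataclass(frozen=True)
class Proposal:
    ref: int                     # must point into the perceived list
    verb: str                    # must be one of VERBS
    value: Optional[str] = None


def dispose(proposal: Proposal, elements: List[Element]) -> str:
    """Validate a proposal and return the concrete operation the engine performs."""
    if proposal.verb not in VERBS:
        raise ValueError(f"verb not in vocabulary: {proposal.verb!r}")
    if proposal.verb == "stop":
        return "halt"
    by_ref = {e.ref: e for e in elements}
    element = by_ref.get(proposal.ref)
    if element is None:
        # A field the model imagined is not on the list, so it cannot be touched.
        raise LookupError("element is not on the perceived list")
    if element.is_pay_button:
        raise PermissionError("Explore refuses to touch the pay button")
    # The engine, not the model, maps the element's role to an operation.
    if element.role == "textbox":
        return "type"
    if element.role in {"combobox", "listbox", "radio"}:
        return "select"
    return "click"
```

Because the model can only name a `ref` and a verb, every accepted proposal resolves to an element the engine has already hit-tested, which is what lets the recorded step be replayed later without a model.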

A Recipe is that merchant's checkout as a deterministic path. An illustrative step:

{
  "action": "type",
  "target": { "kind": "field", "name": "Email", "section": "Contact" },
  "valueSource": "profile",
  "claim": "email",
  "value": null
}

A profile step stores the claim, never the value — the Recipe is PII-free at rest, and each buyer's own email is injected at fill time, from their own mandate. Explore runs once per merchant; after that, an order skips straight to Pay.
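
To make the provenance rule concrete, here is a hedged sketch of how a step's value might be resolved at fill time. The step shape follows the illustrative Recipe step above; the `goalField` key, the `MissingClaim` exception, and the `decide_preference` callback are hypothetical names introduced for this example.

```python
from typing import Callable, Dict


class MissingClaim(Exception):
    """The buyer has not shared a claim this step needs; the engine should stop and ask."""


def resolve_value(
    step: dict,
    goal: Dict[str, str],
    disclosed: Dict[str, str],
    decide_preference: Callable[[str], str],
) -> str:
    """Pick the value to type for one Recipe step, based on where it comes from."""
    source = step["valueSource"]
    if source == "profile":
        claim = step["claim"]
        if claim not in disclosed:
            raise MissingClaim(claim)  # never guess, never reuse another buyer's data
        return disclosed[claim]        # this buyer's own value, injected at fill time
    if source == "preference":
        # Re-decided on every order, never replayed from the Recipe.
        return decide_preference(step["target"]["name"])
    if source == "goal":
        return goal[step["goalField"]]  # "goalField" is a hypothetical key
    raise ValueError(f"unknown valueSource: {source!r}")


# The illustrative step from above: it stores the claim, and its value stays null.
email_step = {
    "action": "type",
    "target": {"kind": "field", "name": "Email", "section": "Contact"},
    "valueSource": "profile",
    "claim": "email",
    "value": None,
}
```

Calling `resolve_value(email_step, {}, {"email": "alex@example.com"}, lambda name: "")` returns the buyer's email, while the stored step itself never contains it.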

Pay — completing the order

On a learned merchant, Pay replays the Recipe with no model in the hot path — and the same discipline:

Deterministic re-find. Each step re-locates its element by scoring the live page against the recorded signature. No model latency, the same behaviour every run.
Every step is gated. A step isn't done when the click lands — it's done when the page moves to the state the Recipe expects. Pay always knows whether it is still on-script.
Two kinds of drift, one response. An element that can't be found, or an action that has no effect, triggers a heal: one scoped model call re-locates that step, and the fix is written back to the Recipe. The Recipe improves every time the merchant changes something — drift is absorbed, not accumulated.
Per-order decisions stay live. Preferences like a pickup slot are re-decided on every order; yesterday's 2:00 pm is not an answer today.
PII is injected, never replayed. Profile steps carry a claim; the buyer's own disclosed value is filled at order time, from their mandate.
The card surface is handled, not guessed. Hosted card fields live in isolated frames with defences of their own. Pay uses a hardened procedure per processor family and verifies every field by reading it back after the fill — a card form is either filled exactly, or the order stops.
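
As an illustration of deterministic re-find, the sketch below scores live candidates against a recorded signature and signals a heal when nothing scores high enough. The weights, the threshold, and the three signature keys are assumptions chosen for the example; the source does not describe the actual scoring.

```python
from typing import List, Optional

# Hypothetical weights and threshold, chosen only to make the example concrete.
WEIGHTS = {"kind": 3.0, "name": 4.0, "section": 2.0}
THRESHOLD = 6.0


def score(signature: dict, candidate: dict) -> float:
    """Sum the weights of the signature attributes the candidate matches (case-insensitive)."""
    total = 0.0
    for key, weight in WEIGHTS.items():
        expected = signature.get(key)
        if expected is not None and str(expected).lower() == str(candidate.get(key, "")).lower():
            total += weight
    return total


def refind(signature: dict, candidates: List[dict]) -> Optional[dict]:
    """Return the best match, or None to signal that this step needs a heal.

    Ties go to the earliest candidate, so the same page always gives the same answer.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        s = score(signature, candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best if best_score >= THRESHOLD else None
```

With the signature `{"kind": "field", "name": "Email", "section": "Contact"}`, a field that kept its name but moved to a renamed section still matches (3 + 4 = 7), while a field that kept only its kind and section (3 + 2 = 5) falls below the threshold and would trigger the scoped heal described above.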

Knowing when to stop

On the open web, the most important property of an autonomous engine is knowing when it must not continue. Every stop is structured, evidenced, and safe:

Situation	What the engine does
The order falls outside the mandate	Stops before Pay. The user approves that specific order with their passkey, or it doesn't happen.
The merchant needs a claim the user hasn't shared	Stops and asks for that claim. It never guesses, and never reuses another buyer's data.
A captcha or 3-D Secure appears	Hands the user a live view; the same session resumes after the check.
The merchant requires a sign-in	Hands the user a live view to sign in themselves; the session resumes once they're in. Agnic never sees the password.
Drift it cannot heal	Fails safe with the full step-by-step evidence — and no charge.

It never improvises around a safeguard.

How a merchant earns autonomy

Those same outcomes govern how much autonomy a merchant earns — trust is measured, not asserted.

No merchant starts trusted. A merchant Explore has just learned is a `candidate`: it must prove itself on real orders before the engine will run it unattended. Merchants that prove out graduate to `auto`; checkouts that genuinely need a person stay under human supervision (`always_handoff`).

The agent reads the merchant's standing before it commits, in every Explore response:

Field	Meaning
`recipe_known`	Whether the engine has a learned Recipe for this merchant. `false` is a cold first run, which explores live.
`status`	`auto`, `candidate`, or `always_handoff` — the merchant's earned standing.
`pass_k`	The merchant's measured success rate across recent real orders.
`runs`	How many real orders inform the figure.

So the cardholder's expectation is set by data, not optimism — and anyone reviewing the system can ask, per merchant, for the exact numbers the agent saw.
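
An agent might turn those four fields into a plain-language expectation along these lines. The function name and the wording of the messages are assumptions for illustration; only the field names and status values come from the table above.

```python
def describe_standing(reliability: dict) -> str:
    """Turn the reliability block into a plain-language expectation for the user."""
    if not reliability.get("recipe_known", False):
        return "First order at this merchant: the checkout is explored live."
    status = reliability["status"]
    track = f"{reliability['pass_k']:.0%} success across {reliability['runs']} real orders"
    if status == "auto":
        return f"Runs unattended ({track})."
    if status == "candidate":
        return f"Still proving itself ({track})."
    if status == "always_handoff":
        return f"Needs you for part of the checkout ({track})."
    raise ValueError(f"unknown status: {status!r}")
```

Raising on an unknown status, instead of defaulting to an optimistic message, keeps the agent from overstating a merchant's standing.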

The order protocol

The engine's two capabilities are the two calls your agent makes, Explore (`preview_order`) and Pay (`place_order`), on the Agnic MCP server and REST API. The protocol is confirm-then-execute: nothing is charged without an explicit, price-bound confirmation.

Explore is read-only. It resolves the merchant, validates every item, recomputes the authoritative total (the merchant decides the price, never the agent), summarises the identity claims that will be shared, reports the merchant's standing, and issues a one-use confirmation token bound to the merchant, items, amount, and currency. If an item, price, or card is wrong, it returns structured blockers (with suggestions) instead of a token. An illustrative response for the café order, as it would look once Maple Grounds has a learned Recipe and `auto` standing:

{
  "success": true,
  "ready_to_place": true,
  "merchant_name": "Maple Grounds",
  "canonical_items": [
    { "sku": "sourdough-loaf", "name": "Sourdough Loaf", "quantity": 1, "unit_price_cents": 650 },
    { "sku": "oat-latte", "name": "Oat Latte", "quantity": 1, "unit_price_cents": 595 }
  ],
  "expected_amount_minor": 1245,
  "currency": "USD",
  "card_preview": { "brand": "visa", "last_four": "4242" },
  "reliability": { "recipe_known": true, "status": "auto", "runs": 41, "pass_k": 0.97 },
  "pii_disclosure_summary": [
    { "claim": "given_name", "value_redacted": "Alex", "will_share": true },
    { "claim": "email", "value_redacted": "al…@…om", "will_share": true }
  ],
  "preview_summary_for_user": "Maple Grounds — sourdough × 1, oat latte × 1 — $12.45, pickup. Sharing your name and email. Pay with Visa ending 4242. Confirm?",
  "confirmation_token": "cft_…",
  "expires_at": 1781280000000
}

`preview_summary_for_user` is the exact sentence the agent shows the customer, the one from the walkthrough above. The merchant's standing (`reliability`) and the disclosure summary (`pii_disclosure_summary`) travel in the same response, so the decision to proceed is made with everything on the table.
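
Before showing that sentence, a cautious agent could sanity-check the preview. The sketch below is an assumption about client-side handling, not part of the Agnic API: the helper names are hypothetical, the formatter handles USD only, and the item subtotal is informational because the merchant's total may include lines the items do not show.

```python
def format_minor(amount_minor: int, currency: str) -> str:
    """Format minor units for display. This sketch handles USD only."""
    if currency != "USD":
        raise ValueError("this sketch only formats USD")
    dollars, cents = divmod(amount_minor, 100)
    return f"${dollars}.{cents:02d}"


def items_subtotal(preview: dict) -> int:
    """Sum quantity x unit price over the canonical items, in minor units."""
    return sum(i["quantity"] * i["unit_price_cents"] for i in preview["canonical_items"])


def check_preview(preview: dict) -> str:
    """Return the sentence to show the user, or raise if the preview is not usable."""
    if not (preview.get("ready_to_place") and preview.get("confirmation_token")):
        raise ValueError("preview is not ready to place")
    shown = format_minor(preview["expected_amount_minor"], preview["currency"])
    summary = preview["preview_summary_for_user"]
    if shown not in summary:
        raise ValueError("summary does not state the authoritative total")
    return summary
```

For the illustrative response, the subtotal is 1 × 650 + 1 × 595 = 1245 minor units, which formats as $12.45 and matches both `expected_amount_minor` and the amount in the summary sentence.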

Pay executes against the token plus the user's verbatim confirmation, completes the checkout with the vaulted card, and returns a terminal status with its evidence:

{
  "success": true,
  "order_id": "af_ord_…",
  "status": "succeeded",
  "merchant_id": "merchant_maple_grounds",
  "evidence": {
    "screenshots_count": 9,
    "screenshots_url": "https://app.agnic.ai/orders/af_ord_…"
  }
}

An order moves through a fixed set of states — and only forward. A paid order can never be paid twice:

Status	Meaning
`succeeded`	The merchant accepted the charge and the order completed.
`merchant_error`	The merchant received the card and declined the charge.
`worker_error`	The checkout could not be completed — for example, the page changed mid-order.
`approval_required`	A step-up is needed: a passkey approval or a live human check.
`processing`	Still in flight. The caller polls `get_order_status`; it never re-dispatches.

Two guarantees ride on every order:

Price binding. A Pay call charges exactly the amount its token was issued for. Changing the cart or the amount after Explore invalidates the token.
No double charge. A slow order returns `processing` with an order id; callers poll `get_order_status` for the outcome and never re-dispatch. The order is idempotent at the dispatch boundary.
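
The poll-never-re-dispatch rule can be sketched as a small loop. The source names `get_order_status` but not its signature, so it is passed in here as a plain callable that takes an order id and returns a dict with a `status` key; the interval and poll limit are arbitrary illustrative values.

```python
import time
from typing import Callable


def wait_for_outcome(
    order_id: str,
    get_order_status: Callable[[str], dict],
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll an in-flight order until it leaves `processing`.

    The order is never placed again here: the only call made is a status read.
    """
    for _ in range(max_polls):
        result = get_order_status(order_id)
        if result["status"] != "processing":
            # succeeded, merchant_error, worker_error, or approval_required
            return result
        sleep(interval_s)
    # Still in flight after the budget: report that, and leave the order alone.
    return {"order_id": order_id, "status": "processing"}
```

If the budget runs out, the caller still holds the order id and can resume polling later; retrying the Pay call is never the fallback.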

Every order is governed by the user's standing mandate — their delegation to the agent — which declares the limits the agent may spend within and the identity attributes it may share.

Before the user confirms, the Explore response shows the exact claims that will be shared — for example email, name, and postal address — each drawn from their profile and shown in plain terms.
The agent shares only what the merchant requires for that order, never the whole profile.
Anything outside the mandate triggers an explicit step-up: the user approves that specific order with their passkey, and only then does it proceed.

The Recipe itself carries no customer data — it records the claim a field needs, as in the example above. Each buyer's own value is injected at order time, from their mandate.
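
A minimal sketch of the mandate check, under stated assumptions: the `Mandate` shape (a single spend limit, a currency, and a set of granted claims) and the decision labels are hypothetical, since the source describes the behaviour but not the data model.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Mandate:
    max_amount_minor: int     # hypothetical: a single per-order spend limit
    currency: str
    granted_claims: Set[str]  # identity attributes the user allows the agent to share


def evaluate(mandate: Mandate, amount_minor: int, currency: str,
             required_claims: Iterable[str]) -> dict:
    """Decide whether an order may proceed under the mandate."""
    required = set(required_claims)
    missing = sorted(required - mandate.granted_claims)
    if missing:
        # Stop and ask for exactly these claims; nothing is guessed.
        return {"decision": "ask_for_claims", "claims": missing}
    if currency != mandate.currency or amount_minor > mandate.max_amount_minor:
        # Outside the limits: the user approves this specific order with their passkey.
        return {"decision": "step_up"}
    # Share only what the merchant requires, never the whole grant.
    return {"decision": "proceed", "share": sorted(required)}
```

Note that `share` is built from the merchant's required claims, not from `granted_claims`, so a broad grant never widens what a single order discloses.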

Live view and human handoff

Most of the open web lets an order complete unattended. A few checkouts ask for something only the shopper can give — a captcha, a 3-D Secure step-up, a one-time code, or a sign-in to the merchant. The engine is built to detect these and bring in a person, not to work around them.

Two modes share one mechanism — a live, frame-by-frame stream of the in-flight session:

Watch, read-only. Any order can be watched live: the user, or a reviewer, sees the same session the engine is driving, as it happens. Useful for the first orders at a new merchant, and for oversight.
Handoff, interactive. When a human-only step is detected — a check, or a merchant sign-in — the engine pauses with the browser session held open and issues a secure, single-use, time-limited link. The user opens it, sees the live page, and acts; their input is relayed back into that same session. For a sign-in they enter their own credentials and tap "I've signed in — continue" — Agnic never sees the password. The moment the step clears, the engine resumes the recorded path and finishes the order.

The session is never re-dispatched and the card is never re-entered — the order continues from exactly where it paused. The link is bound to one session, expires on a short timer, and is void the instant the order resolves. The engine runs no captcha-solving service and stores no merchant passwords: a challenge — or a sign-in — is a reason to involve a person, by design.
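
The three properties of the handoff link (bound to one session, time-limited, void once used or once the order resolves) can be sketched with an in-memory registry. The class name, the five-minute default, and the storage are assumptions for illustration only.

```python
import secrets
import time
from typing import Callable, Dict, Tuple


class HandoffLinks:
    """Single-use, time-limited tokens, each bound to one browser session."""

    def __init__(self, ttl_s: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s  # hypothetical lifetime; the source says only "a short timer"
        self._clock = clock
        self._live: Dict[str, Tuple[str, float]] = {}

    def issue(self, session_id: str) -> str:
        token = secrets.token_urlsafe(32)  # unguessable random token
        self._live[token] = (session_id, self._clock() + self._ttl_s)
        return token

    def redeem(self, token: str, session_id: str) -> bool:
        entry = self._live.pop(token, None)  # popping makes the token single-use
        if entry is None:
            return False
        bound_session, expires_at = entry
        return bound_session == session_id and self._clock() < expires_at

    def void_session(self, session_id: str) -> None:
        """Drop every outstanding token for a session, e.g. when its order resolves."""
        self._live = {t: e for t, e in self._live.items() if e[0] != session_id}
```

Injecting the clock keeps the expiry logic testable without waiting for real time to pass.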

Observability and evidence

Every run, whether a learning pass or an order, emits a structured, step-by-step trace, so no part of a checkout is a black box.

A per-step trace. For each step the engine records what it matched, where the value came from — the goal, a freshly decided preference, or a profile claim — how the page responded, and the timing. Profile values are redacted to their claim: the trace names what was shared, never the customer's data.
A terminal status, with proof. Every order ends in one explicit state backed by evidence — a screenshot sequence of the checkout and, when enabled, a recording of the session — reachable from the order record.
Success is judged, not assumed. Whether an order actually completed is decided from an authoritative payment signal — the processor's confirmation and the amount that settled — never from a screenshot or the model's say-so. A page that looks finished is not a success until the charge is confirmed.
Per-merchant reliability, from real outcomes. The engine keeps running counts per merchant — orders, successes, heal events, and the success rate they imply — and exposes them. The reliability block an agent reads before every order is a live view of those numbers, not a static badge.

For a reviewer, any order reconstructs end to end: the instruction, the confirmation, the exact amount, the claims disclosed, the step-by-step path, and the signal that says it settled.
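
The rule that success is judged from the payment signal, never from the page, might look like this in code. The shape of the `signal` dict and the `unconfirmed` and `mismatch` labels are hypothetical; only `succeeded` is a status named in this document.

```python
from typing import Optional


def judge_order(expected_amount_minor: int, currency: str,
                signal: Optional[dict], page_looks_done: bool = False) -> str:
    """Decide the outcome from the processor's signal.

    `page_looks_done` is accepted only to show that it is deliberately ignored.
    """
    if signal is None or not signal.get("confirmed"):
        return "unconfirmed"  # a finished-looking page is not a success
    if signal["settled_amount_minor"] != expected_amount_minor or signal["currency"] != currency:
        return "mismatch"     # settled, but not for the amount the user approved
    return "succeeded"
```

Comparing the settled amount with the approved amount ties this check back to price binding: an order counts as successful only if it settled for exactly what the token was issued for.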

Security and data handling

The defining property of the design is what never enters it: the card. The agent and the model operate on a non-sensitive alias from start to finish; the real number exists only inside the PCI vault boundary, where it is substituted into the payment request in flight.

Who sees what, exactly:

Data	Agent + model	Agnic orchestration	PCI vault	Merchant
Real card number	Never	Never	Holds it; substitutes it in flight	Receives a normal charge through its own processor
Card alias — non-sensitive	Yes	Yes	Maps it to the card	—
Identity claims	Mandate-granted only	Order-required only	—	Only what its checkout asks for
Order — items, amount, currency	Yes	Yes	—	Yes
Mandate — limits and grants	Operates within it	Enforces it	—	Never

And the controls behind the table:

Control	How it works
Card data	Tokenized and held in a PCI-DSS-scoped vault. The agent, the model, and Agnic's orchestration layer see only the alias; the real number never appears in agent runtime, logs, or prompt context.
Identity	Shared by selective disclosure — only the claims the merchant requires for the order, authorised by the user's mandate.
Authorisation	Two-step confirm-then-execute with a one-use, price-bound token; out-of-mandate requests require an explicit per-order passkey approval.
Audit	Every order persists the user's original instruction, their confirmation, the exact items and amount approved, the claims disclosed, and step-by-step evidence — including screenshots and, when enabled, a session recording.

Supported checkouts

The engine targets standards-based card entry: hosted card fields and embedded payment elements as used by major processors, including Stripe-based and Shopify-based checkouts, on desktop web. Coverage expands per merchant as Recipes are learned and verified — the merchant-standing fields above tell the agent, per merchant, what to expect today.

