QUARTERLY REPORT · Q2 2026 · 12 min read · Published 20 May 2026

State of AI Commerce on Shopify, Q2 2026

The first open quarterly report on AI shopping agent behavior across Shopify catalogs. 1,047,024 ground-truth captures across 6 agents and 17,863 distinct brands. What ChatGPT, Claude, Perplexity, Gemini, Mistral and DeepSeek recommend, and what they don't.

1,047,024 captures observed
6 agents tracked
17,863 brands tracked
22 days of data
Open dataset. Methodology under CC0 at commerce-agentic/agentic-catalog-scanner. Raw captures under MIT at commerce-agentic/ai-visibility-metrics. All numbers in this report are reproducible from the dataset.

1. The AI shopping channel is real, and it's growing fast

Over the last 22 days we captured 1,047,024 distinct product recommendations from six AI agents, running our standardized buyer-intent query set across five verticals (apparel, beauty, home, food, electronics). That's an average of 47,592 captures per day, growing.

Each capture is a record of what an agent returned when our benchmark suite issued a buyer-style query. It is not a sample of real shopper traffic (which is private to each agent). Two takeaways from the benchmark:

  • The answers are deterministic enough to measure. Issue the same buyer-style query 30 days apart and you get largely overlapping product lists, meaning catalog-level signal is what moves the answers, not query phrasing noise.
  • The answers are concentrated. We'll show below that the top 10 brands receive a disproportionate share of mentions, which is bad news for the long tail and great news for whoever's optimizing.

2. Agent share: who recommends how much

Not all AI agents capture share equally. Across the 1,047,024 captures in this window:

Agent       Captures
Gemini      383,553
ChatGPT     176,927
Claude      167,890
DeepSeek    160,210
Mistral     158,444

Gemini leads with 383,553 captures (37% share). At the other end, Mistral trails at 158,444 (15%). The gap matters because different agents prioritize different signals. What wins on ChatGPT may underperform on Claude, and vice versa.
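The share figures can be re-derived from the counts listed in this section. The following is a minimal sketch, not the project's own tooling: the `captures` dictionary and the `share` helper are names introduced here for illustration, and the counts are copied by hand from the figures above. The five listed counts sum to the reported total of 1,047,024.

```python
# Capture counts per agent, copied from the figures in this section.
captures = {
    "Gemini": 383_553,
    "ChatGPT": 176_927,
    "Claude": 167_890,
    "DeepSeek": 160_210,
    "Mistral": 158_444,
}

total = sum(captures.values())


def share(agent: str) -> float:
    """Return the agent's share of all listed captures, as a percentage."""
    return 100 * captures[agent] / total


# Print agents from largest to smallest capture count.
for agent, count in sorted(captures.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{agent:<9} {count:>8,} {share(agent):5.1f}%")
```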

3. Top 10 brands AI agents recommend most

Among the 200 most-mentioned brands in the 90-day ranking window (which currently holds 22 days of data), the top 10 received the following mention counts:

#   Brand           Domain             Mentions  Agents
1   Amazon          amazon.com         64,426    Claude, DeepSeek, Gemini, Mistral, ChatGPT
2   Patagonia       patagonia.com      26,846    Claude, DeepSeek, Gemini, Mistral, ChatGPT
3   Target          target.com         25,664    Claude, DeepSeek, Gemini, Mistral, ChatGPT
4   Uniqlo          uniqlo.com         21,267    Claude, DeepSeek, Gemini, Mistral, ChatGPT
5   Nike            nike.com           20,871    Claude, DeepSeek, Gemini, Mistral, ChatGPT
6   Walmart         walmart.com        17,097    Claude, DeepSeek, Gemini, Mistral, ChatGPT
7   Adidas          adidas.com         16,264    Claude, DeepSeek, Gemini, Mistral, ChatGPT
8   Columbia        columbia.com       16,108    Claude, DeepSeek, Gemini, Mistral, ChatGPT
9   The North Face  thenorthface.com   14,446    Claude, DeepSeek, Gemini, Mistral, ChatGPT
10  REI             rei.com            14,435    Claude, DeepSeek, Gemini, Mistral, ChatGPT

The complete top 100 leaderboard is live and updates hourly. A few observations from the long tail (positions 50-200, omitted from the table above to keep it readable):

  • Mention counts drop off sharply after the top 20. Position 50 typically gets <10% of the mentions that position 1 does. Long-tail visibility is a real opportunity for catalogs that optimize properly.
  • Mid-tier brands (positions 30-100) are mostly cited by 2-3 agents, not all 6. Cross-agent visibility is rare and high-signal.
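The drop-off can be made concrete for the ten published positions by expressing each brand's mentions as a fraction of the leader's. This is a rough sketch using only the top-10 counts from the table; `TOP10` and `relative_to_leader` are illustrative names, and positions beyond 10 are not included because their counts are not listed here.

```python
# Mention counts for the ten most-mentioned brands, copied from the table above.
TOP10 = [
    ("amazon.com", 64_426),
    ("patagonia.com", 26_846),
    ("target.com", 25_664),
    ("uniqlo.com", 21_267),
    ("nike.com", 20_871),
    ("walmart.com", 17_097),
    ("adidas.com", 16_264),
    ("columbia.com", 16_108),
    ("thenorthface.com", 14_446),
    ("rei.com", 14_435),
]


def relative_to_leader(rows: list[tuple[str, int]]) -> list[float]:
    """Return each row's mention count divided by the first row's count."""
    leader = rows[0][1]
    return [mentions / leader for _, mentions in rows]


ratios = relative_to_leader(TOP10)
for rank, ((domain, mentions), ratio) in enumerate(zip(TOP10, ratios), start=1):
    print(f"{rank:>2} {domain:<18} {mentions:>7,} {ratio:6.1%}")
```

On these figures, position 2 already sits at roughly 42% of position 1, and position 10 at roughly 22%.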

4. Catalog quality vs. mention rank

We ran the public AI Catalog Score audit on the top 200 most-mentioned brands. 49 stores returned valid catalog data. Average score: 59/100.

The top 10 by audit score:

#   Brand            AI Catalog Score  Products
1   Burtsbeesbaby    76/100            250
2   Mattandnat       75/100            250
3   Rothys           73/100            250
4   Gymshark         70/100            250
5   Decathlon        69/100            250
6   Colehaan         68/100            250
7   Outdoorresearch  67/100            250
8   Packagefreeshop  67/100            116
9   Boody            66/100            250
10  Toms             66/100            250

The full audit-score leaderboard is at /leaderboard/catalog-score. Worth noting: the catalogs with the highest mention counts are not always the same as the catalogs with the highest audit scores. Discoverability and catalog quality are correlated but not identical.

5. Top queries in our benchmark suite

The 10 most-frequent queries our standardized benchmark suite issued in this window (queries are pre-defined, not sourced from real shopper search logs):

#   Query                                                 Captures
1   compare Allbirds Tree Runners vs Nike Pegasus         2,502
2   sustainable merino wool sweater for sensitive skin    2,448
3   best winter jacket under $200 for beginners           2,448
4   formal tie gift for grandma                           2,424
5   sustainable bamboo fiber socks for small spaces       2,412
6   eco-friendly vegan leather boots                      2,286
7   compare Sony WH-1000XM5 vs Bose QuietComfort Ultra    2,286
8   compare Kindle Paperwhite vs Kobo Clara               2,286
9   compare Anker vs Belkin USB-C charger                 2,286
10  compact foldable treadmill for small apartment        2,286

The pattern in our suite: specific queries elicit more confident answers than broad ones. Queries like "waterproof running jacket under $200" and "vegan skincare with niacinamide" return concrete brand-and-product lists; broad queries like "running gear" return generic category guidance. We constructed our suite to test the constraint-rich end of the distribution intentionally. That's where AI agent retrieval is most discriminating, and where catalog quality differences surface most clearly. If your catalog can't answer factual constraints, you don't get cited.

6. The structural takeaway

Three qualitative patterns hold across the dataset, regardless of which agent or vertical we slice. None of these are causal claims; we don't run controlled merchant experiments. They're descriptions of what the captures look like.

  1. Structure beats prose. Brands cited most often in the captures dataset overwhelmingly publish structured metafield data on the platforms where they're recommended. We do not observe the opposite pattern: catalogs that bury attributes in marketing prose rarely surface at the top.
  2. Specificity correlates with citation. Top-ranked captures consistently surface products described with factual markers (units, ingredients, materials, certifications) rather than marketing superlatives. We haven't run a controlled comparison, but the pattern is visible at a glance.
  3. The distribution is winner-take-most. Rank 50 in our brand list receives ~5% of the mentions that rank 1 does. The long tail past rank 100 drops further still.

If you read one paragraph of this report: the single highest-leverage thing you can do for AI catalog visibility is set vertical-relevant metafields. The gap between "no AI-relevant metafields" and "3 vertical-relevant metafields" is the largest single jump in the rubric. We documented this in detail in the 8 signals article.

Methodology

Each day we run a ~5,000 query batch through six AI agents. The batch combines two sources: a 700-query anchor set of hand-curated queries kept identical across runs (so the same query's response can be tracked over time), and a probabilistically generated set that fills the rest.
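The batch composition described above can be sketched as follows. This is an illustration under stated assumptions rather than the scanner's actual code: `build_daily_batch`, `ANCHOR_SIZE` and `BATCH_SIZE` are hypothetical names, the batch size is treated as exactly 5,000 although the report gives it as approximate, and the generator is passed in as a plain callable.

```python
from typing import Callable

ANCHOR_SIZE = 700    # hand-curated queries, identical across runs (from the report)
BATCH_SIZE = 5_000   # daily batch size; the report gives this as approximate


def build_daily_batch(
    anchor: list[str],
    generate: Callable[[], str],
    batch_size: int = BATCH_SIZE,
) -> list[str]:
    """Place the fixed anchor queries first, then fill with generated ones."""
    batch = list(anchor)
    while len(batch) < batch_size:
        batch.append(generate())
    return batch


# Placeholder inputs for demonstration only; real anchor queries are hand-curated.
demo_anchor = [f"anchor query {i}" for i in range(ANCHOR_SIZE)]
demo_batch = build_daily_batch(demo_anchor, lambda: "generated query")
print(len(demo_batch), "queries,", len(demo_batch) - ANCHOR_SIZE, "generated")
```

Keeping the anchor queries at the front, unchanged, is what lets the same query's response be compared from one run to the next.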

The probabilistic generator samples each query from explicit distributions:

  • Length: 30% short (1-3 tokens, e.g. "running shoes"), 45% medium (4-8 tokens, e.g. "running shoes for marathon training"), 20% long (9-15 tokens, includes 2+ constraints), 5% verbose (16+ tokens, conversational).
  • Phrasing register: 55% search-style, 30% question-style ("what's the best..."), 15% conversational ("I'm looking for...").
  • Constraint mix: price ceiling, use case, demographic, factual attribute, brand relation. Pareto-distributed count, with at most one constraint per type per query.
  • Vertical share: Pareto across ten verticals (apparel and electronics ~18-20% each, beauty 17%, home 12%, gifts 10%, then a long tail through fitness, outdoor, pets, food, baby). Seasonally boosted (gifts in Q4, fitness in Q1, outdoor in summer).
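A minimal sketch of sampling from the length, register and constraint distributions listed above, using Python's standard `random` module. Several things here are assumptions rather than details from the methodology: the function and constant names are invented for illustration, the Pareto shape parameter `PARETO_ALPHA` is a placeholder because the report does not state one, and each dimension is sampled independently, so the sketch does not enforce the rule that long queries carry two or more constraints. Vertical sampling and seasonal boosts are left out.

```python
import random

# Weights taken from the bullet list above (percentages).
LENGTH_WEIGHTS = {"short": 30, "medium": 45, "long": 20, "verbose": 5}
REGISTER_WEIGHTS = {"search": 55, "question": 30, "conversational": 15}
CONSTRAINT_TYPES = [
    "price ceiling",
    "use case",
    "demographic",
    "factual attribute",
    "brand relation",
]

# ASSUMPTION: the report gives no Pareto shape parameter; 1.5 is a placeholder.
PARETO_ALPHA = 1.5


def sample_query_spec(rng: random.Random) -> dict:
    """Sample one query specification (not the query text itself)."""
    length = rng.choices(
        list(LENGTH_WEIGHTS), weights=list(LENGTH_WEIGHTS.values())
    )[0]
    register = rng.choices(
        list(REGISTER_WEIGHTS), weights=list(REGISTER_WEIGHTS.values())
    )[0]
    # paretovariate returns a float >= 1; shift down so zero constraints is
    # possible, and cap at the number of available constraint types.
    count = min(int(rng.paretovariate(PARETO_ALPHA)) - 1, len(CONSTRAINT_TYPES))
    # sample() draws without replacement: at most one constraint per type.
    constraints = rng.sample(CONSTRAINT_TYPES, k=count)
    return {"length": length, "register": register, "constraints": constraints}


rng = random.Random(2026)  # fixed seed so a run can be repeated
for _ in range(3):
    print(sample_query_spec(rng))
```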

These parameters are explicit and reviewable in the open methodology repo. They are best-effort approximations of shopper-LLM behavior, calibrated from public observation rather than fitted to real shopper traffic (which is not publicly available). They will be wrong in some verticals. The right response when a reader pushes back is to debate the parameters, not to defend the output.

After collection, we extract product-and-brand recommendations from each agent's response. The parser is intentionally tolerant: different agents return slightly different shapes. We dedupe at the merchant-domain level per capture, then aggregate. Top brands are ranked over a 90-day window, which matches typical AI agent retraining cadence. Aggregated counts are exposed via /api/public/insights and refreshed hourly.
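The dedupe-then-aggregate step might look roughly like this. It is a sketch under assumptions, not the project's parser: it takes each capture to be the list of absolute product URLs already extracted from one agent response, normalizes a domain by lowercasing the host and dropping a leading `www.`, and the names `merchant_domain` and `count_mentions` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def merchant_domain(url: str) -> str:
    """Return the lowercased host of an absolute URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def count_mentions(captures: list[list[str]]) -> Counter:
    """Count each merchant domain at most once per capture, then sum."""
    mentions: Counter = Counter()
    for urls in captures:
        # The set collapses repeated domains within a single capture.
        mentions.update({merchant_domain(u) for u in urls})
    return mentions


# Made-up example URLs, for illustration only.
example = [
    ["https://www.patagonia.com/a", "https://patagonia.com/b", "https://www.rei.com/c"],
    ["https://rei.com/d"],
]
print(count_mentions(example).most_common())
```

Deduplicating inside each capture keeps a response that lists five products from one store from counting as five mentions of that store.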

What the dataset is and is not. This is a benchmark. We do not observe real shopper traffic; actual shopping interactions with the agents are private to each provider. The signal the dataset surfaces is "given a buyer-style query, which catalogs do AI agents cite?", useful for benchmarking visibility and tracking changes over time. The signal it does not surface is "what queries real shoppers type to AI agents and at what volume" since that data exists only inside each agent's servers.

Limitations:

  • The query suite is generated from a model of shopper-LLM behavior, not sampled from real search logs. The model's parameters are best-effort approximations and may diverge from actual shopper distributions, especially in long-tail verticals.
  • Each query in the daily batch is run once per agent. Head queries in the real world receive many more shopper impressions than tail queries; our captures dataset treats them with equal weight.
  • Capture set is currently English-language only. Multi-language is on the roadmap.
  • "Mentions" do not equal "purchases". We measure AI agent visibility, not downstream conversion.
  • Catalog audits are over public products.json data; signals like metafields and SEO meta are install-only (covered in the full rubric).

Methodology open at commerce-agentic/agentic-catalog-scanner. Raw dataset README at commerce-agentic/ai-visibility-metrics.