The dataset spans ChatGPT, Claude, Gemini, Perplexity, Mistral and DeepSeek, observed across five verticals (apparel, beauty, home, food, electronics). Stats are refreshed hourly from aicatalogscore.com; the full rubric and dataset README are open at github.com/commerce-agentic (CC0 + MIT).
How AI shopping agents actually read your catalog
Most merchants assume LLMs are searching their store the way Google does: crawling pages, indexing keywords, ranking by relevance. They're not. Modern shopping agents have two acquisition paths:
- Tool-use over structured product data. The agent calls a retrieval function that returns JSON shaped roughly like a Shopify product (title, description, variants, metafields). The agent never sees your HTML.
- RAG over scraped product pages. The agent's training set or live retrieval includes your product pages, chunked into embeddings. Here the agent reads what's on the page, but heavily favors structured blocks (lists, tables, headings) over prose.
Either way, the structure of your data matters more than the prose around it. A 600-word marketing description with no bullet points loses to a 200-word description with three bullets and a spec table. Every single time, in every single agent we measured.
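To make the tool-use path concrete, here is a rough sketch of the kind of payload an agent might receive. The four top-level fields come from the description above; every value, and the nesting inside `variants` and `metafields`, is made up for illustration and will differ per agent.

```python
import json

# Hypothetical payload. Only the four top-level field names come from the
# article; the values and inner structure are illustrative assumptions.
product_payload = {
    "title": "Men's Performance Hoodie · Recycled Polyester · Training",
    "description": "<ul><li>Material: recycled polyester</li></ul>",
    "variants": [
        {"sku": "HOOD-BLK-XL", "options": {"Size": "XL", "Color": "Black"}},
    ],
    "metafields": {"material": "recycled polyester"},
}


def fields_visible_to_agent(payload: dict) -> list[str]:
    """Return the top-level keys the agent receives. No HTML page is involved."""
    return sorted(payload.keys())


print(json.dumps(product_payload, indent=2, ensure_ascii=False))
```

Everything the agent knows about the product has to live in one of those fields, which is why the rest of the rubric scores fields rather than page design.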
The 8 signals, ranked by weight
Our rubric assigns 100 points across eight dimensions. The weighting reflects how much each signal moves AI agent behavior in the dataset, not how easy it is to fix.
| # | Signal | Weight | Most-common failure |
|---|---|---|---|
| 1 | Title quality | 15 | Missing product-type noun |
| 2 | Description | 20 | No bullets, no factual markers |
| 3 | Images & alt text | 15 | ≥50% of images have empty alt |
| 4 | Variant structure | 10 | No barcode, no semantic option names |
| 5 | Metafields | 15 | Zero AI-relevant metafields |
| 6 | Category & tags | 10 | Shopify Standard Taxonomy not set |
| 7 | SEO meta | 10 | seo.title identical to product title |
| 8 | Pricing & inventory | 5 | compareAtPrice missing |
The eight weights sum to 100. On top of that, a flat −10 penalty applies to any product with zero AI-relevant metafields. We'll get to that at the end. It's the single most impactful lever in the rubric.
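As a quick sanity check, the weights from the table can be written down and summed. This is only a sketch; the dictionary keys are the signal names from the table, and `ranked_by_weight` is a helper name we made up for illustration.

```python
# Weights copied from the table above.
WEIGHTS = {
    "Title quality": 15,
    "Description": 20,
    "Images & alt text": 15,
    "Variant structure": 10,
    "Metafields": 15,
    "Category & tags": 10,
    "SEO meta": 10,
    "Pricing & inventory": 5,
}


def ranked_by_weight(weights: dict[str, int]) -> list[tuple[str, int]]:
    """Sort signals from heaviest to lightest (hypothetical helper)."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


# The eight dimensions account for the full 100 points.
assert sum(WEIGHTS.values()) == 100

for name, weight in ranked_by_weight(WEIGHTS):
    print(f"{weight:>3}  {name}")
```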
Title quality: get the product-type noun into the first 80 characters
Agents lean on the title to decide whether your product is a candidate at all. Six criteria, weighted by what we saw in the captures:
- Length 30-80 chars (4 pts): under 30 chars carries no information; over 80 gets truncated in most agent UIs.
- Word count ≥ 5 (2 pts): single-word titles like "Hoodie" are filtered out as ambiguous.
- Contains the product-type noun (3 pts): matches your productType field, or a vertical-specific spec pattern.
- Contains a distinctive attribute (3 pts): material, color, ingredient, or use case. The thing that disambiguates this product from a thousand others.
- Not ALL CAPS (2 pts): agents treat all-caps as low-quality marketing.
- No fluff superlatives (1 pt): words like "premium", "amazing", and "best" get downweighted as untrusted brand self-claims.
The fix: rewrite each title to {Product Type} · {Material/Spec} · {Use-Case}. For example, "Performance Hoodie" → "Men's Performance Hoodie · Recycled Polyester · Training". Full deep-dive with per-vertical before/after examples in the titles guide.
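The six title criteria can be sketched as a small scoring function. This is a simplified illustration, not the audit engine: the `ATTRIBUTES` vocabulary is a hypothetical stand-in for a real vertical-specific list, and the "vertical-specific spec pattern" alternative for the product-type check is left out.

```python
import re

FLUFF = {"premium", "amazing", "best"}  # the examples named in the rubric
# Hypothetical attribute vocabulary; a real list would be vertical-specific.
ATTRIBUTES = {"cotton", "polyester", "wool", "leather",
              "black", "blue", "training", "running"}


def score_title(title: str, product_type: str) -> int:
    """Score a title out of 15 using the six criteria listed above."""
    words = re.findall(r"[A-Za-z0-9']+", title)
    lowered = {w.lower() for w in words}
    score = 0
    if 30 <= len(title) <= 80:          # length 30-80 chars
        score += 4
    if len(words) >= 5:                 # word count >= 5
        score += 2
    if product_type and product_type.lower() in title.lower():
        score += 3                      # product-type noun present
    if lowered & ATTRIBUTES:            # distinctive attribute present
        score += 3
    if not title.isupper():             # not ALL CAPS
        score += 2
    if not lowered & FLUFF:             # no fluff superlatives
        score += 1
    return score


before = score_title("Performance Hoodie", "Hoodie")
after = score_title(
    "Men's Performance Hoodie · Recycled Polyester · Training", "Hoodie"
)
print(before, after)
```

On the before/after pair from the example, the short title only collects the product-type, not-all-caps, and no-fluff points, while the rewritten one clears all six checks.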
Description: 150+ words, structured, with factual markers
The highest-weighted signal, and the one most merchants under-invest in. Agents parse descriptions hierarchically: they scan for bullet lists and subheadings first, then for factual claims (numbers, units, ingredients), then for use-case language.
- Word count ≥ 150 (4 pts): gives the agent enough context to pick this product over competitors.
- At least one bulleted list (3 pts): LLMs parse <ul><li> better than prose.
- At least one <h3> subheading (2 pts): semantic anchors help agents chunk content.
- ≥ 3 factual markers (4 pts): units (mL, oz, cm), percentages, ingredients, materials, certifications.
- ≥ 2 use-case mentions (2 pts): "ideal for X", "wear during Y", "works with Z".
- ≥ 1 spec list with recognized labels (2 pts): "Material:", "Size:", "Care:", "Ingredients:". These get extracted as structured data.
- Zero fluff terms (3 pts): same penalty as titles. "Premium", "luxury", "high-quality" hurt more than they help.
The fix: open the description with a one-paragraph hook, then a bulleted spec list, then a use-case paragraph. Cut every adjective that doesn't carry information. Full deep-dive with per-vertical examples and the HTML template in the descriptions guide.
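Here is a minimal sketch of how the seven description checks could be applied to a product's description HTML. The regular expressions are our own simplifications: the factual-marker pattern only catches numbers followed by a unit or a percent sign (not ingredient, material, or certification names), and the use-case pattern only knows the three phrasings quoted above.

```python
import re

FLUFF = re.compile(r"\b(premium|luxury|high-quality)\b", re.I)
USE_CASE = re.compile(r"\b(ideal for|wear during|works with)\b", re.I)
SPEC_LABEL = re.compile(r"\b(Material|Size|Care|Ingredients):")
# Simplified assumption: a factual marker is a number plus a unit or "%".
FACTUAL = re.compile(r"\b\d+(?:\.\d+)?\s?(?:%|(?:mL|oz|cm|kg|g)\b)", re.I)


def score_description(html: str) -> int:
    """Score a description out of 20 using the seven criteria above."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    score = 0
    if len(text.split()) >= 150:
        score += 4
    if re.search(r"<ul[^>]*>.*?<li", html, re.I | re.S):
        score += 3
    if re.search(r"<h3[\s>]", html, re.I):
        score += 2
    if len(FACTUAL.findall(text)) >= 3:
        score += 4
    if len(USE_CASE.findall(text)) >= 2:
        score += 2
    if SPEC_LABEL.search(text):
        score += 2
    if not FLUFF.search(text):
        score += 3
    return score


structured = (
    "<h3>Details</h3><ul><li>Material: 80% recycled polyester</li>"
    "<li>Size: 70 cm length</li><li>Weight: 450 g</li></ul>"
    "<p>Ideal for training. Works with layered outfits.</p>"
)
print(score_description(structured))
print(score_description("<p>A premium hoodie.</p>"))
```

The structured sample is far too short to earn the word-count points, yet it still collects every other check, which mirrors the point that structure beats length.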
Images & alt text: the alt is what the agent actually reads
Image-based shopping agents (ChatGPT vision, Claude vision) don't reliably interpret your hero shots. They fall back to alt text. Stores that treat alt text as an accessibility checkbox leave a 7-point hole in every product.
- Image count ≥ 3 (8 pts): multiple angles let the agent verify the product matches the query.
- Alt text coverage ≥ 80% (7 pts): among image media, with alt.length > 5 to filter "image1.jpg" placeholders.
The fix: write alt text that describes the product, not the photo. "Hoodie front view" is useless. "Men's recycled polyester training hoodie in black with kangaroo pocket" tells the agent everything it needs.
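The two image checks are simple enough to sketch directly. We assume each image is a dict with an optional `alt` key; that shape is ours, not a documented API.

```python
def score_images(images: list[dict]) -> int:
    """Score images out of 15: count >= 3, and alt coverage >= 80%."""
    score = 0
    if len(images) >= 3:
        score += 8
    if images:
        # An alt counts only if it is longer than 5 characters. A pure
        # length check would still pass a file-name-style alt, so a real
        # implementation may need an additional filter.
        described = sum(
            1 for img in images if len((img.get("alt") or "").strip()) > 5
        )
        if described / len(images) >= 0.8:
            score += 7
    return score


good_alt = "Men's recycled polyester training hoodie in black"
print(score_images([{"alt": good_alt}] * 3))
print(score_images([{"alt": good_alt}, {"alt": good_alt}, {"alt": ""}]))
```

With three images, a single empty alt drops coverage to about 67% and costs the full 7 points.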
Variant structure: barcodes are the AI version of GTINs
Variants are how agents understand "the same product in a different size or color". When you only have a Default variant, the agent treats your product as one of many indistinct hoodies; when you have proper variants with named options and SKUs, it can match queries like "blue hoodie XL" directly.
- Multiple variants, not just default (3 pts): even one extra variant flips this flag.
- Every variant has a SKU (3 pts): agents use SKU for unique-product identification.
- At least one variant has a barcode (2 pts): ISBN, GTIN, EAN. Strengthens cross-store matching.
- Variant options use semantic names (2 pts): Size and Color beat Option1.
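A sketch of the four variant checks, assuming each variant is a dict with `sku` and `barcode` keys and that option names arrive as a separate list. Treating "Title" as a generic option name (the name Shopify gives the default option) is our assumption about what "semantic" excludes.

```python
import re

# Assumption: "Option1"-style names and the default "Title" are non-semantic.
GENERIC_OPTION = re.compile(r"^(option\s*\d*|title)$", re.I)


def score_variants(variants: list[dict], option_names: list[str]) -> int:
    """Score variant structure out of 10 using the four criteria above."""
    score = 0
    if len(variants) > 1:                                  # not just default
        score += 3
    if variants and all(v.get("sku") for v in variants):   # every SKU set
        score += 3
    if any(v.get("barcode") for v in variants):            # any barcode
        score += 2
    if option_names and not any(
        GENERIC_OPTION.match(name.strip()) for name in option_names
    ):
        score += 2                                         # semantic names
    return score


hoodie_variants = [
    {"sku": "H-BLK-M", "barcode": "0123456789012"},
    {"sku": "H-BLK-L", "barcode": None},
]
print(score_variants(hoodie_variants, ["Size", "Color"]))
print(score_variants([{"sku": "", "barcode": None}], ["Title"]))
```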
Metafields: the highest-leverage dimension in the entire rubric
If you fix one thing, fix this. AI agents lean on metafields more than any other signal because metafields are structured by design. They're already in the shape the agent wants. Our rubric is vertical-aware: a beauty product needs key_ingredient, a food product needs ingredients, an apparel product needs material.
- Google Product Category set (5 pts): maps your product to a standard taxonomy agents already know.
- "Material bucket" metafield set (3 pts): vertical-specific key (material / key_ingredient / fabric / composition).
- "Dimensions bucket" metafield set (3 pts): size / weight / volume / capacity.
- "Care bucket" metafield set (2 pts): care_instructions / storage / how_to_use.
- ≥ 3 custom metafields total (2 pts): namespace ≠ "global".
The fix: open the Shopify admin → Settings → Custom data → Products. For each vertical, define the 3-4 metafields agents care about for your category. Then bulk-set them. The single highest-ROI 30 minutes of work you can do for AI visibility. Which metafield keys? See our companion article: the Shopify metafields guide, vertical by vertical.
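The bucket logic can be sketched as set membership over metafield keys. The bucket keys are the ones listed above; the `google_product_category` key name and the `{namespace, key, value}` dict shape are assumptions for illustration.

```python
MATERIAL_KEYS = {"material", "key_ingredient", "fabric", "composition"}
DIMENSION_KEYS = {"size", "weight", "volume", "capacity"}
CARE_KEYS = {"care_instructions", "storage", "how_to_use"}
GPC_KEY = "google_product_category"  # assumed key name


def score_metafields(metafields: list[dict]) -> int:
    """Score metafields out of 15 using the five criteria above."""
    # Only metafields with a non-empty value count as "set".
    filled = [m for m in metafields if str(m.get("value") or "").strip()]
    keys = {m.get("key") for m in filled}
    score = 0
    if GPC_KEY in keys:
        score += 5
    if keys & MATERIAL_KEYS:
        score += 3
    if keys & DIMENSION_KEYS:
        score += 3
    if keys & CARE_KEYS:
        score += 2
    if sum(1 for m in filled if m.get("namespace") != "global") >= 3:
        score += 2
    return score


hoodie_metafields = [
    {"namespace": "custom", "key": "google_product_category", "value": "Apparel"},
    {"namespace": "custom", "key": "material", "value": "recycled polyester"},
    {"namespace": "custom", "key": "weight", "value": "450 g"},
    {"namespace": "custom", "key": "care_instructions", "value": "Machine wash cold"},
]
print(score_metafields(hoodie_metafields))
```

Because each bucket accepts several keys, one well-chosen metafield per bucket is enough to collect its points.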
Category & tags: Shopify Standard Product Taxonomy is not optional
Shopify rolled out a Standard Product Taxonomy with thousands of pre-defined categories. Setting it is one click. Most stores don't, leaving 4 points on the table per product.
- Shopify Standard Product Taxonomy category assigned (4 pts): product.category.id is set.
- productType field set (2 pts): even a free-text fallback if you don't use Standard Taxonomy.
- ≥ 5 tags (2 pts): attribute tags, not collection names.
- Tags include vertical-relevant terms (2 pts): at least 2 of your tags must match the vertical's factual-marker patterns.
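A sketch of the four category and tag checks. The product dict shape loosely follows the field names used above (category.id, productType, tags); `vertical_terms` is a hypothetical stand-in for the vertical's factual-marker patterns and is expected in lowercase.

```python
def score_category_and_tags(product: dict, vertical_terms: set[str]) -> int:
    """Score category & tags out of 10 using the four criteria above."""
    tags = [t.strip().lower() for t in product.get("tags", []) if t.strip()]
    score = 0
    if (product.get("category") or {}).get("id"):      # taxonomy category set
        score += 4
    if (product.get("productType") or "").strip():     # productType set
        score += 2
    if len(tags) >= 5:                                 # at least 5 tags
        score += 2
    # At least 2 tags must contain a vertical-relevant term.
    matching = sum(1 for t in tags if any(term in t for term in vertical_terms))
    if matching >= 2:
        score += 2
    return score


hoodie = {
    "category": {"id": "example-category-id"},  # placeholder, not a real ID
    "productType": "Hoodie",
    "tags": ["recycled polyester", "cotton blend", "training", "black", "unisex"],
}
print(score_category_and_tags(hoodie, {"polyester", "cotton"}))
```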
SEO meta: yes, agents still look at it
Google-style SEO is partially obsolete for AI agents, but not entirely. Many agents still scrape SERP-shaped metadata for fast retrieval. Two simple criteria:
- seo.title set and ≠ product.title (5 pts): distinct, length 30-60 chars.
- seo.description set, length 70-160 chars (5 pts).
The fix: if you currently leave SEO meta blank and let Shopify auto-fill from the product title, you're forfeiting 10 pts. Write distinct SEO copy that emphasizes the disambiguating attribute.
Pricing & inventory: small weight, easy wins
The lowest-weighted signal because most stores get it right by default, but still worth checking against four criteria:
- price > 0 on at least one variant (2 pts): yes, we see $0 products in the wild.
- compareAtPrice set on at least one variant (1 pt): sale/discount signal agents surface in price-sensitive queries.
- Inventory tracking enabled (1 pt): out-of-stock filtering is a hard requirement for several agents.
- Inventory > 0 at audit time (1 pt): or oversell-allowed policy, which counts too.
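The four pricing and inventory checks, sketched over a list of variant dicts. The field names (`price`, `compareAtPrice`, `tracked`, `inventoryQuantity`, `inventoryPolicy`) are loosely modeled on Shopify's naming and should be read as assumptions, as should the use of "CONTINUE" to mean oversell-allowed.

```python
def score_pricing(variants: list[dict]) -> int:
    """Score pricing & inventory out of 5 using the four criteria above."""
    score = 0
    if any(float(v.get("price") or 0) > 0 for v in variants):
        score += 2                      # at least one non-zero price
    if any(v.get("compareAtPrice") for v in variants):
        score += 1                      # compare-at price present
    if any(v.get("tracked") for v in variants):
        score += 1                      # inventory tracking enabled
    if any(
        (v.get("inventoryQuantity") or 0) > 0
        or v.get("inventoryPolicy") == "CONTINUE"   # oversell allowed
        for v in variants
    ):
        score += 1                      # in stock, or oversell permitted
    return score


in_stock = [{
    "price": "49.00", "compareAtPrice": "65.00", "tracked": True,
    "inventoryQuantity": 12, "inventoryPolicy": "DENY",
}]
print(score_pricing(in_stock))
print(score_pricing([{"price": "0.00"}]))
```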
The signal-gap penalty: the −10 hidden in the rubric
Any product that has zero AI-relevant metafields (no Google Product Category, no vertical material / dimensions / care bucket) takes a flat −10 penalty on its total score, regardless of how well it scores everywhere else.
Why? Because AI agents rely on structured data first. A perfect title and description with no metafields is still likely to be skipped. The agent has nothing reliable to index against. The penalty is design intent, not an observed effect size: we don't run merchant-controlled A/B tests on AI mention rates (no clean counterfactual is available without merchant cooperation). What we do observe is that the brands cited most often in the captures dataset overwhelmingly publish structured metafield data on the platforms where they're recommended.
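To see what the penalty does to a total, here is a small worked sketch. The set of AI-relevant keys combines the bucket keys from the metafields section with an assumed `google_product_category` key name; the function name is ours.

```python
SIGNAL_GAP_PENALTY = 10

# Bucket keys from the metafields section, plus an assumed key name for
# the Google Product Category.
AI_RELEVANT_KEYS = {
    "google_product_category",
    "material", "key_ingredient", "fabric", "composition",
    "size", "weight", "volume", "capacity",
    "care_instructions", "storage", "how_to_use",
}


def apply_signal_gap_penalty(subtotal: int, metafield_keys: set[str]) -> int:
    """Subtract the flat penalty when no AI-relevant metafield is set."""
    if metafield_keys & AI_RELEVANT_KEYS:
        return subtotal
    return subtotal - SIGNAL_GAP_PENALTY


# Perfect on every signal except metafields: 100 - 15 = 85, then 85 - 10 = 75.
print(apply_signal_gap_penalty(85, set()))
# A single material metafield is enough to avoid the penalty.
print(apply_signal_gap_penalty(88, {"material"}))
```

In other words, a product with no relevant metafields loses 25 points in total: the 15 it cannot earn in the metafields dimension plus the flat 10.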
Audit your store in 60 seconds
The rubric above is implemented in our free public audit page. Drop in any Shopify store URL and you get a per-product breakdown across the 8 signals, plus the top issues across the catalog. The full installed app (open source rubric, closed engine) writes the fixes back to your store with one click.
Methodology is open under CC0 at github.com/commerce-agentic/agentic-catalog-scanner. Dataset is open under MIT at commerce-agentic/ai-visibility-metrics. Fork, audit, contribute issues.