Why structured data beats prose for AI shopping agents
When ChatGPT, Claude, or Perplexity answer a buyer query, they have two ways to read your catalog:
- Structured retrieval. The agent calls a function that returns product objects with named fields (title, price, material, ingredients, etc.). It reads the fields it knows about.
- Unstructured retrieval. The agent reads the product description as prose, runs it through embeddings, and retrieves what looks relevant.
Structured retrieval wins whenever it's available. The agent doesn't have to interpret your marketing copy; it just reads the field. material: "merino wool" tells the agent something definitive. The same fact written as "made with our buttery-soft signature wool blend" requires interpretation and is lossy.
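To make the contrast concrete, here is a minimal sketch that compares a lookup on a named field with a keyword search over description prose. The `Product` record and the `matches_structured` / `matches_prose` helpers are hypothetical. Real agents retrieve prose with embeddings rather than substring checks, so this only illustrates why a named field is unambiguous.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    title: str
    description: str
    material: Optional[str] = None  # structured field; None when the merchant never set it


def matches_structured(product: Product, wanted: str) -> bool:
    # Structured retrieval: read the named field directly, no interpretation needed.
    return product.material is not None and wanted.lower() in product.material.lower()


def matches_prose(product: Product, wanted: str) -> bool:
    # Crude stand-in for unstructured retrieval: look for the phrase in the marketing copy.
    return wanted.lower() in product.description.lower()


# Hypothetical catalog: identical copy, but only the first product has the field set.
catalog = [
    Product("Crew Sweater", "Made with our buttery-soft signature wool blend.", material="merino wool"),
    Product("Cardigan", "Made with our buttery-soft signature wool blend."),
]

print([p.title for p in catalog if matches_structured(p, "merino wool")])  # ['Crew Sweater']
print([p.title for p in catalog if matches_prose(p, "merino wool")])       # []
```

The prose never says "merino", so nothing in the copy connects either product to the query; only the explicit field does.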
This is why metafields are weighted so heavily in the rubric: they're the part of your catalog that's already in the shape agents prefer.
The three universal metafield slots
Across every vertical, the audit rewards three "slot" metafields plus a universal taxonomy field. The slot names vary per vertical; the function is the same.
| Slot | Function | Why agents care |
|---|---|---|
| Google Product Category | Maps your product to Google's standard taxonomy (e.g. Apparel & Accessories > Clothing > Activewear). | Most agents have this taxonomy memorized. Matching it cleanly slots your product into category-level queries. |
| Material bucket | What the product is made of or contains. Fabric for apparel, key ingredients for beauty, allergens for food, build material for electronics. | The factual marker AI agents most aggressively check. Buyer queries like "merino wool sweater" or "retinol serum" lean on this field. |
| Dimensions bucket | Size, weight, volume, capacity, or whatever quantitative attribute defines the product's physical instance. | Disambiguation. Two products with the same name and different sizes are different products. Agents can't recommend correctly without this. |
| Care / usage bucket | How to use, store, apply, or maintain the product. Activity for fitness gear, skin type for beauty, age range for baby, storage for food. | Targeting. Lets agents match products to specific buyer contexts ("for sensitive skin", "for newborns", "for trail running"). |
For each vertical below, we name the canonical key, list the variants our open rubric accepts, and show a concrete value example. Pick one canonical key per slot per vertical and apply it consistently across your catalog. The rubric source for every regex below is commerce-agentic/agentic-catalog-scanner.
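Here is a minimal sketch of how a slot check like this could work. It assumes each slot is matched by a case-insensitive regex over the metafield key. The patterns are assembled from the apparel keys and variants listed for that vertical. They illustrate the idea and are not the actual regexes in commerce-agentic/agentic-catalog-scanner. `APPAREL_SLOTS`, `compile_slots`, and `covered_slots` are hypothetical names.

```python
from __future__ import annotations

import re

# Hypothetical apparel patterns: canonical key first, then the accepted variants.
APPAREL_SLOTS = {
    "material": ["material", "composition", "fabric", "fibers", "content"],
    "dimensions": ["size_guide", "dimensions", "chest", "inseam", "waist", "sizing"],
    "care_usage": ["care_instructions", "wash", "laundry", "cleaning"],
}


def compile_slots(slots: dict[str, list[str]]) -> dict[str, re.Pattern]:
    # One case-insensitive alternation per slot; re.escape treats each key literally.
    return {
        slot: re.compile("|".join(re.escape(k) for k in keys), re.IGNORECASE)
        for slot, keys in slots.items()
    }


def covered_slots(metafield_keys: list[str], patterns: dict[str, re.Pattern]) -> dict[str, bool]:
    # A slot counts as covered if any "namespace.key" string contains one of its variants.
    return {
        slot: any(p.search(key) for key in metafield_keys)
        for slot, p in patterns.items()
    }


patterns = compile_slots(APPAREL_SLOTS)
print(covered_slots(["custom.fabric", "custom.size_guide"], patterns))
# {'material': True, 'dimensions': True, 'care_usage': False}
```

Substring search is deliberately permissive in this sketch. For example, a key such as `waistband_color` would also count toward the dimensions slot. That permissiveness is one more reason to pick one canonical key per slot rather than relying on variants.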
The 10 verticals
Apparel: fabric, sizing, care
The most-cited apparel queries on AI agents are material-led ("merino wool sweater", "100% cotton t-shirt", "recycled polyester jacket"). Get the material field set on every product or you're invisible to those queries.
- custom.material · Accepted variants: composition, fabric, fibers, content
- custom.size_guide · Accepted: dimensions, chest, inseam, waist, sizing
- custom.care_instructions · Accepted: wash, laundry, cleaning
Beauty: key ingredient, skin type, volume
Beauty queries are aggressively ingredient-led ("retinol serum for sensitive skin", "niacinamide moisturizer", "vitamin C with SPF"). The dominant slot is key_ingredient. Agents won't surface your product for an ingredient query if it isn't named in a structured field. Description prose is not enough.
- custom.key_ingredient · Accepted: ingredient, active, formulation, fragrance, extract
- custom.volume · Accepted: size, capacity, ml, oz, net_weight
- custom.skin_type · Accepted: application, routine, usage, step
Home: wood type, dimensions, room
Furniture queries are dimensional ("dining table for 6", "queen bed frame", "shelf for small spaces"). Material drives the second wave ("solid oak", "marble coffee table", "rattan armchair"). Both must be set as structured fields, not buried in copy.
- custom.material · Accepted: wood_type, finish, upholstery, fabric, composition
- custom.dimensions · Accepted: height, width, depth, seating_capacity, weight
- custom.room · Accepted: care, maintenance, assembly, style
Electronics: chassis, specs, compatibility
Electronics queries are spec-led ("USB-C laptop", "noise-cancelling headphones with 30h battery", "Matter-compatible smart bulb"). Compatibility is the highest-leverage slot here. Agents match products to ecosystems (iOS, HomeKit, Matter, etc.) directly from the field.
- custom.build · Accepted: material, housing, chassis, finish
- custom.screen_size · Accepted: display_size, battery_capacity, storage, weight
- custom.compatibility · Accepted: os_version, model_number, connectivity, warranty
Fitness: material, weight, activity
Fitness gear is queried by activity ("yoga mat", "cycling shoes", "running shorts") then by attribute ("moisture-wicking", "non-slip", "ergonomic"). The activity field is your matching key into category queries.
- custom.material · Accepted: composition, fabric, content
- custom.weight · Accepted: size, capacity, resistance, gender
- custom.activity · Accepted: sport, skill_level, usage
Food: ingredients, serving size, storage
Food queries are diet-led ("gluten-free granola", "high-protein vegan", "single-origin coffee under $20"). Ingredients and dietary attributes must be in a structured field. Agents won't infer "vegan" from a product photo.
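One way to keep ingredients machine-readable is to store them as a list rather than one free-text string. Each entry then becomes a discrete value that can be matched exactly. This minimal sketch assumes a Shopify list metafield of type `list.single_line_text_field`, whose value Shopify expects as a JSON-encoded array string. The same pattern would apply to dietary attributes. The helper names and the granola ingredients are hypothetical.

```python
from __future__ import annotations

import json


def list_metafield(namespace: str, key: str, values: list[str]) -> dict:
    # List-type metafields carry their value as a JSON-encoded array string.
    return {
        "namespace": namespace,
        "key": key,
        "type": "list.single_line_text_field",
        "value": json.dumps(values),
    }


def lists_value(metafield: dict, wanted: str) -> bool:
    # Decode the list and test exact membership instead of guessing from prose.
    return wanted in json.loads(metafield["value"])


granola = list_metafield("custom", "ingredients", ["rolled oats", "almonds", "maple syrup"])
print(granola["value"])                   # ["rolled oats", "almonds", "maple syrup"]
print(lists_value(granola, "almonds"))    # True
```

Exact membership is strict by design in this sketch: `"almond"` does not match `"almonds"`. You may want to normalize values (singular form, lowercase) before writing them.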
- custom.ingredients · Accepted: ingredient, allergen, dietary, composition
- custom.serving_size · Accepted: net_weight, volume, serving
- custom.storage · Accepted: shelf_life, prep, origin, brewing
Pets: species, breed size, life stage
Pet queries are tightly scoped ("food for senior small-breed dogs", "cat toy for kittens", "orthopedic dog bed for large breeds"). Species and life-stage are the two most disambiguating slots. Without them your product matches no specific query.
- custom.material · Accepted: composition, ingredient, content
- custom.breed_size · Accepted: size, weight, capacity
- custom.species · Accepted: life_stage, activity, training
Baby: material safety, age range, certifications
Baby product buyers query safety-first ("BPA-free pacifier", "GOTS-certified organic crib sheet", "newborn to 12 months"). Safety certifications and age range are the must-have slots. Buyers won't trust a recommendation without these as structured fields.
- custom.safety_cert · Accepted: material, composition, fabric, certification
- custom.age_range · Accepted: age_group, size, weight, seat_weight
- custom.milestone · Accepted: care, wash, age
Outdoor: material, capacity, weather rating
Outdoor gear queries are condition-led ("3-season tent", "waterproof hiking boots", "ultralight backpack under 1 kg"). Weather and skill-level slots are how agents narrow from broad activity queries to specific products.
- custom.material · Accepted: composition, fabric, fill, insulation
- custom.capacity · Accepted: weight, volume, load, temperature_rating
- custom.weather_rating · Accepted: activity, terrain, skill_level, season
Gifts: recipient, occasion, presentation
Gift queries are intent-driven ("birthday gift under $50 for coffee lovers", "anniversary gift for him", "Mother's Day gift sets"). Recipient and occasion slots are essentially gift-specific. None of the other verticals have them, but gifts cannot be discovered without them.
- custom.material · Accepted: composition, content, gift_packaging, presentation
- custom.price_tier · Accepted: size, weight, capacity
- custom.recipient · Accepted: occasion, gift_message, personalization, engraving
How to set these in Shopify (the 5-minute version)
If you've never touched metafields in Shopify, the path is:
- Shopify admin → Settings → Custom data → Products.
- Click Add definition. Pick a namespace (custom works for most stores). Pick a key (use the canonical one above for your vertical). Pick a type (text for ingredients/material, dimension for sizes, etc.).
- Save. The field now appears in the Metafields section of every product edit page.
- Fill it on every product. Use bulk edit or a CSV import for catalogs > 20 SKUs.
- Also set Google Product Category on each product (Products → bulk edit → Category). This is the standard taxonomy field, independent of your custom metafields but equally important.
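If you prefer to script the definition step instead of clicking through the admin, Shopify's Admin GraphQL API exposes a `metafieldDefinitionCreate` mutation. This minimal sketch only builds the request bodies for the three apparel slots and does not send them. The helper name `definition_request` and the text types chosen for each key are assumptions, not requirements.

```python
from __future__ import annotations

import json

# Admin GraphQL mutation that creates one metafield definition.
CREATE_DEFINITION = """
mutation CreateDefinition($definition: MetafieldDefinitionInput!) {
  metafieldDefinitionCreate(definition: $definition) {
    createdDefinition { id namespace key }
    userErrors { field message }
  }
}
"""


def definition_request(name: str, key: str, type_: str, namespace: str = "custom") -> dict:
    # Hypothetical helper: returns the JSON body for one product metafield definition.
    return {
        "query": CREATE_DEFINITION,
        "variables": {
            "definition": {
                "name": name,
                "namespace": namespace,
                "key": key,
                "type": type_,
                "ownerType": "PRODUCT",
            }
        },
    }


# Apparel: one definition per slot, using the canonical keys from this article.
apparel = [
    definition_request("Material", "material", "single_line_text_field"),
    definition_request("Size guide", "size_guide", "multi_line_text_field"),
    definition_request("Care instructions", "care_instructions", "multi_line_text_field"),
]
print(json.dumps(apparel[0]["variables"], indent=2))
```

Each body would be POSTed to `https://{shop}.myshopify.com/admin/api/{version}/graphql.json` with an `X-Shopify-Access-Token` header. The version segment is left as a placeholder because Shopify releases API versions quarterly.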
Time investment: 30 minutes for the definitions + 1-3 hours of bulk-fill for a 100-SKU catalog. The hardest part is psychological: picking your three canonical slot names and committing to them.
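The bulk-fill hours can also be scripted. This minimal sketch assumes Shopify's Admin GraphQL `metafieldsSet` mutation and its limit of 25 metafields per call. It builds batched request bodies for a hypothetical 100-SKU Home catalog. With three slots per product, that is 300 values, which works out to 12 calls. Product IDs and values are placeholders, and the sketch assumes the three definitions already exist with matching types.

```python
from __future__ import annotations

import json

SET_METAFIELDS = """
mutation MetafieldsSet($metafields: [MetafieldsSetInput!]!) {
  metafieldsSet(metafields: $metafields) {
    metafields { namespace key }
    userErrors { field message }
  }
}
"""
MAX_PER_CALL = 25  # metafieldsSet accepts at most 25 metafields per request


def slot_inputs(product_id: int, material: str, height_cm: float, room: str) -> list[dict]:
    # Hypothetical helper: the three Home slot values for one product.
    gid = f"gid://shopify/Product/{product_id}"
    return [
        {"ownerId": gid, "namespace": "custom", "key": "material",
         "type": "single_line_text_field", "value": material},
        # A dimension metafield holds one measurement, encoded as JSON; height is the example here.
        {"ownerId": gid, "namespace": "custom", "key": "dimensions",
         "type": "dimension", "value": json.dumps({"value": height_cm, "unit": "cm"})},
        {"ownerId": gid, "namespace": "custom", "key": "room",
         "type": "single_line_text_field", "value": room},
    ]


def batched_requests(inputs: list[dict]):
    # Yield one request body per chunk of at most MAX_PER_CALL inputs.
    for start in range(0, len(inputs), MAX_PER_CALL):
        yield {"query": SET_METAFIELDS,
               "variables": {"metafields": inputs[start:start + MAX_PER_CALL]}}


inputs = []
for pid in range(1, 101):  # placeholder IDs for a 100-SKU catalog
    inputs += slot_inputs(pid, "solid oak", 75, "dining room")
batches = list(batched_requests(inputs))
print(len(batches))  # 12
```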
What this article doesn't claim
To be explicit about what we don't have:
- We don't have controlled A/B data showing that "products with metafield X get cited Y% more". A real test would require merchant cooperation, holding everything else equal, and observing AI capture rates before and after; none of that is feasible at our scale yet.
- We do have the rubric, which is open source and weighted based on what AI agents demonstrably parse from product feeds.
- We do have the captures dataset, which lets us observe that AI agents heavily cite brands whose catalogs use structured fields. But this is a correlation, not a causal claim.
Treat the canonical keys above as a tested checklist, not a yield prediction.
Audit your metafields in 60 seconds
The free audit covers public catalog signals. Install the Shopify app to audit (and bulk-set) your metafields too.