Webflow Multi-Image Alt Text at Scale

Or: what to do when your CMS has 272 missing alt tags and filenames won't cut it.

The Problem Most People Don't See Until It's Too Late

You ship a Webflow site with a portfolio, product catalog, or case study collection. Each item has a multi-image field (five, seven, ten images per entry). A year later, an AEO audit flags thousands of missing alt attributes.

You open the Webflow CMS API, try to PATCH alt text on the multi-image field, and nothing works.

Here's the thing nobody mentions: alt text in a Webflow multi-image field doesn't live on the field itself. It lives per-image-object, inside the array.

// Single image field — alt lives on the field
"main-image": {
  "fileId": "abc123",
  "url": "...",
  "alt": "description here"
}

// Multi-image field — alt lives PER OBJECT inside the array
"set-of-images": [
  { "fileId": "x", "url": "...", "alt": "..." },
  { "fileId": "y", "url": "...", "alt": "..." },
  { "fileId": "z", "url": "...", "alt": "..." }
]

If you PATCH the field naively (replacing the whole array with new alt values but not preserving fileId and url), you either orphan the image reference or overwrite with broken data. Most batch-update scripts online get this wrong. It's gotcha one. There are more.

Why This Matters for AEO, Not Just Accessibility

Alt text has always been an accessibility concern. That's not news. What is news: AI engines parse alt text for image context when answering queries.

ChatGPT, Perplexity, and Google's AI Overviews use alt text to understand what an image depicts, what page it belongs on, and what topic it reinforces. Image search inside these engines is growing. Alt text is a ranking and citation signal for AEO now, not just a screen reader feature.

Filename-based alt text is worse than useless. Google's quality guidelines treat duplicate or thin alt text as a spam signal. "dydx-screenshot-1.jpg" turned into "dYdX screenshot 1" across seven images on the same item is duplicate-content at image scale. You lose the citation, and you might lose the page.

You have two real options:

Write every alt by hand. Thorough. Expensive. Practically impossible at portfolio scale.
Use Claude Vision to actually describe each image. Cheap. Fast. Accurate, if you do it right.

This post is about option two, at scale, without tripping on the five walls we hit while doing it on our own portfolio.

The Pattern That Works

The end-to-end approach we landed on after several wrong turns:

Pull the real CMS data. Don't estimate image counts. Query the collection, get actual counts per item, export as a task file.
Normalize images before any vision work. Download locally. Convert any AVIF served with .jpg extensions into real JPEG. Resize everything to max 2000px on the longest edge.
Dry-run one item end-to-end. Download, view, generate alt, PATCH, publish. Catch format, auth, and schema issues on one slice before scaling.
Run the batch as isolated API calls, not inline vision inside a chat session. Each image is its own request. Resume-safe. Rate-limited. Cost-capped.
Review before pushing. Generate a verification table. Spot-check the outliers. Then push via the Webflow MCP or API.

Each step exists because we skipped it and it bit us.

Gotcha 1: The 2000px Dimension Cap on Multi-Image Requests

When Claude processes multiple images in a single request, each image must be under 2000px on its longest edge. A single 3200px screenshot in the batch kills the whole turn with a cryptic "dimension limit" error.

Portfolio screenshots exported from Webflow are often 2x or 3x retina (3000 to 4000px wide by default).

Fix: resize everything once before any batch vision work. On macOS:

cd /path/to/images
for f in *.jpg; do sips -Z 2000 "$f" >/dev/null 2>&1; done

sips -Z only shrinks, never upscales. Operates in place, so back up first.

Gotcha 2: Webflow Serves AVIF as .jpg

Webflow's image optimization sometimes serves AVIF-encoded images with .jpg extensions. Claude Vision doesn't accept AVIF. You get a format error on what looks like a valid JPEG and spend an hour chasing it.

Fix: check magic bytes, not the extension, before you pass the file to vision:

file image.jpg

If it reports AVIF, convert to real JPEG first. sips, ImageMagick, and ffmpeg all do the job.

Gotcha 3: Session Context vs. Isolated API Calls

This is where most "Claude at scale" tutorials fall apart.

If you try to view 200+ images inside a single Claude Code session, each image eats 1.5 to 2.5k tokens of conversation context. At 263 images, that's 400 to 600k tokens in one session. You hit the context cliff long before you finish, and output quality degrades as the window fills.

The fix is architectural, not tactical. Stop trying to view all images inline. Write a small Python script that calls the Anthropic Messages API directly, one image per request. Each call is isolated. Zero context bloat. Resume-safe if it crashes mid-run.

Key properties the script needs:

Incremental save after every successful call (never lose work on crash)
Progress file that lets re-runs skip completed images
Exponential backoff on 429 / 500 / 502 / 503 / 504
Rate-limited to 0.5s/call (well under the default 60 req/min cap)
Output validation: strip quotes, enforce character limits, reject empty responses
--dry-run flag with a cost estimate before any API spend

The reference implementation we use in-house is around 380 lines (single file, no dependencies beyond the Python standard library and certifi). Call the Messages API with x-api-key, pass the image as a base64 block, enforce the system prompt, parse the text response, write to a local JSON file, move on.

One auth note that costs people hours: OAuth tokens from Claude Pro or Claude Max do not authenticate the Messages API. You need a pay-as-you-go sk-ant-... API key from console.anthropic.com. Separate auth, separate billing line.

Gotcha 4: The Prompt Is Half the Work

Output quality is capped by the prompt. We iterated several times. What works:

You are generating accurate HTML alt text for portfolio images.
Each image is a screenshot of a client website.

Rules for your output:
- Describe what's ACTUALLY VISIBLE (brand name, headline, UI elements, colors, layout style)
- Include the client name naturally for entity reinforcement
- UNDER 125 characters
- NO "image of..." or "screenshot of..." prefix
- Be specific: "pricing page with three-tier comparison" beats "product screenshot"
- Do NOT guess from the filename — describe only what you see
- Output ONLY the alt text. No quotes, no preamble.

Two things matter most:

"Describe what's ACTUALLY VISIBLE." Without this, the model pattern-matches on the client name and writes a generic description that could apply to any screenshot. You need the model to commit to the specific image.
Character limit. 125 is the practical upper bound for alt text that screen readers read cleanly. Enforce it in the prompt and validate in the script. Truncate if the model exceeds.

Pass the client name and a one-line description in the user message for context, but keep the system prompt leaning on visual description over name-dropping.

Gotcha 5: The Webflow Multi-Image PATCH Pattern

Back to the schema issue. To update a multi-image field, PATCH the whole array. Preserve fileId and url for every image. Update alt only:

patched_images = [
    {
        "fileId": img["fileId"],
        "url": img["url"],
        "alt": alt_lookup[img["fileId"]],
    }
    for img in item["fieldData"]["set-of-images"]
]

payload = {
    "fieldData": {
        "set-of-images": patched_images,
        "main-image": {
            **item["fieldData"]["main-image"],
            "alt": alt_lookup[item["fieldData"]["main-image"]["fileId"]],
        },
        "thumbnail": {
            **item["fieldData"]["thumbnail"],
            "alt": alt_lookup[item["fieldData"]["thumbnail"]["fileId"]],
        },
    }
}

client.update_collection_item(
    collection_id=COLLECTION_ID,
    item_id=item["id"],
    payload=payload,
)

Then publish the collection. Alts go live.

Skip the fileId / url preservation and you orphan references or overwrite with null. Do this wrong on production and you'll have broken images across the site. Always test on staging first.

What We Actually Ran

Collection: Karpi Studio portfolio
Items in scope: 49 (one with zero images, one with missing main-image and thumbnail fields. Both flagged as data-entry issues, not script failures.)
Total images processed: 263
Alt tags written: 272
API cost (Sonnet 4.5): ~$1.58
Runtime: ~15 minutes
Failure rate: 0%
Images needing manual cleanup after review: 0. All 272 alt texts landed under the 125-character limit.

Sample rows from the verification table:

Slug	Field	Alt text	Chars
dydx	main-image	dYdX homepage hero with DeFi's Pro Trading Platform headline and 3D hedgehog mascot	83
dydx	thumbnail	dYdX homepage thumbnail showing the DeFi trading platform hero	62
blueberry-pediatrics	main-image	Blueberry Pediatrics homepage with 'Say hello to Blueberry' hero, mother-child photo, award badges, and How it works section	124
neocharge	main-image	NeoCharge homepage with Take The Power Of Home Charging headline and Tesla Model S driveway	91
casella-family-brands	main-image	Casella Family Brands site with Yellow Tail Shiraz bottle hero and 20th birthday blog mobile mockup	99
ynvisible-interactive	main-image	Ynvisible Interactive homepage showing Ultra Low Power Segmented Displays hero with product photos and customer logos	117

Lessons Nobody Tells You

Estimate based on actual data, not guesses. We initially estimated 30 items x 9 images = 270. Reality: 49 items, 272 images, distribution ranging from 0 to 10 per item. Pulling the real CMS data takes 30 seconds and saves an hour of wrong-direction work.
Do one item end-to-end before scaling. The 2000px cap, the AVIF-as-JPEG issue, and the multi-image schema pattern would all have surfaced on the first image if we'd committed to a single dry-run slice. Instead we caught them one at a time across multiple sessions, at real cost.
Isolate the API calls. Don't run batch vision inline in a chat session. The context cliff is non-obvious until you hit it. Script it.
Review before pushing. Always. Even with a good prompt, a small share of outputs need human cleanup. The verification table is your gate.
OAuth tokens are not API keys. Claude Max and Claude Pro use OAuth for session auth. The Messages API needs a pay-as-you-go sk-ant-... key. Separate auth flow, separate billing line.

The Short Version

If you're reading this because you searched for "how to bulk update alt text in Webflow multi-image fields", here's the compressed answer:

Export the CMS collection. Get real image counts per item.
Download images locally. Resize to max 2000px. Convert AVIF to JPEG.
Write a Python script that calls the Anthropic Messages API one image at a time, with a vision-specific prompt, resume-safe progress, and rate limiting.
Review the verification table.
PATCH each item via the Webflow API. Preserve fileId and url, update alt per-image-object in the array. Publish.

Total cost for 272 images on Sonnet 4.5: under $2. Total time: ~15 minutes. Total alt tags that now genuinely describe the image instead of duplicating a filename: 272.

One Note on AEO

If you're doing this to improve AEO citations, alt text is one input. The next layer is schema markup (ImageObject schema that ties each image to its subject, URL, and description). That's what we're building out in Schema HQ: a Webflow-specific system for structured image and content markup.

Alt text + schema + real content strategy = AI engines that cite your pages as authoritative. Fake alt text + duplicate meta descriptions + thin content = invisible.

We run AEO as a monthly retainer for founders who want to be cited by ChatGPT, Perplexity, and AI Overviews instead of hoping for it. If that's your problem, let's talk.

How to Update Alt Text in Webflow Multi-Image Fields with Claude (at Scale)