How to Bulk-Publish Content to Webflow CMS With Python and Claude Code

Webflow's CMS wasn't built for 300-article bulk uploads. Soft-deleted items keep their slugs reserved, which makes CSV re-imports dangerous. The Webflow MCP works for small batches but fails in background Claude Code agents because interactive permission prompts can't reach a headless process. Manual editing in the Designer doesn't scale past a few dozen items.

We published 281 articles to Webflow CMS with 796 API writes and zero failures. This is the playbook: the Python patterns that actually work at scale, the Webflow gotchas that cost us iterations, and how Claude Code fits into the pipeline.

What this guide is (and isn't)

Is: A concrete, runnable approach for pushing content from any source (markdown, database, spreadsheet) to Webflow CMS via the Data API. Includes the exact gotchas we hit and how to avoid them.

Isn't: A Webflow API reference. Read Webflow's docs for the official surface. This guide is about what the docs don't tell you.

Audience: Developers or technical marketers who want to bulk-publish beyond what the MCP, CSV import, or Designer can handle. You should be comfortable reading Python and have a Webflow site with API access.

Why Python + Data API instead of the MCP

The Webflow MCP is the right tool for conversational, human-in-the-loop content editing through Claude Code. For pushes of around 20 items or fewer, use it.

For bulk operations (50+ items, batched updates, fix passes across a corpus, scheduled publishing), use Python with the Data API directly. Reasons:

  • Background/headless execution. Claude Code agents running in the background cannot receive interactive permission prompts. The MCP's permission model breaks. A Python script has no such constraint.
  • Rate-limit control. You want deterministic 0.5s between calls, not whatever pacing the MCP chooses. Webflow's 150 req/min cap is easy to hit otherwise.
  • Error handling. When call 87 of 281 fails, you want to catch the exception, log it, skip to item 88, and keep a progress file for resume. Agent-driven execution won't do this reliably.
  • Cost. No LLM tokens spent on each push. The logic is deterministic; an agent designing the push is valuable, an agent executing it is waste.

Agents are for reasoning. Scripts are for execution. When the task is "do the same thing 281 times with different data," write a script.

Setup: auth, collection IDs, field slugs

You need three things before writing any push code.

1. API token

Generate at Site Settings → Apps & Integrations → API Access → Generate API token. Give it "CMS read and write" scope. Save it somewhere safe; Webflow will never show it again.
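Rather than pasting the token into a script, one option is to read it from an environment variable. A minimal sketch: the variable name WEBFLOW_API_TOKEN and the helper load_token are assumptions for illustration, not anything Webflow defines.

```python
import os

def load_token(env=os.environ, var="WEBFLOW_API_TOKEN"):
    """Return the API token from the environment, or raise a clear error."""
    token = env.get(var, "").strip()
    if not token:
        raise RuntimeError(f"Set {var} before running the push script")
    return token
```

In the scripts below, API_TOKEN = load_token() would then replace the hardcoded placeholder.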

2. Collection IDs

Every CMS collection has a unique ID. Find it in the URL of the Collections page: https://webflow.com/design/your-site?tab=site-content&collection=69d78318b11f74482c3ac35d. The collection= value is the ID.

Or, programmatically:

import urllib.request, ssl, certifi, json
API_TOKEN = "YOUR_TOKEN_HERE"
SITE_ID = "YOUR_SITE_ID"
ctx = ssl.create_default_context(cafile=certifi.where())
req = urllib.request.Request(
   f"https://api.webflow.com/v2/sites/{SITE_ID}/collections",
   headers={"Authorization": f"Bearer {API_TOKEN}", "Accept": "application/json"},
)
with urllib.request.urlopen(req, context=ctx) as resp:
   for coll in json.loads(resp.read())["collections"]:
       print(f"{coll['id']}  {coll['displayName']}  (slug: {coll['slug']})")

3. Field slugs (the not-obvious step)

Every field in a Webflow collection has a slug, and it often differs from the display name. "Meta Description" becomes meta-description. "Body" might be body or body-2 depending on when the field was created. Pushing to the wrong slug is an easy mistake, and it can cost hours to track down.

Get the exact slugs by fetching one collection item and inspecting its fieldData:

req = urllib.request.Request(
   f"https://api.webflow.com/v2/collections/{COLLECTION_ID}/items?limit=1",
   headers={"Authorization": f"Bearer {API_TOKEN}", "Accept": "application/json"},
)
with urllib.request.urlopen(req, context=ctx) as resp:
   item = json.loads(resp.read())["items"][0]
   print(list(item["fieldData"].keys()))

Document these slugs somewhere permanent. In our project, CLAUDE.md has:

Types collection body field: body-2
Terms collection body field: body

This mismatch cost us a round trip when a QA agent queried body-2 on the Terms collection, got 0 content, and falsely flagged 100 articles as broken.
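Fetching one item only works when the collection already has items. As an alternative sketch, the v2 collection-details endpoint (GET /v2/collections/{collection_id}) should return the schema itself. The response shape assumed here (a fields list whose entries carry slug, displayName, and type) is our reading of Webflow's docs, so check it against a real response. The helper names are hypothetical and the sample data is made up.

```python
import json
import urllib.request

def field_slugs(collection_json):
    """Map display name -> (slug, type) from a collection-details response."""
    return {
        f["displayName"]: (f["slug"], f["type"])
        for f in collection_json.get("fields", [])
    }

def fetch_collection(collection_id, token, ctx=None):
    """GET the collection schema. Pass the certifi-backed SSL context as ctx."""
    req = urllib.request.Request(
        f"https://api.webflow.com/v2/collections/{collection_id}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
        return json.loads(resp.read())

# Illustrative response fragment (made-up values), not real API output
sample = {"fields": [
    {"displayName": "Meta Description", "slug": "meta-description", "type": "PlainText"},
    {"displayName": "Body", "slug": "body-2", "type": "RichText"},
]}
for name, (slug, ftype) in field_slugs(sample).items():
    print(f"{name!r} -> {slug} ({ftype})")
```

Printing this mapping once per collection and pasting it into CLAUDE.md keeps the slug record next to the code that depends on it.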

The push pattern

A minimal push script. This is production code that has pushed over 1,000 items with zero failures.

import sqlite3, json, time, urllib.request, ssl, os, certifi

API_TOKEN = "YOUR_TOKEN_HERE"
COLLECTION_ID = "YOUR_COLLECTION_ID"
BODY_FIELD = "body"   # or "body-2", whatever your collection uses
DB_PATH = "/absolute/path/to/content.db"
PROGRESS_FILE = "/tmp/my_push_progress.txt"

# macOS Python cannot find system certs — use certifi
ctx = ssl.create_default_context(cafile=certifi.where())

# Resume support
done = set()
if os.path.exists(PROGRESS_FILE):
   with open(PROGRESS_FILE) as f:
       done = set(l.strip() for l in f if l.strip())

conn = sqlite3.connect(DB_PATH, timeout=30)
c = conn.cursor()
c.execute("SELECT slug, webflow_id, body_html, meta_title, meta_description FROM my_items ORDER BY slug")
rows = [r for r in c.fetchall() if r[0] not in done]
print(f"{len(rows)} items to push ({len(done)} already done)")

pushed, failed = 0, []
for slug, wf_id, html, mt, md in rows:
   if not wf_id or not html or len(html) < 100:
       print(f"  SKIP {slug}: missing webflow_id or empty body")
       continue
   try:
       payload = {"fieldData": {
           BODY_FIELD: html,
           "meta-title": mt or "",
           "meta-description": md or "",
       }}
       data = json.dumps(payload).encode("utf-8")
       req = urllib.request.Request(
           f"https://api.webflow.com/v2/collections/{COLLECTION_ID}/items/{wf_id}",
           data=data,
           headers={
               "Authorization": f"Bearer {API_TOKEN}",
               "Content-Type": "application/json",
               "Accept": "application/json",
           },
           method="PATCH",
       )
       with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
           json.loads(resp.read().decode("utf-8"))  # raises if the body is not valid JSON
       pushed += 1
       with open(PROGRESS_FILE, "a") as f:
           f.write(slug + "\n")
       if pushed % 25 == 0:
           print(f"  {pushed}/{len(rows)}: {slug}")
   except Exception as e:
       print(f"  FAIL {slug}: {e}")
       failed.append(slug)
   time.sleep(0.5)  # pace every attempt, success or failure, under Webflow's 150 req/min cap

conn.close()
print(f"\nPushed {pushed}/{len(rows)}, failed {len(failed)}")

Key design choices:

  • Absolute DB path. Background shells reset cwd; relative paths fail mysteriously.
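  • certifi.where() for SSL context. Python on macOS often cannot locate a system CA bundle (the python.org installer ships without one unless you run its Install Certificates command). Without certifi, urllib calls can fail with CERTIFICATE_VERIFY_FAILED, and the traceback does not point at the cause.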
  • Progress file. If the script dies at item 150, restart and it skips to 151. Do not build this with "retry the whole batch" logic. Resume on a line-delimited file is cheaper and more robust.
  • 0.5s delay. Webflow allows 150 req/min. A 0.5s pause caps the script at 120 req/min (lower once request latency is added), which leaves a safety margin for burst requests from other clients on the same token.
  • Skip conditions. If html is shorter than some sanity threshold, skip rather than push. You'll be glad you did when a regex bug produces empty bodies and you only realize after 200 items.
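The push loop above logs a failure and moves on. If you would rather retry transient errors in place, a small wrapper is one option. This is a sketch under assumptions: that rate-limited calls come back as HTTP 429, and that a Retry-After header, when present, holds a number of seconds. The name call_with_retry and its parameters are hypothetical.

```python
import time
import urllib.error

def call_with_retry(do_request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call do_request(); on HTTP 429 or 5xx, wait and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return do_request()
        except urllib.error.HTTPError as e:
            retryable = e.code == 429 or 500 <= e.code < 600
            if not retryable or attempt == max_attempts:
                raise  # 4xx validation errors and exhausted retries go to the caller
            # Prefer the server's Retry-After hint (seconds) when it is present
            hint = e.headers.get("Retry-After") if e.headers else None
            if hint and hint.isdigit():
                delay = float(hint)
            else:
                delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay)
```

Inside the loop, the urlopen call would become call_with_retry(lambda: urllib.request.urlopen(req, context=ctx, timeout=30).read()). The existing except block still catches whatever the wrapper re-raises, so the progress file and the failed list work as before.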

The gotchas we paid to learn

1. Webflow RichText silently strips unsupported tags

The RichText field only accepts a specific subset of HTML. Anything else is dropped without warning:

  • <table> and children — stripped entirely
  • <script> — stripped
  • <iframe> — stripped
  • Custom embeds — must be added via the Designer API, not the Data API

Convert tables to bullet lists before generating HTML. For inline scripts or embeds, add them in the page template (not the CMS item).
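One way to do the table conversion is on the markdown side, before rendering. The sketch below rewrites simple pipe tables as one bullet per row. It is deliberately naive: it does not handle escaped pipes, pipes inside inline code, or table-like text inside fenced code blocks. The function name tables_to_bullets is hypothetical.

```python
import re

# A separator row such as "| --- | :---: |"
_SEP = re.compile(r'^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$')

def _cells(line):
    """Split a markdown table row into trimmed cell strings."""
    return [c.strip() for c in line.strip().strip("|").split("|")]

def tables_to_bullets(md):
    """Rewrite pipe tables as bullet lists: one bullet per row, 'Header: value' pairs."""
    lines, out, i = md.split("\n"), [], 0
    while i < len(lines):
        is_header = (
            "|" in lines[i]
            and i + 1 < len(lines)
            and "|" in lines[i + 1]
            and _SEP.match(lines[i + 1])
        )
        if not is_header:
            out.append(lines[i])
            i += 1
            continue
        headers = _cells(lines[i])
        i += 2  # skip the header row and the separator row
        while i < len(lines) and "|" in lines[i]:
            pairs = zip(headers, _cells(lines[i]))
            out.append("- " + "; ".join(f"{h}: {v}" for h, v in pairs))
            i += 1
    return "\n".join(out)
```

A two-column table with headers Field and Type becomes bullets like "- Field: name; Type: Text", which the markdown renderer then turns into an ordinary list.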

2. Whitespace between <ul>/<li> tags breaks lists

This one took a full day to find. Webflow's RichText parser silently drops list children when whitespace separates <ul>, <li>, and </ul> tags.

The Python markdown library outputs <ul>\n<li>...</li>\n</ul> by default. Webflow accepts the payload, returns 200, and the Data API GET echoes the HTML back — so every check looks clean. But the editor's internal block representation has no bullet nodes, and the rendered live page shows empty sections.

Fix: compact HTML before pushing. Strip whitespace between block tags while preserving content inside <pre> code blocks (so JSON examples keep formatting):

import re

def compact(html):
   out, i = [], 0
   while i < len(html):
       pre = re.search(r'<pre[^>]*>[\s\S]*?</pre>', html[i:])
       if pre:
           out.append(re.sub(r'>\s+<', '><', html[i:i + pre.start()]))
           out.append(pre.group(0))
           i += pre.end()
       else:
           out.append(re.sub(r'>\s+<', '><', html[i:]))
           break
   return ''.join(out)

Run this on every HTML body before pushing. This bug is not in Webflow's API docs.
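Since the API gives no signal when this happens, a pre-push guard can help catch bodies that skipped the compaction step. A minimal sketch: list_gaps is a hypothetical helper, and extending the check to ol lists is our assumption (the bug described above was observed with ul).

```python
import re

# Whitespace between list structure tags: after <ul>/<ol> or </li>,
# and before the next <li> or the closing </ul>/</ol>.
_GAP = re.compile(
    r'(?:<(?:ul|ol)\b[^>]*>|</li>)\s+(?=<li\b|</(?:ul|ol)>)',
    re.IGNORECASE,
)

def list_gaps(html):
    """Count whitespace gaps between list tags, ignoring anything inside <pre>."""
    without_pre = re.sub(r'<pre[^>]*>[\s\S]*?</pre>', '', html)
    return len(_GAP.findall(without_pre))
```

In the push loop, a check such as "if list_gaps(html): skip and log" fits next to the existing length check.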

3. The Data API GET lies about RichText

A Webflow RichText field has two internal representations: the HTML source (what you PATCHed) and the node tree (what the editor renders). The API GET returns the HTML source. If Webflow's parser failed to convert your HTML into nodes, the GET will still echo the HTML back as if everything is fine.

Visual verification is required. Either open the CMS editor and look at the field, or fetch the live page and grep for the expected element. Do not trust the API GET as evidence of correct rendering.
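The "fetch the live page and grep" check can be scripted for a handful of slugs. The sketch below pulls the text of each list item from the HTML you pushed and reports any that do not appear in the published page. It uses regexes rather than an HTML parser, so treat it as a smoke test: nested lists and unusual markup will confuse it. The function names are hypothetical.

```python
import html as html_lib
import re
import urllib.request

def _text(fragment):
    """Strip tags, decode entities, and collapse whitespace."""
    no_tags = re.sub(r'<[^>]+>', '', fragment)
    return " ".join(html_lib.unescape(no_tags).split())

def li_texts(source_html):
    """Text content of each <li> in the HTML that was pushed."""
    items = re.findall(r'<li(?:\s[^>]*)?>([\s\S]*?)</li>', source_html, flags=re.IGNORECASE)
    return [t for t in (_text(i) for i in items) if t]

def missing_items(source_html, page_html):
    """List items present in the pushed body but absent from the rendered page."""
    page_text = _text(page_html)
    return [t for t in li_texts(source_html) if t not in page_text]

def fetch_page(url, ctx=None):
    """Download the published page. Pass the certifi-backed SSL context as ctx."""
    req = urllib.request.Request(url, headers={"User-Agent": "render-check/0.1"})
    with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

An empty result from missing_items(body_html, fetch_page(live_url)) is weak evidence that the lists rendered. A non-empty result is a strong hint that the whitespace bug or a stripped tag is in play.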

4. Slugs reserve permanently on soft-delete

When you "delete" a CMS item in the Designer, Webflow moves it to a trash state. Its slug remains reserved. If you import or create a new item with the same slug, Webflow assigns a suffixed slug (product-62276 instead of product).

Worse: the API rejects updates that would conflict with a reserved slug, with unhelpful error messages.

Fix if hit: create a new collection with final names and slugs, migrate items via API, update every internal link. We did this once. Never again.

Avoid: lock the collection structure before any bulk content import. Name collections and slugs with their final production values from day one.

5. MultiReference fields can't change their target collection

If a MultiReference field on Collection A points to Collection B, and you later create a new Collection B' that should replace B, the field cannot be retargeted via API or Designer. You must delete the MultiReference field, create a new one pointing to B', and re-populate every item's references.

Avoid: same as #4. Design the collection graph before content creation.

6. Background agents can't receive permission prompts

If you launch a Claude Code background agent that uses the Webflow MCP, every tool call requiring permission will block indefinitely. The agent halts. Foreground agents handle this correctly because the user sees the permission dialog.

Fix: bulk Webflow operations always run as Python scripts from a foreground agent or a terminal, never from a background Claude Code agent.

7. RichText HTML encoding for CMS bindings

When writing a template with CMS field bindings (e.g., inside a <script type="application/ld+json"> block on a collection page template), Webflow expects its own binding syntax:

{{wf {&quot;path&quot;:&quot;slug&quot;,&quot;type&quot;:&quot;PlainText&quot;\} }}

You can type this literally in Designer's custom code editor, or use the "Add Field" picker to insert it. Getting this syntax right matters — a malformed binding renders as literal {{wf ...}} in the output and breaks any JSON-LD or structured data that wraps it.

Pattern: markdown → HTML → compact → push

The full pipeline for a single article:

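import re, sqlite3, markdown

# Assumes compact() and DB_PATH are defined as in the earlier sections.

def to_compact_html(body_md):
   # Strip a leading YAML frontmatter block (--- ... ---) if present
   md_clean = re.sub(r'\A---\s*\n[\s\S]*?\n---\s*(?:\n|\Z)', '', body_md, count=1).strip()
   # Render to HTML. Note: the 'tables' extension emits <table>, which RichText
   # strips (gotcha 1), so convert tables to lists in the markdown first.
   html = markdown.markdown(md_clean, extensions=['tables', 'fenced_code'])
   # Compact for Webflow RichText
   return compact(html)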

conn = sqlite3.connect(DB_PATH, timeout=30)
c = conn.cursor()
c.execute("SELECT id, body_md FROM my_items")
for fid, md in c.fetchall():
   html = to_compact_html(md)
   c.execute("UPDATE my_items SET body_html = ? WHERE id = ?", (html, fid))
conn.commit()

Run this whenever body_md changes. Then the push script from earlier reads body_html and ships it. Separate the "generate HTML" step from the "push" step so you can regenerate without pushing, inspect the HTML in the DB, and push only touched items.

Pattern: the fix-pass architecture

When you discover a systemic content issue post-publish (a formulaic phrase, a meta-content leak, a rendering bug), the pattern is:

  1. Identify scope with a SQL query. SELECT slug FROM my_items WHERE body_md LIKE '%problem-pattern%'. Count first. Never run a transform on the full corpus without knowing how many items will change.
  2. Write the transform as a pure function. def fix(md): return ... takes markdown, returns markdown. No side effects. Unit-test it on 3-5 representative items before touching the DB.
  3. Apply the transform in a transaction. Loop over affected items, call fix(), UPDATE my_items SET body_md = ? WHERE id = ?. Track touched slugs in /tmp/fix_touched.txt.
  4. Regenerate HTML for only the touched slugs. No need to regen the full corpus.
  5. Push to Webflow using the push script, reading the touched-slugs file.
  6. Spot-check on the live CMS editor and rendered page for 3-5 slugs.
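Steps 1 to 3 can be sketched as follows. The transform shown (replacing spaced em dashes with commas) is only an illustration of a pure function. The my_items table and its columns follow the earlier examples, while apply_fix and the touched-file handling are hypothetical names.

```python
import re
import sqlite3

def fix(md):
    """Pure transform (illustrative): replace spaced em dashes with commas."""
    return re.sub(r'\s*\u2014\s*', ', ', md)

def apply_fix(conn, like_pattern, touched_path):
    """Apply fix() to rows whose body_md matches a LIKE pattern, in one transaction."""
    rows = conn.execute(
        "SELECT id, slug, body_md FROM my_items WHERE body_md LIKE ?",
        (like_pattern,),
    ).fetchall()
    print(f"{len(rows)} items in scope")  # step 1: count before changing anything
    touched = []
    with conn:  # commits on success, rolls back if anything raises
        for item_id, slug, md in rows:
            new_md = fix(md)
            if new_md != md:
                conn.execute(
                    "UPDATE my_items SET body_md = ? WHERE id = ?", (new_md, item_id)
                )
                touched.append(slug)
    # Record touched slugs so the regenerate and push steps can limit their scope
    with open(touched_path, "w") as f:
        f.writelines(slug + "\n" for slug in touched)
    return touched
```

Because fix() has no side effects, it can be tested on a few sample strings before apply_fix ever opens the real database.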

This is the Phase 6 pattern from our schema glossary project. We ran six such passes covering rendering bugs, meta-content leaks, em dashes, heading doubling, formulaic openers, and placeholder brand names. Total: 796 writes, zero failures, resume-safe for every pass.

Where Claude Code fits

Claude Code excels at the parts that require judgment. It does not replace the Python script. Use Claude Code for:

  • Designing the pipeline. "I want to push 500 markdown files to Webflow with fact-checking and link validation. Plan the phases." Claude Code returns a multi-phase agent spec.
  • Writing the fix prompts. "I've noticed every article ends with the phrase 'in conclusion'. Write a regex to remove it while preserving code blocks." Claude Code drafts the transform with edge cases.
  • Reviewing scope before push. "Before I push 281 items, query the DB to tell me how many have valid HTML, how many are shorter than 500 chars, and which ones contain suspicious patterns." Claude Code runs the analysis.
  • Investigating anomalies. "20 items came back with 400 errors. Inspect the responses and cluster by cause." Claude Code summarizes.
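The scope review in the third bullet comes down to a few aggregate queries. A sketch of what such an analysis might look like, assuming the my_items table from the earlier examples. The function name, the 500-character threshold default, and the choice of <table> as the "suspicious pattern" are illustrative.

```python
import sqlite3

def scope_report(conn, min_len=500):
    """Summarize what a push would touch, before any API call is made."""
    query = """
        SELECT
            COUNT(*),
            SUM(webflow_id IS NULL OR webflow_id = ''),
            SUM(body_html IS NULL OR body_html = ''),
            SUM(body_html IS NOT NULL AND body_html <> '' AND LENGTH(body_html) < ?),
            SUM(body_html LIKE '%<table%')
        FROM my_items
    """
    total, no_id, empty, short, tables = conn.execute(query, (min_len,)).fetchone()
    # SUM returns NULL when there is nothing to add, hence the "or 0"
    return {
        "total": total,
        "missing_webflow_id": no_id or 0,
        "empty_body": empty or 0,
        "short_body": short or 0,
        "contains_table": tables or 0,
    }
```

Whether you run this by hand or ask Claude Code to run it, the point is the same: read the numbers before the first PATCH goes out.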

Do not use Claude Code to execute the bulk push loop. An agent calling the Webflow MCP 281 times is:

  • Slow (each tool call has overhead)
  • Expensive (tokens per call)
  • Unreliable (context drift, permission prompts in background agents)
  • Unrecoverable (no progress tracking)

Claude Code designs the script. Python runs the script.

A worked example

The schema glossary project had 58 type overview articles and 223 field reference articles, living in a SQLite database (glossary.db). Every article had body_md, body_html, meta_title, meta_description, and a pre-assigned webflow_id from an earlier CMS item creation.

Two collections in Webflow: Types and Terms.
The initial upload took 2.5 minutes for 281 items at 0.5s per push. Subsequent fix passes each took 20 seconds to 2 minutes depending on scope. Total time spent on Webflow pushes across the entire project: ~15 minutes of wall clock, spread across 12 push operations.
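As a rough check on these numbers, the sleep time alone sets a floor on the duration of any push. The helper name below is hypothetical, and request latency is ignored.

```python
def min_push_seconds(n_items, delay=0.5):
    """Lower bound on wall-clock time: sleep time only, ignoring request latency."""
    return n_items * delay

print(min_push_seconds(281))  # 140.5 seconds, about 2.3 minutes
print(60 / 0.5)               # 120.0 requests per minute at most
```

The reported 2.5 minutes for the initial upload sits just above that floor.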

If you want the full push scripts and the fix-pass pattern packaged as a reusable Claude skill (with all seven gotchas documented and a runnable Python push template), see github.com/Karpi-Studio-WF/webflow-cms-ops.

Bottom line

  • Use the Data API directly via Python for anything beyond 20 items.
  • certifi.where() for SSL context on mac. Always.
  • Compact HTML before pushing. No whitespace between list tags.
  • Track progress in a file. Resume on restart.
  • Script executes, Claude Code designs.
  • Visual verification is non-negotiable. The API will lie to you.

After 796 production writes across our schema glossary, the only failure mode we ever hit was the whitespace bug — and that was a content parsing issue, not a push-script issue. The pattern above is battle-tested.

Karpi Studio builds production content pipelines for Webflow teams as part of our AEO retainer. If your CMS project is hitting the limits of manual editing or CSV imports, get in touch.
