Feeding a document to an AI? Markdown almost always costs fewer tokens and parses cleaner than the same content as a PDF. Where each format wins — and why the thing you hand a person at the end is neither.
You drop a 40-page market report into the chat and ask for a summary. The model takes a beat, then hands back something half-right and oddly fixated on the footer that repeats "CONFIDENTIAL — Q3 2026" forty times. The table on page 12 — the one with the actual numbers — came back as a single run-on line. And the context window is already two-thirds full, on one document.
The information was all there in the PDF. The container just spent your tokens carrying the wrong things.
This is the part of "use AI on your documents" that nobody warns you about: the format you feed the model decides how much it costs and how well it reads, before the model does any thinking at all. For that job, Markdown and PDF are not close.
A model can't read a PDF the way you do. The file has to become something the model ingests — and there are only two paths, each with its own tax.
A tool pulls the text out of the PDF and hands the model a stream of characters. When the extraction is clean, the token count is roughly the same as Markdown of the same words. The problem is that real-world PDFs almost never extract cleanly: running headers and footers repeat on every page, and tables flatten into run-on lines.
None of that noise was information. All of it costs tokens, and worse, it dilutes the model's attention across junk.
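Some of that noise is cheap to strip before the text ever reaches the model. Here is a minimal sketch. It assumes your extractor already returns one string per page and treats any line that recurs on most pages as boilerplate: running headers, the repeated "CONFIDENTIAL" footer, "Page N". The helper name `strip_running_lines` and the 50% threshold are hypothetical choices for illustration. The sketch does nothing for tables that have already been flattened.

```python
import re
from collections import Counter


def _normalize(line: str) -> str:
    # Collapse digit runs so "Page 1" and "Page 2" count as the same line.
    return re.sub(r"\d+", "#", line.strip())


def strip_running_lines(pages: list[str], min_share: float = 0.5) -> list[str]:
    """Drop lines that recur on at least `min_share` of the pages.

    Hypothetical heuristic for running headers, footers and page numbers in
    extracted PDF text; not a general-purpose PDF cleaner.
    """
    # Count each normalized line at most once per page.
    counts = Counter()
    for page in pages:
        counts.update({_normalize(ln) for ln in page.splitlines() if ln.strip()})

    # Require at least two pages so a single-page document is left alone.
    threshold = max(2, int(len(pages) * min_share))
    running = {key for key, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        # Keep non-blank lines whose normalized form is not boilerplate.
        kept = [ln for ln in page.splitlines()
                if ln.strip() and _normalize(ln) not in running]
        cleaned.append("\n".join(kept))
    return cleaned


pages = [
    "Q3 Market Report\nRevenue grew in every segment.\nCONFIDENTIAL - Q3 2026\nPage 1",
    "Q3 Market Report\nEnterprise led at $4.2M.\nCONFIDENTIAL - Q3 2026\nPage 2",
    "Q3 Market Report\nRetention held at 118%.\nCONFIDENTIAL - Q3 2026\nPage 3",
]
print(strip_running_lines(pages))
# → ['Revenue grew in every segment.', 'Enterprise led at $4.2M.', 'Retention held at 118%.']
```

The digit normalization is what catches page numbers. It can also over-match on real content lines that differ only in their numbers, so check the output on a real report before trusting it.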
The other path skips text entirely: each page is rendered as an image and handed to a vision-capable model. This preserves layout — the model sees the table as a table — but the price is steep and fixed:
A full page rendered as an image costs on the order of several hundred to well over a thousand tokens, depending on the model and resolution — and that cost is the same whether the page is dense or nearly blank.
So a 40-page report can spend tens of thousands of tokens just being seen, before a single question is answered. For a page that's mostly text, you paid image prices to deliver text content. For a page that's mostly a diagram, the image path is the only one that works — which is exactly when PDF earns its cost (more on that below).
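The arithmetic behind "tens of thousands" is easy to reproduce. This sketch assumes Anthropic's published rule of thumb for Claude, which puts an image at roughly (width × height) / 750 tokens. Other vendors use different formulas, often tile-based, and very large images are downscaled before they are counted, so treat the numbers as illustrative. The 750 × 1000 px page render is a hypothetical resolution.

```python
import math


def image_page_tokens(width_px: int, height_px: int) -> int:
    # Rule of thumb from Anthropic's vision docs: tokens ~= (w * h) / 750.
    # Assumes the image is small enough not to be downscaled first.
    return math.ceil(width_px * height_px / 750)


def image_path_cost(pages: int, width_px: int, height_px: int) -> int:
    # Every page is billed at the same size, whether dense or nearly blank.
    return pages * image_page_tokens(width_px, height_px)


# Hypothetical render: 750 x 1000 px per page, roughly letter-shaped.
print(image_page_tokens(750, 1000))    # 1000 tokens per page
print(image_path_cost(40, 750, 1000))  # 40000 tokens for the 40-page report
```

Halving the resolution in each dimension (375 × 500 px) cuts the per-page cost to a quarter, 250 tokens, at the price of legibility. Either way, the cost tracks pixels, not how much text is on the page.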
Markdown wins the token math, but the bigger win is comprehension.
Markdown is plain text with a few characters of structure. There's almost no overhead between the information and the tokens — it sits close to the floor for anything text-shaped. And the structure it does add is the structure the model wants:
```markdown
## Q3 Revenue by Segment

| Segment     | Revenue | YoY  |
|-------------|---------|------|
| Enterprise  | $4.2M   | +31% |
| Mid-market  | $1.8M   | +12% |

Net revenue retention held at **118%**, driven by the
enterprise expansion motion described in the prior section.
```
That table survives as a table. The heading is unambiguously a heading. The model was trained on huge numbers of documents shaped exactly like this (READMEs, wikis, docs, forum posts), so `##` reads as hierarchy, `|` reads as a grid, and `**118%**` reads as emphasis. Hand the same content to the model as extracted PDF text and the structure is gone; hand it over as a page image and you pay image prices for it.
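That grid survives because it is explicit in the text itself. As a minimal sketch, a few lines of code can recover the rows and columns from pipe syntax. You cannot do that reliably once the same numbers are flattened into one run-on line. `parse_pipe_table` is a hypothetical helper. It ignores escaped pipes, and it skips the separator row without reading its alignment colons.

```python
import re

TABLE = """
| Segment     | Revenue | YoY  |
|-------------|---------|------|
| Enterprise  | $4.2M   | +31% |
| Mid-market  | $1.8M   | +12% |
"""


def parse_pipe_table(md: str) -> list[list[str]]:
    """Return the header and body rows of a simple Markdown pipe table."""
    rows = []
    for line in md.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not part of the table
        # Drop the outer pipes, then split on the inner ones.
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # The separator row holds only dashes (and optional alignment colons).
        if all(re.fullmatch(r":?-+:?", cell) for cell in cells):
            continue
        rows.append(cells)
    return rows


for row in parse_pipe_table(TABLE):
    print(row)
# ['Segment', 'Revenue', 'YoY']
# ['Enterprise', '$4.2M', '+31%']
# ['Mid-market', '$1.8M', '+12%']
```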
The practical effect: for retrieval, summarization, and Q&A over a document — the bread-and-butter of "feed AI my files" — Markdown gives the model fewer tokens to chew and cleaner structure to navigate. It's the efficient default, and it's not a close call.
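Markdown wins the input job. That doesn't make PDF wrong; it makes it built for a different job, and that job is real: pages whose layout must stay fixed, documents that need a signature, and pages that are mostly a diagram.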
The honest split is by direction of travel. Document going into a model → Markdown, almost always. Document going to a human who needs fixed layout or a signature → PDF. Most of the "PDF is better" cases are really "a person needs this frozen," not "a model reads this better."
When the input job is the one you're doing, the goal is clean Markdown with as little of the PDF tax as possible:
Keep the structure: headings as `#` levels, lists as real list items, code in fenced blocks. That's what lets the model navigate instead of reading one long paragraph.

Here's the turn most format comparisons miss. Markdown wins going into the model. But the thing the model gives back (a report, a one-pager, a dashboard) eventually has to reach a human. And for that, neither Markdown nor PDF is the natural answer.
Raw Markdown looks like code to a non-technical reader: ## and | on screen, not a finished page. A PDF freezes any interactivity into a flat screenshot and forwards as a heavy file that forks into everyone's Downloads folder. The format that renders as a real page, keeps interactivity alive, and forwards as a link is HTML — which is the case for HTML over PDF on the human-delivery side, the mirror image of the token argument here.
So the full path looks like this: feed the model Markdown, and when it hands you HTML back, share that as a link rather than re-flattening it into a PDF. The last mile is a URL someone opens — sharing an HTML report with one person covers the channel choice, and if the output came out of a Claude or ChatGPT artifact, the export-and-share path is its own short walkthrough.
The container you choose at each step is quietly deciding the cost and the quality — Markdown to spend fewer tokens going in, a private HTML link to read as finished coming out.