There is no single number — it depends on the model, the image size, and whether you send a PDF as text or pixels. The per-model formulas, real measured costs, and how to count it for your own model.
Feeding a document to an AI? Markdown almost always costs fewer tokens and parses cleaner than the same content as a PDF. Where each format wins — and why the thing you hand a person at the end is neither.
A Claude artifact looks self-contained inside the chat and often isn't. The five things that quietly stay external — CDN scripts, Tailwind, fonts, images, fetch calls — and how to inline each.
miinideck turns a single HTML file into an unguessable link with optional password and expiry. Default-private, never indexed.
Three formats people reach for by habit, built for three different jobs. A job-by-job map of when Markdown, HTML, or PDF is the right shape — and why they work best as a pipeline, not a choice.
You paste one PDF into the chat for a quick summary, and the context window is suddenly two-thirds full. The document was not even that long. What happened is that the format you sent it in decided the token bill before the model read a word — and for images and PDFs, that bill is bigger and less predictable than for plain text.
Here is the honest version of "how many tokens is an image," because the clean single-number answers floating around are all quietly wrong.
Three variables move the count, and any answer that ignores them is rounding:
So "how many tokens is an image" has the same shape of answer as "how long is a piece of string." But the formulas are public, so you can pin down a real range instead of a myth.
For 2026's flagship models, a plain 1024×1024 web image lands in a fairly tight band:
That is roughly a 1.4× spread on a standard image. Push to a large phone photo and it widens to about 2,500–6,600 tokens depending on the model, because the formulas scale differently with size. The takeaway: an image is rarely "a few hundred tokens." It is usually one to several thousand.
A PDF is the format most likely to blow up a token budget, because "send the PDF" can mean three different things — and the numbers are far apart. In one engineer's measured run on a 2MB PDF:
And rendering each page as an image costs the per-image rates above, per page — native PDF support on some models does this and extracts the text, so you pay for both. For comparison, the same content as clean text or Markdown is a tiny fraction of any of these. That gap is the whole reason "feed it Markdown, not PDF" is the token-efficient default.
Counting tokens is the input side; the output the model hands back is usually a page someone needs to open. Drop that HTML at a private link in under 60 seconds — no card, no account, 7-day self-destruct.
Do not trust a blog's number — including this one — for a production batch. The formulas change every model version, and the cheapest way to be sure is to measure:
The numbers move, but the order of magnitude is stable: plain text is cheapest, an image is one to several thousand tokens, and a PDF sent carelessly is the most expensive thing you can put in a context window. So the defaults that save the most tokens are the boring ones — send text or Markdown rather than a raw PDF, extract a PDF before sending it, resize images, and count a sample before you run a batch of a thousand. The model's reasoning budget is finite; every token spent decoding a bloated format is one it cannot spend on the actual problem.