How many tokens does a PDF use?

It depends on how you send it. In one measured example, a 2MB PDF cost about 770,000 tokens dumped as raw base64, about 97,000 tokens as extracted text, and around 8,000 tokens through an extraction tool — versus a few hundred tokens for the same words as clean text. Sending a PDF raw is the most expensive path and the easiest to trigger by accident.

Why does the same image cost different tokens on GPT, Claude, and Gemini?

Each provider tokenizes images with a different formula. OpenAI counts the image in pixel patches, Claude approximates width times height divided by 750, and Gemini charges 258 tokens per 768-pixel tile. The same picture runs through three different formulas, so the counts differ: by a little on a small image, by a lot on a large one.

How do I reduce the token cost of images and PDFs?

Send text or Markdown rather than a raw PDF; if you only have a PDF, extract the text first. Resize images before sending, since cost scales with pixels, and use a lower-detail mode where the model offers one. The single biggest saving is usually converting a PDF to text instead of sending it as pages.
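As a rough illustration of "resize before sending", the sketch below works out how far an image would need to shrink to fit a token budget under the width × height ÷ 750 approximation quoted above for Claude. The function name and the budget parameter are hypothetical, and it ignores any provider-side resizing, so treat the result as an estimate rather than a billed figure.

```python
import math


def fit_to_token_budget(width: int, height: int, budget: int, divisor: int = 750):
    """Return (width, height) scaled down so width*height/divisor <= budget.

    Assumes the area-based approximation (pixels / 750); other providers
    use patch or tile formulas, so this does not transfer directly.
    """
    tokens = width * height / divisor
    if tokens <= budget:
        return width, height  # already within budget, leave untouched
    # Token cost scales with area, so each side scales by the square root.
    scale = math.sqrt(budget / tokens)
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


if __name__ == "__main__":
    print(fit_to_token_budget(4000, 3000, budget=1600))
```

Because cost follows area, halving both sides cuts the estimate to about a quarter. The returned dimensions can be handed to whatever image library you already use for the actual resize.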

How do I count tokens for a specific model?

Use the provider's own tool rather than a blog's number, because the formulas change with each model version. tiktoken counts text for OpenAI models, Anthropic's count-tokens API counts for Claude, and several web calculators estimate across models. For images, the provider's token-counting endpoint returns the exact figure.


How many tokens does an image (or a PDF) cost an LLM? (2026)

There is no single number: it depends on the model, the image size, and whether you send a PDF as text or pixels. This guide covers the per-model formulas, real measured costs, and how to count tokens for your own model.

By miinideck ai research team | June 2, 2026 | 5 min read

Why there is no single number

Three variables move the count, and any answer that ignores them is rounding:

The model. Each provider tokenizes images with a different formula. The same picture costs different amounts on GPT, Claude, and Gemini.

The size. Image cost scales with pixels, so a thumbnail and a full-resolution photo are nowhere near each other.

The format path. A PDF can reach the model as extracted text, as page images, or as raw base64 — three very different bills for the same file.

So "how many tokens is an image" has the same shape of answer as "how long is a piece of string." But the formulas are public, so you can pin down a real range instead of a myth.

What an image costs, per model

For 2026's flagship models, a plain 1024×1024 web image lands in a fairly tight band:

GPT-5.5 — ~1,024 tokens. OpenAI counts images in 32×32-pixel patches, capped around 2,500 patches (or a 2,048px maximum dimension in high-detail mode).

Gemini 3.1 Pro — ~1,032 tokens. Images with both sides ≤384px are a flat 258 tokens; larger images are cut into 768×768 tiles at 258 tokens each.

Claude Opus 4.7 — ~1,398 tokens. Claude approximates tokens as (width × height) ÷ 750, resizing so the long edge caps at 2,576px.

That is roughly a 1.4× spread on a standard image. Push to a large phone photo and it widens to about 2,500–6,600 tokens depending on the model, because the formulas scale differently with size. The takeaway: an image is rarely "a few hundred tokens." It is usually one to several thousand.
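The three formulas above can be written out as plain arithmetic. This is a minimal sketch that follows only the descriptions in this article: the function names are hypothetical, the OpenAI cap is simplified to clamping the patch count, and real provider behavior (resizing rules, detail modes) may differ.

```python
import math


def openai_patch_tokens(width: int, height: int, patch: int = 32, max_patches: int = 2500) -> int:
    """Count 32x32 patches, clamped at the cap (a simplification)."""
    patches = math.ceil(width / patch) * math.ceil(height / patch)
    return min(patches, max_patches)


def gemini_tile_tokens(width: int, height: int, tile: int = 768, per_tile: int = 258, small: int = 384) -> int:
    """Flat 258 tokens for small images, otherwise 258 per 768x768 tile."""
    if width <= small and height <= small:
        return per_tile
    return math.ceil(width / tile) * math.ceil(height / tile) * per_tile


def claude_area_tokens(width: int, height: int, max_edge: int = 2576, divisor: int = 750) -> int:
    """(width * height) / 750 after scaling the long edge down to the cap."""
    scale = min(1.0, max_edge / max(width, height))
    return round((width * scale) * (height * scale) / divisor)


if __name__ == "__main__":
    for size in [(1024, 1024), (4032, 3024)]:
        print(size, openai_patch_tokens(*size), gemini_tile_tokens(*size), claude_area_tokens(*size))
```

Under these simplified formulas a 1024×1024 image gives 1,024, 1,032, and 1,398 tokens, matching the figures listed above. A 4032×3024 input (an assumed phone-photo size, not one stated in the source) gives 2,500, 6,192, and 6,636.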

Why a PDF is the expensive way to send text

A PDF is the format most likely to blow up a token budget, because "send the PDF" can mean three different things — and the numbers are far apart. In one engineer's measured run on a 2MB PDF:

Raw base64 — ~770,000 tokens. Dumping the file's bytes. The worst path, and the easiest to trigger by accident.

Extracted text — ~97,000 tokens. Pulling the words out first. Far better, but still carrying extraction noise — repeated headers, broken tables.

Through an extraction tool — ~8,000 tokens. A roughly 12× reduction over the extracted-text path.

Rendering each page as an image costs the per-image rates above for every page. Native PDF support on some models renders the pages and also extracts the text, so you pay for both. For comparison, the same content as clean text or Markdown is a tiny fraction of any of these. That gap is the whole reason "feed it Markdown, not PDF" is the token-efficient default.
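To make the gaps between the three measured paths concrete, the sketch below computes the reduction ratios from the figures quoted above and shows why base64 inflates a file before tokenization even starts (every 3 bytes become 4 characters). The dictionary keys and function names are hypothetical, and treating "2MB" as 2,000,000 bytes is an assumption.

```python
import base64
import math

# Token counts from the single measured run quoted in this article.
MEASURED_TOKENS = {
    "raw_base64": 770_000,
    "extracted_text": 97_000,
    "extraction_tool": 8_000,
}


def reduction(from_path: str, to_path: str, table=MEASURED_TOKENS) -> float:
    """How many times smaller to_path is than from_path."""
    return table[from_path] / table[to_path]


def base64_length(num_bytes: int) -> int:
    """Characters produced by standard padded base64 for num_bytes of input."""
    return 4 * math.ceil(num_bytes / 3)


if __name__ == "__main__":
    print(round(reduction("raw_base64", "extracted_text"), 1))       # raw vs text
    print(round(reduction("extracted_text", "extraction_tool"), 1))  # text vs tool
    print(base64_length(2_000_000))  # characters for an assumed 2,000,000-byte file
```

These ratios describe one file in one run, so they are useful for sizing the problem, not for predicting the bill on a different document.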


How to count it for your own model

Do not trust a blog's number, including this one, for a production batch. The formulas change with every model version, and the cheapest way to be sure is to measure:

Decide the path — text, image, or raw bytes — because each is counted differently.

For text, run it through the provider's tokenizer (tiktoken for OpenAI, the count-tokens endpoint for Claude) for an exact figure.

For an image, apply the model's formula above, or upload it to the provider's token-counting endpoint, which returns the exact count.

For a PDF, extract the text and count that; if you need the page images, count one representative page and multiply by the number of pages.

Shrink before scale — a resized image or a Markdown conversion can cut the count by an order of magnitude, which compounds across a big run.
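The "count a sample, then multiply" step can be sketched as follows. This is an illustration under stated assumptions: `estimate_batch_tokens` and `rough_count` are hypothetical names, and `rough_count` is a crude stand-in (about 4 characters per token) that should be replaced with the provider's real counter before trusting the number.

```python
from statistics import mean
from typing import Callable, Sequence


def rough_count(text: str) -> int:
    """Stand-in counter: about 4 characters per token. NOT a real tokenizer."""
    return max(1, len(text) // 4)


def estimate_batch_tokens(
    sample: Sequence[str],
    total_items: int,
    count_tokens: Callable[[str], int] = rough_count,
) -> int:
    """Average the token count over a sample and scale to the full batch."""
    if not sample:
        raise ValueError("need at least one sample item")
    per_item = mean(count_tokens(item) for item in sample)
    return round(per_item * total_items)


# To use a real tokenizer instead (assumes the tiktoken package is installed
# and that the chosen encoding matches your model):
#   import tiktoken
#   enc = tiktoken.get_encoding("o200k_base")
#   count = lambda text: len(enc.encode(text))
#   estimate_batch_tokens(sample_docs, 1000, count_tokens=count)

if __name__ == "__main__":
    print(estimate_batch_tokens(["a" * 40, "b" * 80], total_items=1000))
```

Passing the counter in as a function keeps the extrapolation logic separate from any one provider, so the same helper works whether the count comes from a local tokenizer or a token-counting endpoint. The estimate is only as good as the sample, so pick items that reflect the real spread of sizes in the batch.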

The practical takeaway

The numbers move, but the order of magnitude is stable: plain text is cheapest, an image is one to several thousand tokens, and a PDF sent carelessly is the most expensive thing you can put in a context window. So the defaults that save the most tokens are the boring ones — send text or Markdown rather than a raw PDF, extract a PDF before sending it, resize images, and count a sample before you run a batch of a thousand. The model's reasoning budget is finite; every token spent decoding a bloated format is one it cannot spend on the actual problem.
