
Unit Economics of a Single Question

  • Feb 26
  • 4 min read

Most people see ‘$X per 1M tokens’ and assume it will never matter. It matters the moment users show up, because every prompt and every answer is metered.


A token is not a word. It is a chunk of text the model processes, and it can be as short as a character or as long as a full word, with spaces and punctuation counted too. One simple rule of thumb is that 1 token is about 4 characters or about 0.75 words in English, which means 100 tokens is roughly 75 words (OpenAI, 2026b). Token pricing usually bills you twice, once for what you send in and once for what you get back. 
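The rule of thumb above is easy to turn into a throwaway estimator. This is only the 4-characters / 0.75-words approximation for budgeting intuition; exact counts require the model's actual tokenizer (for OpenAI models, the tiktoken library).

```python
# Rough token estimates using the rule of thumb quoted above:
# 1 token ~ 4 characters ~ 0.75 English words. Approximation only;
# a real tokenizer (e.g. tiktoken) gives exact counts.

def estimate_tokens_from_words(word_count: int) -> int:
    """1 token is about 0.75 words, so tokens ~ words / 0.75."""
    return round(word_count / 0.75)

def estimate_tokens_from_chars(char_count: int) -> int:
    """1 token is about 4 characters."""
    return round(char_count / 4)

print(estimate_tokens_from_words(75))   # 100
print(estimate_tokens_from_chars(400))  # 100
```

Both directions land on the same ballpark figure, which is all you need for back-of-the-envelope budgeting.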


OpenAI's pricing tables list separate input and output rates per 1M tokens, and some models also have a discounted "cached input" rate when the system can reuse previous context (OpenAI, 2025). If you are using a model where input is cheap and output is expensive, then a product that encourages long answers will feel free to the user while becoming costly to run.


A million tokens sounds like a lot until you translate it into ordinary work. If 1 token is about 0.75 words, then 1M tokens is roughly 750,000 words of English text, spread across everything you send and everything you generate. That is a lot of small interactions, the kind most teams do not bother tracking because each one feels trivial.


Start with the simplest case: a writing assistant. Say a user pastes a short brief of about 200 tokens and asks for an 800-token response. That is 1,000 tokens per session, and after 100 sessions a single user has already consumed 100,000 tokens. A thousand users doing the same adds up to 100 million tokens. The costs scale faster than people expect because usage is rarely capped and users rarely self-regulate once something feels instant and useful.
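The writing-assistant arithmetic above is worth writing down once, because it is the template for every other estimate in this post:

```python
# Back-of-the-envelope volume for the writing-assistant example.
INPUT_TOKENS = 200       # the pasted brief
OUTPUT_TOKENS = 800      # the generated response
SESSIONS_PER_USER = 100
USERS = 1_000

per_session = INPUT_TOKENS + OUTPUT_TOKENS    # 1,000 tokens
per_user = per_session * SESSIONS_PER_USER    # 100,000 tokens
total = per_user * USERS                      # 100,000,000 tokens

print(f"{total:,} tokens")  # 100,000,000 tokens
```

Each individual session is trivial; the product of three unremarkable numbers is not.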


Now look at a PDF summariser, which is where the confusion usually begins. A typical paragraph is about 100 tokens, and a typical page can easily be several hundred tokens depending on formatting. If your product sends the whole document every time the user asks a follow-up question, you are paying for the same tokens repeatedly, and you are doing it in the most expensive way possible. The better approach is to send the document once, cache the context, and only send new instructions on follow-up turns. That changes the cost structure immediately.
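To see how much the caching approach changes the cost structure, compare the two strategies directly. The document size, question size, and rates below are illustrative assumptions, not real pricing, though the 10x cached-input discount mirrors the kind of discount pricing pages advertise.

```python
# Resending a PDF on every follow-up vs caching it once.
# All numbers are illustrative assumptions.
DOC_TOKENS = 5_000                 # the uploaded document
QUESTION_TOKENS = 50               # each follow-up instruction
FOLLOW_UPS = 20
INPUT_RATE = 0.250 / 1_000_000     # $ per fresh input token
CACHED_RATE = 0.025 / 1_000_000    # $ per cached input token (~10x cheaper)

# Naive: the whole document is billed as fresh input every turn.
naive = FOLLOW_UPS * (DOC_TOKENS + QUESTION_TOKENS) * INPUT_RATE

# Cached: full price once, discounted rate on every reuse;
# only the new question is fresh input each turn.
cached = (DOC_TOKENS * INPUT_RATE
          + (FOLLOW_UPS - 1) * DOC_TOKENS * CACHED_RATE
          + FOLLOW_UPS * QUESTION_TOKENS * INPUT_RATE)

print(f"naive ${naive:.4f} vs cached ${cached:.4f}")
```

With these assumptions the naive strategy costs roughly 6x more, and the gap widens with every additional follow-up question.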


Then there is customer support, where the unit economics can get sneaky. A chat is not one prompt. It is a series of turns, and each new turn often includes some or all of the previous conversation so the model can stay aware of what was said. Tokens that feel like context to the user are still tokens on your bill, unless they are truly cached at a discounted rate (OpenAI, 2026b). This is why two products that look identical on the surface can have very different costs. One sends the full history every time. The other summarises, stores, and reuses context intentionally.
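The full-history pattern grows quadratically with the number of turns, because each turn pays for everything that came before it. A rolling summary keeps the per-turn input flat. The turn sizes and summary size below are illustrative assumptions:

```python
# Input-token accumulation in a support chat, two strategies.
# Turn sizes and summary size are illustrative assumptions.
USER_TURN = 50         # tokens per user message
BOT_TURN = 150         # tokens per model reply
TURNS = 10
SUMMARY_TOKENS = 200   # fixed-size rolling summary

# Naive: the entire transcript is resent as input every turn.
full_history_input = 0
history = 0
for _ in range(TURNS):
    history += USER_TURN            # new message joins the transcript
    full_history_input += history   # whole transcript billed as input
    history += BOT_TURN             # the reply joins it too

# Summarised: each turn sends only the summary plus the new message.
summarised_input = TURNS * (SUMMARY_TOKENS + USER_TURN)

print(full_history_input, summarised_input)  # 9500 2500
```

Ten turns in, the naive client has paid for almost 4x the input tokens, and the ratio keeps worsening as the chat gets longer.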


If you want a single example with real pricing, pick one model and do a back-of-the-envelope estimate. OpenAI's pricing page lists GPT-5 mini at $0.250 per 1M input tokens and $2.000 per 1M output tokens, with cached input at $0.025 per 1M tokens (OpenAI, 2025). That means output tokens are eight times more expensive than input tokens on that tier, so the cheapest optimisation is often not prompt trimming, but stopping the model from talking too much. If you cap output at 300 tokens instead of letting it run to 800, you save more than you would by cutting the input prompt in half.
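The cap-versus-trim comparison above is quick to verify with the quoted GPT-5 mini rates and the 200-in / 800-out request from the writing-assistant example:

```python
# The GPT-5 mini rates quoted above, converted to $ per token.
INPUT_RATE = 0.250 / 1_000_000
OUTPUT_RATE = 2.000 / 1_000_000

# Baseline request: 200 input tokens, 800 output tokens.
baseline = 200 * INPUT_RATE + 800 * OUTPUT_RATE

# Option A: cap output at 300 tokens.
capped_output = 200 * INPUT_RATE + 300 * OUTPUT_RATE

# Option B: cut the input prompt in half.
trimmed_input = 100 * INPUT_RATE + 800 * OUTPUT_RATE

savings_a = baseline - capped_output   # 500 output tokens saved
savings_b = baseline - trimmed_input   # 100 input tokens saved

print(f"output cap saves {savings_a / savings_b:.0f}x more per request")
```

Capping output saves 40x more per request than halving the prompt, because every output token costs 8x an input token and there are far more of them to cut.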


This is also where the product design part shows up. If your UI encourages users to say "write me the perfect answer," they will take it, and they will rarely complain about verbosity. If your UI asks one extra question, like "do you want a short answer or a detailed one," you can cut output by half without cutting usefulness, and that changes your cost curve immediately. The user experience does not degrade, but the token bill just stops growing as fast.


There is one more detail that beginners miss, and it’s important. OpenAI tracks token usage in categories that can include input tokens, output tokens, cached tokens, and for some advanced models, reasoning tokens that represent internal thinking steps (OpenAI, 2026b). You do not need to understand the architecture to understand the implication. Your bill is driven by the amount of text the model processes and produces, and smarter can mean more tokens consumed unless your workflow reduces retries and back-and-forth.


So if you want token economics to stay simple, here are three rules you can use immediately. First, treat tokens like minutes on a phone plan. You do not need perfect counting, but you do need an intuition for what burns the budget fastest. Writing help is cheap per session but scales with volume. PDF work is expensive if you do not cache. Support chats accumulate unless you cap history or summarise turns.


Second, stop paying for the same context repeatedly. Cache what is stable, summarise what is long, and only send what the model needs for the next step. Most platforms offer reduced rates for cached tokens, and using that properly can cut costs by 80 to 90% on workflows that reuse large documents or conversation histories.


Third, make your product choose the model, not the user. OpenAI's pricing tables show that different models have meaningfully different per-token costs, so routing routine work to cheaper tiers is often the easiest win that does not degrade the user experience. Most users cannot tell the difference between a $2 output and a $0.40 output when the task is straightforward, but the bill reflects it clearly.
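A router does not have to be clever to capture most of the win. A minimal sketch, assuming two hypothetical tiers and a deliberately crude heuristic (the tier names, rates, and thresholds are all placeholders, not a prescribed API):

```python
# Minimal model-routing sketch. Tier names, rates, and the
# heuristic are illustrative assumptions.
RATES_PER_1M_OUTPUT = {"budget-tier": 0.40, "premium-tier": 2.00}

def route(prompt: str, needs_reasoning: bool = False) -> str:
    """Send long or reasoning-heavy prompts to the premium tier,
    everything else to the budget tier."""
    if needs_reasoning or len(prompt) > 2_000:
        return "premium-tier"
    return "budget-tier"

print(route("summarise this email"))              # budget-tier
print(route("draft a merger term sheet", True))   # premium-tier
```

Even a heuristic this blunt means routine traffic is billed at the cheap rate by default, and the expensive tier becomes an explicit, measurable exception rather than the silent default.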


When you get this right, the outcome is boring in the best way. The AI feels fast, the answers feel crisp, and the bill grows predictably because the product has been designed around the meter instead of pretending it does not exist. Token pricing is not just about billing, but product design as well. Once you start paying per token, you realise you are not just pricing usage, you are pricing the way people are allowed to use the product.


