Give us an inch and we’ll take a mile (of AI)

Author: alex.steele@leadingai.co.uk

Published: 05/06/2026

AI cost management

I once watched an adult man hit a really plentiful, high-quality, all-you-can-eat buffet, then return to his table with exactly three things: strawberries, steamed broccoli, and a look of deep disappointment. Faced with an overwhelming choice of properly delicious food, he had somehow panicked and selected neither the things he wanted nor the things that made sense together.

At the other end of the buffet behaviour spectrum are the people piling spaghetti bolognese on top of steak and chips because abundance does something strange to our brains. For them, regret will come later.

Buffet behaviour and lessons in tokenmaxxing

When something feels unlimited, we tend to stop optimising for value and start optimising for quantity. And we panic that it might get taken away again because it seems too good to be true.

Generative AI can feel like this (assuming you work somewhere that has not reduced your choices to a single approved hexagonal button). Give people access to multiple models, unlimited prompts, agents, copilots, enterprise licences, and effectively unlimited tokens, and some freeze – while others build stacked plates of AI activity that look impressive but are difficult to justify afterwards and might leave you struggling to deal with the consequences.

Buffets*, it turns out, are surprisingly good preparation for AI strategy.

There is a word for part of this phenomenon, tokenmaxxing: using as many AI tokens, prompts, models, or licences as possible just because they are available – often without much attention to whether the extra consumption creates extra value.

Before we go further, a quick explainer to check we’re all on the same page: tokens are essentially the chunks of text AI models process. Roughly speaking, your prompt uses tokens, the AI’s response uses tokens, uploaded documents use tokens, and increasingly the AI’s internal reasoning and tool use consume tokens too. More prompts, longer conversations, bigger files, and more powerful models usually mean more tokens consumed and more cost somewhere in the system.
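For anyone who wants to see the bill arrive plate by plate, here is a minimal sketch of the arithmetic in Python. Everything in it is an assumption made for illustration: the prices are made up (real prices vary by provider and model), the names Pricing, estimate_tokens and request_cost are hypothetical, and the "four characters per token" figure is only a rough rule of thumb for English text, not how any particular model actually counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in US dollars per million tokens."""
    input_per_million: float
    output_per_million: float


# Made-up example prices; check your provider's price list for real ones.
EXAMPLE_PRICING = Pricing(input_per_million=3.0, output_per_million=15.0)


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate: about four characters per token."""
    return max(1, round(len(text) / chars_per_token))


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request: input and output tokens are priced separately."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


if __name__ == "__main__":
    single = request_cost(2_000, 500, EXAMPLE_PRICING)
    print(f"One request: ${single:.4f}")
    print(f"Regenerated five times: ${5 * single:.4f}")
```

At these invented prices, a 2,000-token prompt with a 500-token answer costs a little over a cent. Regenerating it five times costs five times as much, which is trivial once and less trivial across a whole organisation every working day.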

Most AI users never see any of this; it is buried in the details behind their monthly subscriptions. Which is a bit like an all-inclusive holiday wristband: the buffet still costs money, you just don’t see the bill arriving plate by plate.

The best examples of tokenmaxxing are both entertaining and slightly horrifying. Outside tech teams and their tokenflexing (the name it should have had), we also have employees using enterprise AI subscriptions to check the weather, and teams regenerating outputs five times for tiny improvements. It’s like bragging you went to tonnes of meetings when you can’t explain what was achieved in each one.

Overwhelm

This is not surprising: we are quite predictable and have a long-standing commitment to repeating patterns of poor behaviour.

Psychologists call part of this choice overload. When we’re faced with too many options, we either freeze, make odd decisions, or default to familiar behaviours. That is why some organisations have staff using a frontier model for tasks that could have been solved with a search engine or a glance at their phone screen, while others never move beyond “summarise this meeting”.

Economists have another explanation: the rebound effect, sometimes called Jevons paradox. When something becomes cheaper or easier, we often use more of it, not less. More efficient cars led to more driving. Cheap cloud storage led to more files. Easier content creation mainly leads to more content – and we’re seeing this already in the growing number and length of emails in this age of AI-assisted drafting.

The problem comes when AI consumption is mistaken for AI value.

I worry that organisations are drifting into what might politely be called AI theatre: leaders celebrating licence numbers while the harder questions remain unanswered. Are staff faster or more likely to stay? Are outcomes better? Would a cheaper model have done the same thing?

There’s no such thing as a free lunch

Tokens are not free, and many organisations are discovering this already. Usage caps, tiered access, and uncomfortable conversations with finance teams are beginning to happen more often. The era of unlimited experimentation is slowly colliding with procurement rules and budget holders.

Uber reportedly burned through its annual budget for Claude Code months early, and senior leaders have publicly questioned whether increased token use is translating into better products or outcomes. That does not mean AI is failing; it means finance directors have entered the chat.

This feels familiar. Most new technologies go through a period where consumption itself becomes the metric – and it’s not a bad leading indicator while you figure out a better KPI. But eventually somebody asks the awkward question: are we getting enough value for what we’re spending?

Environmentally, the picture is hard to ignore. Unlimited buffets are never really unlimited: someone still pays, someone still cooks the food… and someone still throws away what doesn’t get eaten. AI isn’t so different.

If this sounds familiar, it should. We’ve written before about right-sizing AI, AI slop, and the hidden environmental costs of treating compute as infinite. The sustainability conversation has not gone away. We just added finance teams to it.

Every unnecessary prompt consumes compute, energy, cooling, and infrastructure somewhere else. This does not mean we should stop using AI any more than electricity use means we should stop using computers. But there is a difference between productive energy use and waste.

You need a strategy for your tech, just like you do at the buffet: don’t try to take a full portion of everything; you won’t finish. Prioritise the things that made your mouth water.

Don’t deny yourself a visit to the buffet; prioritise value over volume

One reason we often work with organisations on simpler pricing and deployment models is that most people do not need an all-you-can-eat buffet; they need the sushi carousel.

A good sushi carousel still gives you choice and volume. The right thing appears when you need it. You take a sensibly-sized plate, use it, and move on. You are less likely to panic, take more than you need, or leave wondering why you had prawns and cheesecake on the same plate.

This is also why we tend to design AI systems this way: smaller models for smaller tasks, bigger models when complexity genuinely requires them, sensible guardrails, and cost models that encourage value rather than volume.
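As a rough illustration of the sushi-carousel idea, here is a minimal sketch in Python of routing by task size with a simple spending guardrail. It is not a description of any real product or of how our own systems are built: the model names, the Task fields, the 2,000-character threshold and the budget cap are all hypothetical, and a real router would use better signals than prompt length.

```python
from dataclasses import dataclass

# Hypothetical model names, standing in for a cheap and an expensive option.
SMALL_MODEL = "small-model"
LARGE_MODEL = "large-model"


@dataclass(frozen=True)
class Task:
    """A hypothetical description of one piece of AI work."""
    prompt: str
    needs_reasoning: bool = False
    attached_documents: int = 0


def choose_model(task: Task, max_small_prompt_chars: int = 2_000) -> str:
    """Send a task to the large model only when something about it calls for it."""
    is_complex = (
        task.needs_reasoning
        or task.attached_documents > 0
        or len(task.prompt) > max_small_prompt_chars
    )
    return LARGE_MODEL if is_complex else SMALL_MODEL


def within_budget(spent_so_far: float, next_cost: float, cap: float) -> bool:
    """Guardrail: allow the request only if it keeps spending at or under the cap."""
    return spent_so_far + next_cost <= cap


if __name__ == "__main__":
    quick = Task("What is the capital of France?")
    heavy = Task("Compare these contracts clause by clause.", attached_documents=2)
    print(choose_model(quick))
    print(choose_model(heavy))
```

The design choice is that the small model is the default and the large one has to be earned, which is the opposite of the buffet, where the biggest plate is the one nearest the door.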

The goal is not less AI; it’s less regret.

 

*My top three buffets, as of June 2026, and not that you asked, are:

  1. The lovely Glasshouse at The Grove in Watford. Sorry, Hertfordshire. Delicious, and it comes with a side order of A-list celeb diners from the nearby film studios.
  2. Les Grands Buffets in Narbonne. So I haven’t actually been yet, but it’s going to happen and I’m surviving on their Insta feed until then.
  3. Sunday brunch buffet at the Pan Pacific in Vancouver. A decades-old institution with All The Foods, and you can watch seaplanes take off and land as you eat.

I appreciate these all require some advance planning but, until then, you’ve always got Mr Wu on Shaftesbury Avenue in London. Best enjoyed with a couple of Tsingtaos.