May 9, 2025

PromptWatch: what we’ve learned from AI’s prompt adolescence phase

A reflection on two years of prompting, progress, and curiosity.


When we published “Prompt like nobody’s watching”, the professional AI prompt engineering world was a bit more earnest, made up of cleanly composed prompts, quietly competent early adopters, and a strong preference for control.

In early 2024 – which is a very long time ago in the gen AI world – prompting felt like a specialist art. You needed structure, framing, and precision. Everyone was generally encouraged to write prompts as if they were briefing a junior civil servant in 1999:

"Please summarise the attached policy document in no more than 300 words, using clear and accessible language suitable for a non-specialist audience. Include references to any relevant legislative context and highlight implications for local delivery teams. The tone should be neutral, professional, and aligned with departmental style guidance."

It’s beautifully structured and still works (except for the word count bit, which AI still struggles with), but it also has boomer vibes – like adding two spaces after a full stop. For most humans – and now for most gen AI tech – it’s over-engineered. It reflects how we thought we had to get it right first time and leave the machine little or no wiggle room. Very little room, in fact, for the AI to be an AI and live up to its full potential.

If humans were going to consistently commission work that way, we would also be so cleanly organised that I’m not sure we’d even need search engines; everything would be beautifully filed. Except… all our staff would have left us because we task-managed them as if they had no minds of their own. And that’s why rigidity rarely sticks; we’re empowered by having some space to think and make choices, and so is AI.

How times change

Happily, things have moved on. The models are more capable now. They infer intent, handle context better, and allow for mid-task course correction. And as users spend more time with their chosen tools and become more confident in the output, humans and machines start to change each other.

Prompting has evolved from a niche technical skill to a core digital capability for everyone. Now we encourage people to think like a director (one with good leadership skills): say what you want, why you want it, and how you’d like it shaped. Ask the model to help you improve the question. Save the bits that work. Share them. Try again. You don’t need to master everything — you just need to get comfortable asking for what you need and knowing when the answer’s wrong.
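In practice, that might look something like this (an illustrative example, not one of our templates):

“I need to tell local delivery teams what this policy means for them. What’s the best way to frame it? Draft something plain and friendly, and tell me if anything important is missing.”

It does the same job as the 2024 version above in far fewer words – and it invites the model to push back.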

Prompting as a conversation, not a command

The real change isn’t just in the tools, it’s in mindset. We’ve stopped thinking of AI as a vending machine and started treating it more like a smart, creative assistant. People are more comfortable sharing context, goals and constraints, and letting the assistant consider the best way to respond to them. We’ve become aware we each have capabilities the other doesn’t.

My favourite development has been the effectiveness of asking the model itself to help improve your prompt (something we build into our own tools). It doesn’t just save you energy; it’s a learning loop. For new users, it takes away the pressure to get it "right" up front. This became normalised much earlier in developer circles: coders have long been asking tools like GitHub Copilot or ChatGPT to improve their own requests, even if they mostly shared the results on Reddit, so the rest of us had no idea.
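For the curious, here’s a minimal sketch of that loop in Python, using OpenAI’s chat client purely as an illustration – the model name and prompt wording are placeholders, and the same two-step pattern works in any chat tool, or simply typed straight into a chat window:

from openai import OpenAI

client = OpenAI()  # assumes an API key is already set in the environment

draft = "Summarise this policy document for local delivery teams."

# Step 1: ask the model to improve the prompt, not to answer it yet.
review = client.chat.completions.create(
    model="gpt-4o",  # placeholder - use whichever model you have access to
    messages=[{
        "role": "user",
        "content": "Rewrite this prompt to be clearer and more specific. "
                   "Reply with the rewritten prompt only:\n\n" + draft,
    }],
)
improved = review.choices[0].message.content

# Step 2: run the improved prompt (ideally after a human glance at it).
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": improved}],
)
print(answer.choices[0].message.content)

The code matters less than the shape of the loop: draft, ask for a better version, run it – and notice what the model changed.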

Not all tools speak the same language

Prompting skills carry over between tools, but their personalities and quirks matter. Here’s a rundown of how things look now – but I write this knowing we’ll need to rewrite it in a few months’ time... Still, as of today:

Claude 3 (Anthropic) is excellent with longer context, gentle reasoning, and reflective problem-solving. It tends to be more cautious and verbose — ideal for summarising, red-teaming, or considered, structured tasks. It gives European vibes in the way it writes.

GPT-4 (OpenAI) is probably still the strongest generalist – a fact it shamelessly volunteers if you ask: fast, coherent, and capable of multi-step reasoning, especially with structured tasks or defined roles. It holds the most ground for creative/functional crossover work.

Gemini 1.5 (Google) continues to improve. It’s much better at handling longer inputs and is increasingly capable in coding and document generation — but tone control and factual reliability can still… vary. That’s why it appears more often in “mad sh*t the AI said today” memes.

Copilot (Microsoft), while powered by GPT under the hood, behaves differently in Office tools. It leans heavily on document structure and user history, and its prompting style is often more about nudging than direct command. We think this is still pretty accurate. Sorry, Microsoft.

The basics apply everywhere – be clear, give context, iterate – but your results will vary depending on tone, pacing, and task. As with any relationship, how you ask matters. You probably talk to your boss, your work bestie, and your barista differently – unless you work for a barista, and they’re your bestie. Prompting different gen AI models is a lot like that: you show up differently depending on their role in your life, their style, and how you want to come across.

What people are really doing with prompts

There’s always a gap between what people say they do with AI and what they actually do. According to the Washington Post’s analysis, drawn from millions of prompts sent to tools like Claude, ChatGPT, and Gemini, the top requests aren’t surprising: writing support, document summaries, help with studying. But some of the more creative regulars include things like “what’s wrong with me?”, “explain this like I’m five”, and “write a letter to get me out of jury duty”.*

I suspect the content filters also removed a host of more… unusual asks.

Meanwhile, Harvard Business Review offered a broader view recently: users are reaching for GenAI across both personal and professional life — from drafting birthday toasts and CVs to real-time productivity hacks and small-scale strategy tasks. The top 100 use cases include plenty of admin (email, formatting, condensing), but also emotional support, decision breakdowns, and requests for help “figuring out what to do next.”

Then there’s coding.

Fewer people say they use GenAI for programming when surveyed — it barely registers in the top five use cases by self-report. But it consistently appears near the top in prompt volume. Why? Because the developers who are using it are really using it. They’re not sending one-off requests. They’re 30 prompts deep into a debugging sesh, feeding the model error messages, tests, and edge cases. Over and over.

And that tells us something really important:

It’s not just who’s prompting, it’s how embedded the prompting is. A small group of power users — people with a clear task, a decent grip on what they want, and no fear of iteration — can completely change how a team gets work done, and how much they get done. Even if it’s only 10% of your staff.

To pick up on last week’s theme: If you’re only tracking logins or surface stats, you might miss where the real transformation is already happening.

What we got right, and what we got wrong

Back when we published our prompt advice last year, I thought the biggest hurdle would be building confidence to even start prompting. That’s still true for some, but less than I expected. We’re already seeing new professionals enter the workforce having had access to tools like ChatGPT throughout university. That barrier will age out fast.

What surprised us more was how quickly people moved from cautious to bold once they’d broken the seal — and how often they ignored carefully crafted prompt templates in favour of their own improvisation.

We got a lot of things right: low-pressure environments work. Informal play beats formal training. Real-life prompts, even messy ones, build trust far more effectively than polished examples. Prompting is a literacy — not a cheat code. Once it clicks, people run with it. Even the sceptics start to explore — sometimes late at night, quietly, when no one’s watching.

But we underestimated how often the best prompt advice would come from the AI itself — or how rapidly prompting would become a gateway to rethinking entire workflows. In hindsight, I was still too focused on “writing better prompts” and not focused enough on helping people ask better questions — the ones that challenge assumptions and reshape practice.

Now, we talk less about format and more about curiosity. Less about perfecting the input, more about exploring the possibilities.

So what now?

If you’re rolling out GenAI tools and want your team to build skill and confidence, do what’s worked best for us and all our awesome partners: make space. Let people try things. Share what works. Talk about what didn’t. Skip the rulebooks (think twice about compulsory, day-long prompt training from someone who was running last year’s risk workshop). Give everyone something safe to play with and let curiosity do the heavy lifting. Offer drop-ins, and blend demos into workshops about your strategy or whatever else you already had scheduled.

Progress doesn’t come from perfect planning or polished workflows. It comes from openness, exploration, and knowing when to loosen control.

*You should do jury duty; you don’t want to live in a world where juries are made up solely of people with time on their hands and an obsession with true crime.
