Jun 19, 2025

Attention is all you need: The anniversary of when AI started to do good listening

AI became a much better listener in 2017, when a breakthrough paper introduced the transformer model—the foundation for tools like GPT. For the first time, models could focus on meaning and context, not just word order. That shift made today’s language tools possible, but it also means we need to stay sharp. Because while AI is getting better at understanding us, it’s up to us to keep thinking clearly.


Last week, we talked about how people ‘coped’ when ChatGPT went down for a few hours and the internet quietly lost its mind. Professionals who would have mocked the idea of depending on AI just a few years ago were suddenly staring at blank screens, realising just how much they'd come to rely on it — for writing code, solving awkward phrasing, generating first drafts of pretty much everything, or just helping to think things through.*

It doesn’t take a qualified therapist to tell you we hit some emotional notes around our deeply human issues with dependency and attachment. But mainly we were sharing our semi-regular reminder that we are entering the age of having to consciously work on our critical thinking skills day to day, or risk losing them. That’s because the routine tasks that used to challenge you throughout your day might not keep your brain ticking over anymore — especially when a lot of problems seem to solve themselves.

Keeping on the therapy theme (kind of – we are not licensed mental health professionals), it’s fitting that this week also marked the eighth birthday of a research paper with the excellent title Attention Is All You Need. If it were an episode of Friends, this paper would be The One That Made All This Possible. It’s not therapy advice (although it sounds like it could be). It’s a technical paper published by eight researchers at Google in June 2017, and it changed the trajectory of AI more than anything else in the last decade.

Why this matters

Simply put: before that paper, AI models struggled to understand context. They processed words one at a time, in order, which made it hard to keep track of how words far apart in a sentence related to each other.

What changed with this paper was the model’s ability to understand how words relate to one another — not just in sequence, but by meaning and context. I sometimes describe it (to people who ask; I don’t want to be patronising) as a giant bucket of word soup: every word, phrase, and sentence the model has seen, all swirling around.

What makes these models powerful is that they don’t just store the words — they map them. Not a database in the usual sense, but it works a bit like one: words that appear together a lot, or behave similarly in language, get stored closer together in the model’s internal landscape. That’s how it can relate “doctor” and “nurse” or “city” and “capital” — not by logic, but by learned proximity.
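To make that "learned proximity" idea concrete, here is a toy sketch in Python. The three-dimensional vectors are invented purely for illustration (real models learn hundreds or thousands of dimensions from data, not three hand-picked numbers); the point is only that related words end up pointing in similar directions, which cosine similarity measures.

```python
import math

# Toy word vectors, invented for illustration.
vectors = {
    "doctor": [0.9, 0.8, 0.1],
    "nurse":  [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """How closely two word vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(vectors["doctor"], vectors["nurse"]))   # high: related words
print(cosine_similarity(vectors["doctor"], vectors["banana"]))  # low: unrelated words
```

In a real model, "doctor" and "nurse" score high not because anyone told it they were related, but because they keep turning up in similar contexts in the training data.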

The transformer architecture described in the paper made it possible to make sense of that soup, by paying attention to which words matter in a given moment. And that capability is what made GPTs possible — the large models that now sit between us and that vast tangle of language patterns, helping us find structure, generate ideas, or just write things more clearly.

Letting Artificial Intelligence be Intelligent

I cannot be the only parent who has said to their child: “Did you do good listening?” It’s the phrase that tends to come out of my mouth when I can see they were told something, but didn’t really pay attention. Here’s a real example:

“What did your teacher say to you, right before you hit your head?”

“Er… something about water on the floor and running?”

(Deep sigh. Long-ish pause.) “…and would you say you did good listening?”

As humans, we get better with age at focusing and inferring. We might not listen to every word, but we catch the gist, because we relate what we're told to what's happened before, or what we expect might be the case. We sometimes get it wrong, but more often than not, we're right.

This was AI’s equivalent maturity stage: the point at which models became capable of catching our drift.

The idea behind the transformer architecture was simple and powerful: let the model look at the whole sentence (or paragraph, or document), and give it a way to focus on the bits that matter. That’s the “attention” part.
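As a rough sketch of that "attention" part, here is a minimal version of scaled dot-product attention, the mechanism at the heart of the paper, applied to some invented word vectors. The four-dimensional numbers are made up for illustration, not taken from any real model; the takeaway is that the resulting weights tell the model which other words to focus on when interpreting a given word.

```python
import math

def attention_weights(query, keys):
    """Scaled dot-product attention: score the query word against every
    word in the sentence, then normalise with softmax so the weights sum to 1."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented vectors for the sentence "the animal was tired".
words = ["the", "animal", "was", "tired"]
vecs = [
    [0.1, 0.0, 0.1, 0.0],   # "the"
    [0.9, 0.7, 0.1, 0.2],   # "animal"
    [0.1, 0.1, 0.8, 0.1],   # "was"
    [0.8, 0.6, 0.2, 0.3],   # "tired"
]

# How much does "tired" attend to each word in the sentence?
weights = attention_weights(vecs[3], vecs)
for word, w in zip(words, weights):
    print(f"{word:>7}: {w:.2f}")
```

With these toy numbers, "tired" attends most strongly to "animal", which is the behaviour you want: the model learns which word the description belongs to, rather than just reading left to right.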

The result has been language models that are faster, more accurate, and actually able to hold a train of thought. It was the foundation for GPT (that’s the T – transformer), Claude, Gemini, and pretty much every other model in use today.

The original paper is now one of the most cited computer science publications of all time. Its authors have become tech legends, scattered across OpenAI, DeepMind, and a few ambitious startups.

So if you’ve ever asked yourself, “How does this thing actually work?”, the answer, surprisingly, starts with something very basic: paying attention.

________________________________________

Try this

Want to see how far things have come — or just remind yourself how weirdly clever these models are? Here are a few prompts that have become modern classics, and that are, I think, even more fun to try when you better understand what's happening behind the curtain:

1. The one they call the “God Prompt”

A semi-serious favourite that turns a GPT-based language model into a pop-psychology analyst:

You are an expert in behavioural psychology, neuroscience, and therapy. I want you to help me understand my behaviour patterns. Ask me questions, reflect my answers back to me, and suggest strategies.

It won’t replace your therapist, so please don’t let it. But it might help you spot a few patterns you hadn’t considered.

2. The one that helps with marketing

A go-to for anyone who’s ever said “We just need some quick ideas”:

Help me create viral marketing ideas for [my product]. Ask clarifying questions to define the audience and tone. Then generate ten ideas I can test on social media.

3. The one that is quite meta

What are some prompts that have gone viral?

You’ll get a snapshot of how people are using these tools — creatively, emotionally, and sometimes just to see what happens.

If you’re curious about how it all works beyond analogies about my family and other animals, you can read Attention Is All You Need on arXiv. Or take the gentler route: Wikipedia’s explainer on transformer models is surprisingly decent, and tools like Perplexity can walk you through anything that still feels murky.

Or do what I did and bring back the viral “…like I’m five” prompting phenomenon. As in, “Explain the Attention Is All You Need paper like I’m five.”

________________________________________

*Of course, not everyone noticed ChatGPT was down. I was in a real-life meeting, my mum and the dog were having a nap in the shade, and my significant other was happily using Anthropic’s Claude to work on some code. More to the point, most people, globally, still don’t use it at all — or even have the means to. And quite a lot of the ones who could use it will still call it “ChatGDP” (sic), usually as part of a sentence like, “I haven’t really tried ChatGDP, but it’s the thin end of The Wedge, isn’t it.” When they do, remember there is always fun to be had in getting them to talk you through The Whole Wedge.

Getting back to my point: as of last year, nearly 3 billion people still didn't have internet access at all. And even among the connected, the idea of using a large language model to help with admin or self-reflection is pretty unfamiliar.

In Europe or the US, it might feel like AI is everywhere. But it isn't, whether because of infrastructure, affordability, language, or simply other, more urgent priorities.

We’re not saying this to be provocative or performative. Having some perspective does help with empathy, but it’s also about paying attention to evidence and context as another way to keep your critical thinking skills in good shape. Your world is not the world. And it’s worth remembering that because we’re the ones training the AIs that everyone else will use.
