
Context Engineering.
The prompt was never the point.

The practitioners getting strong results from 2026-era models are not the best prompt writers. They are the best context architects. Prompts are the visible surface. The real design artifact is what the system assembles around them, turn after turn, and how much of that assembly is on purpose.

By Christi Akinwumi · 8 min read · May 2026

Every year or two, an AI discipline outgrows the name it started with. Prompt engineering is the one doing that now. What began as a craft of phrasing, the art of asking a model for what you wanted, has quietly turned into something larger and more architectural. The word has not caught up with the work. When a modern conversational system does something useful, the prompt is rarely the part that deserves the credit. The credit belongs to everything the system put in front of the model before the prompt ever ran.

The shift is not a rebrand. It is a reckoning with how these systems actually behave once they leave the notebook. Over the last year, the discourse has landed on a serviceable name for the broader practice: context engineering. The name is useful because it points at what the work actually is. Not composing a cleverer sentence. Assembling, curating, and budgeting the information the model is asked to reason over. The prompt is one ingredient in that assembly. It is not, and was never, the whole meal.

The word that outgrew itself

Prompt engineering was a precise term when it was coined. In the GPT-3 era, the difference between a response that was useful and one that was unusable often came down to how the instruction was phrased. Chain-of-thought prompting helped. Few-shot examples helped. The right role framing at the top of the message helped. Teams hired people who could sit with a model and coax behaviour out of it by adjusting wording. That was a real skill, and the results were real.

The trouble is that the word started to carry weight it could not support. Once models moved into production, the interesting failures stopped being about phrasing and started being about everything the prompt did not contain. A perfectly written instruction cannot recover from a missing policy. A well-chosen example cannot substitute for a database the model cannot reach. The phrasing craft was still present, but it was increasingly a small part of a larger engineering problem that no one had agreed to name.

What a prompt cannot hold

It is worth being concrete about what falls outside a prompt, because the gap is where the work lives. Users arriving at a conversational system expect it to know things. Some of those things the model can know in advance. Most of them it cannot.

The session has a history. The user has said things the current turn depends on. The organization has policies about what the system is allowed to recommend. There is a knowledge base with the canonical version of every fact the user might reasonably ask about. There are tools the system can call, and the results of those tools are facts the model did not have a moment ago. There is a downstream human or agent who will pick up the conversation and who needs a handoff that preserves what has already been established. None of this is inside the prompt. All of it has to be assembled, by someone, on every turn.

That someone is the context engineer, whether the role has that title or not. The work is to decide, for every turn, what the model is given, what it is expected to remember, what it is allowed to retrieve, and what it is told it does not know. The decisions are not glamorous. They are the system.
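
To make that shape concrete, here is a minimal sketch of a deliberate per-turn assembly step. Everything in it is a hypothetical stand-in, the AssembledContext fields, assemble_turn, the carry-forward window, none of it is any framework's API. The point is that each field is an explicit decision, made on every turn.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AssembledContext:
    system_prompt: str             # role, tone, hard rules
    session_history: list[str]     # what the system is expected to remember
    retrieved_passages: list[str]  # what it was allowed to retrieve
    tool_results: list[str]        # facts it did not have a moment ago
    user_message: str              # the prompt: one ingredient, not the meal

def assemble_turn(user_message: str,
                  history: list[str],
                  retrieve: Callable[[str, int], list[str]],
                  tool_results: list[str]) -> AssembledContext:
    """Decide explicitly what the model sees on this turn."""
    return AssembledContext(
        system_prompt="You are a support assistant. Use only the context provided.",
        session_history=history[-10:],                 # a deliberate carry-forward policy
        retrieved_passages=retrieve(user_message, 5),  # top-k is a policy, not a default
        tool_results=tool_results,
        user_message=user_message,
    )
```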

Three kinds of context that actually ship work

In practice, the context that flows into a production model tends to come from three distinct places. Teams that do this well treat each of them as a first-class design surface. Teams that struggle usually have one they have not noticed they are ignoring.

Retrieved context

This is the RAG surface. Documents, policies, product descriptions, historical tickets, whatever corpus the model is supposed to reason over. The design work here is less about vector databases than about what the retrieval returns. As I argued at more length in Content Engineering in 2026, most retrieval failures are really source failures: chunks that were never structured for retrieval, titles that were written for humans browsing a help center, canonical facts living in three places with two of them out of date. The retrieval pipeline is only as good as the content it points at, and the content was rarely engineered with a machine reader in mind.
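
One hedged illustration of what "structured for retrieval" can mean at the source. The schema and the example record below are hypothetical; the idea is that each chunk states its claim in the title, points at exactly one canonical source, and carries a freshness marker, so retrieval quality is set at indexing time rather than rescued at query time.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Chunk:
    chunk_id: str
    canonical_doc: str   # exactly one source of truth per fact
    chunk_type: str      # e.g. "policy", "product_fact", "procedure"
    title: str           # written for a machine reader: states the claim itself
    body: str
    last_verified: date  # a stale chunk is a source failure, not a retrieval failure

example = Chunk(
    chunk_id="refunds-001",
    canonical_doc="policy/refunds",
    chunk_type="policy",
    title="Refund window: 30 days from delivery, unopened items only",
    body="Customers may return unopened items within 30 days of delivery for a full refund.",
    last_verified=date(2026, 4, 12),
)
```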

Remembered context

The session has state. Long-running users have history. Somewhere in the system there is a policy, often implicit, about what carries forward across turns, what carries forward across sessions, and what gets forgotten on purpose. In the rush to ship, that policy is usually a side effect of whatever the framework made easy, rather than a decision. The teams that get memory right write it down. They decide what summarization strategy compresses a long session, what fields are stored as durable user state, what is allowed to be used for personalization and what is not. The memory policy is a document, or it is a liability.
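
What "writing it down" might look like if the memory policy were executable configuration rather than prose. Every field name and limit below is a hypothetical placeholder for a decision a real team would make deliberately.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryPolicy:
    version: str = "1.2.0"
    # Within a session: what is kept verbatim and when compression kicks in.
    max_verbatim_turns: int = 10
    summary_max_tokens: int = 400
    # Across sessions: what is stored as durable user state.
    durable_fields: tuple = ("preferred_language", "account_tier")
    # What may feed personalization: a subset of what is stored.
    personalization_allowed: tuple = ("preferred_language",)
    # What is forgotten on purpose, always.
    never_store: tuple = ("payment_details", "free_text_health_details")
```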

Orchestrated context

This is the category most teams underestimate. The system prompt that establishes role and tone. The tool schemas that describe what the model can call and what those calls return. The handoff payload that travels with a conversation when it is routed to a human, an agent, or another model. Each of these is context, and each of them has to be designed deliberately. The emerging Model Context Protocol standard is one sign that the industry is starting to take this layer seriously. But the standard only describes the transport. The design of what travels over it, and when, and to whom, is still a craft decision on every team.
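
As one sketch of the design that still has to happen above the transport, here is a hypothetical handoff contract. None of these field names come from the Model Context Protocol or any other standard; they are the kind of payload decision each team still owns.

```python
from dataclasses import dataclass, field

@dataclass
class HandoffPayload:
    conversation_id: str
    summary: str                                                # what has been established so far
    open_commitments: list[str] = field(default_factory=list)   # promises the system has already made
    verified_facts: list[str] = field(default_factory=list)     # tool results the next actor can trust
    unverified_claims: list[str] = field(default_factory=list)  # user statements no one has checked
    recommended_next_step: str = ""
```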

Why 2026 models made this worse before it got better

The paradox of capable models is that they make prompt craft less important and context craft much more important. When instruction following was unreliable, a better sentence could move a metric. The marginal return on phrasing was high. In 2026, instruction following is, for most practical tasks, effectively saturated. The model will do what you ask. The remaining question is whether what you asked had the information it needed to be answerable.

This is the version of the problem I was circling in The AI Isn't Broken. When the model is doing its job and the system is still failing, the failure is upstream. It is in what the model was handed. The more capable the model gets, the more every remaining error becomes a context error in disguise. Tuning the prompt is, at that point, a displacement activity. The leverage has moved.

Anthropic's work on AI Fluency,[1] which has been one of the more useful research framings to emerge this year, points in the same direction from the user side. The people who consistently get work out of these systems are the ones who give context skillfully. They do not write better prompts. They build a better situation for the model to work in. The enterprise version of that skill is context engineering, done at the level of the product.

Context engineering as a design discipline

Treating context as something to engineer changes what artifacts a team produces. A context budget becomes a real document: how many tokens are spent on the system prompt, how many on retrieved passages, how many on session history, how many held in reserve. A retrieval specification becomes explicit about what the index contains, how chunks are typed, and what the top-k policy is under different intents. A memory specification says what the system is allowed to remember about a user, for how long, and with what justification. A handoff contract says exactly what payload travels when a conversation moves from one actor to another.
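
A context budget can be small enough to check in code. The numbers below are hypothetical placeholders; what matters is that the split is versioned and enforced rather than emergent.

```python
# A hypothetical context budget, in tokens, tied to a check the
# evaluation suite can run on every assembled turn.
CONTEXT_BUDGET = {
    "system_prompt":      1_500,
    "retrieved_passages": 4_000,
    "session_history":    2_000,
    "tool_results":       1_000,
    "reserve":            1_500,  # headroom for the user message and the reply
}

def budget_violations(actual: dict[str, int]) -> list[str]:
    """Compare one assembled turn against the budget; return any overruns."""
    return [
        f"{part}: used {used}, budget {CONTEXT_BUDGET.get(part, 0)}"
        for part, used in actual.items()
        if used > CONTEXT_BUDGET.get(part, 0)
    ]

# Example: a turn whose system prompt has quietly swelled past its budget.
print(budget_violations({"system_prompt": 2_300, "retrieved_passages": 3_100}))
```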

None of these are new ideas. What is new is the expectation that a serious team will have written them down, versioned them, and tied them to the evaluation that says whether they are working. Anthropic's guidance on building effective agents[2] has been useful here precisely because it refuses to locate the design inside the model. The design lives in the choreography: which component runs when, what it sees, what it hands off, what it is not allowed to touch. That is context engineering at the orchestration scale.

The discipline also forces an honest conversation about cost. Every token in the context window is a choice, and every token has a price, a latency cost, and an opportunity cost against something else that could have been there instead. A team that has not budgeted its context is, in effect, letting whichever developer wrote the most recent feature decide what the model sees. That is a bad way to run a system the business depends on.

What to try next week

The practical starting point is smaller than the framing suggests. Open the system prompt for one production flow and read it carefully. Ask, of every sentence, whether it belongs there or whether it should be retrieval, a tool result, or a piece of session state that the prompt references but does not contain. Most system prompts, on inspection, are carrying facts that should have been pulled in on demand, and they are doing so in a way that quietly costs the team accuracy, flexibility, and tokens.

Then look at one full turn in production and count where the tokens went. How much of the context window was the system prompt, how much was retrieved passages, how much was prior turns, how much was the user's actual message. The answer is almost always surprising. Teams that have never done this exercise tend to find that the system prompt has swelled to the point where the model is reasoning over more scaffolding than substance. The fix is rarely a rewrite. It is a reassignment. Move the policy into retrievable text. Move the memory into a structured store. Let the prompt shrink back to the thing it was always meant to be: a short, opinionated instruction about how to use what the system has already assembled.
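
A rough version of that counting exercise, small enough to run against one real turn. The segment strings and the whitespace proxy below are stand-ins; paste in the actual assembled turn and swap in the model's real tokenizer.

```python
# Rough accounting of where one turn's context window went. The strings
# are placeholders for a real production turn; the whitespace split is a
# crude proxy for the model's actual tokenizer.
def approx_tokens(text: str) -> int:
    return len(text.split())

turn_segments = {
    "system_prompt":      "You are a support assistant. Follow the refund policy. ...",
    "retrieved_passages": "Refund window: 30 days from delivery, unopened items only. ...",
    "prior_turns":        "User: Hi, I ordered a kettle. Assistant: Happy to help. ...",
    "user_message":       "Can I still return it?",
}

counts = {name: approx_tokens(text) for name, text in turn_segments.items()}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:20s} {n:4d} tokens  {100 * n / total:5.1f}%")
```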

Prompt engineering is not going away. It is becoming a smaller, more honest part of a larger job. The larger job has a name now, and the name is doing useful work. Context engineering tells the discipline what it is responsible for: not the sentence, but the situation. Get the situation right, and the sentence almost writes itself.

References

  [1] Anthropic. AI Fluency: Framework & Foundations. Anthropic, 2025.
  [2] Anthropic. Building effective agents. Anthropic Engineering, 2024.
