Prompting basics.
A prompt is not a magic incantation — it is the entire interface between you and the model. This entry teaches the four things that actually move output quality: a clear instruction, the right context, concrete examples, and an explicit shape for the answer. By the end you will be able to look at a mediocre prompt and name what is missing instead of randomly rewording it.
What a prompt actually is.
When you call an LLM, everything the model sees is text: a sequence of tokens it must continue. There is no hidden channel, no "settings the model just knows." The instruction, the data, the examples, the conversation history, the tool definitions — all of it is concatenated into one input and handed to a next-token predictor. Internalizing this dissolves most prompting confusion: the model cannot read your intent, only the tokens that encode it.
That means the question "why did the model do X?" almost always reduces to "what in the prompt made X the most probable continuation?" Good prompting is the discipline of arranging tokens so that the desired answer is the path of least resistance.
A useful test: paste your prompt to a competent colleague with zero context about your project and ask them to do the task. If they would be confused, the model will be too — it has even less context than your colleague.
The four levers that move quality.
1. A specific instruction
Vague instructions produce vague output. "Summarize this" leaves length, audience, and focus undefined, so the model picks defaults that are probably not yours. "Summarize this support ticket in two sentences for an engineer, focusing on the reproduction steps" closes off the unwanted continuations. Specificity is not politeness — it is constraint, and constraint is what makes output predictable.
2. Sufficient context
The model only knows two things: what was in its training data, and what is in this prompt. If the answer depends on your codebase, your policy document, or yesterday's incident, that material must be in the prompt. A frequent failure is asking a question whose answer the model cannot possibly have and then being surprised it invents one.
3. Examples (when the task is fuzzy)
For tasks that are hard to describe but easy to demonstrate — tone, formatting, edge-case handling — one or two worked examples often outperform a paragraph of instruction. This is few-shot prompting, covered in its own entry; the headline is that examples are a lever, not a last resort.
4. An explicit output shape
If downstream code consumes the output, say exactly what it should look like: "Return only a JSON object with keys summary and priority; no prose." Unspecified shape means the model improvises formatting, and improvised formatting breaks parsers.
A before-and-after.
The same task, prompted badly and then well:
BAD "Look at this review and tell me about it. Great product but shipping took 3 weeks and box was crushed." GOOD "Classify the customer review below into exactly one of: positive, negative, mixed. Then extract any complaint as a short phrase. Return JSON: {"sentiment": ..., "complaint": ...} Review: Great product but shipping took 3 weeks and box was crushed."
The bad version produces a paragraph you cannot parse and a sentiment label the model never committed to. The good version names the task, constrains the label set, and fixes the output shape — three of the four levers, applied in four extra lines.
Iterating like an engineer, not a gambler.
The wrong loop is "tweak a word, run once, judge by vibes." Single runs are noisy because the model samples from a distribution (see the LLM mental model). The right loop:
- Collect 10–20 representative inputs, including the edge cases that bite you.
- Change one thing in the prompt at a time.
- Run the whole set; compare aggregate quality, not a single lucky output.
- Keep the change only if it helps on average without regressing edge cases.
Prompts are code. They deserve version control, a small eval set, and changes justified by evidence rather than by which phrasing felt clever.
Order matters. Models weight the start and end of the prompt more heavily than the middle. Put the core instruction near the top, the actual input near the bottom, and restate any non-negotiable constraint ("JSON only") right before the model generates.
"Be detailed and thorough" and "be concise" pull in opposite directions; including both gives the model permission to pick whichever is easier and call it compliance. When two instructions conflict, the model resolves the conflict — and not always your way. State the priority explicitly.
The minimum viable prompt template.
Most production prompts decompose into five labelled parts. Naming them turns "rewrite the prompt" into "which part is weak?"
# 1. ROLE / FRAMING — who the model is acting as You are a support-triage assistant. # 2. TASK — the single instruction, specific Classify each ticket and extract the blocking issue. # 3. CONTEXT — the data + rules it needs Categories: billing, bug, feature-request, other. Policy: anything mentioning "data loss" is priority=high. # 4. EXAMPLES — 1-2 worked cases (optional) Ticket: "App crashed, lost my draft" -> {"cat":"bug","priority":"high"} # 5. OUTPUT FORMAT — exact shape, restated last Return only JSON: {"cat": ..., "priority": ...}
You will not need all five every time, but when a prompt underperforms, walking these slots in order finds the gap faster than rewording the whole thing.
Deliverable
You can now diagnose a weak prompt by lever rather than by feel: is the instruction specific, is the context sufficient, would an example help, is the output shape pinned? You iterate against a small fixed input set instead of single noisy runs, you place critical instructions at the boundaries, and you treat the prompt as versioned code. Every other building block in this section — tool calling, RAG, structured outputs — is ultimately a prompt with extra machinery, so these habits compound.