Prompt Engineering for Developers
Master prompt engineering in 2025: zero-shot, few-shot, chain-of-thought, RAG, and tool use patterns for ChatGPT, Claude, and Gemini.
Prompt engineering is the practice of structuring inputs to LLMs to get reliable, high-quality outputs. It’s less about “magic words” and more about giving the model the right context, constraints, and examples — the same way you’d brief a smart contractor. This guide covers patterns that work across the major models (GPT-4o, Claude 3.5+, Gemini 1.5+) as of 2025. No special libraries needed — just the API or chat interface.
```bash
# Install the OpenAI SDK (examples use this; patterns apply to any model)
npm install openai
# or
pip install openai
```
```js
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain async/await in one paragraph." },
  ],
});

console.log(response.choices[0].message.content);
```
| Concept | What it means |
|---|---|
| System prompt | Sets the model’s persona, constraints, and output format |
| User/assistant turns | The conversation history the model uses as context |
| Temperature | 0 = most deterministic, 1 = most creative. Use 0 for structured output, ~0.7 for prose |
| Context window | Max tokens the model “sees” — older messages get truncated first |
| Grounding | Giving the model factual context (documents, data) to reason over |
| Tool use / function calling | Letting the model call your code for real-time data or actions |
Zero-shot prompting (no examples, just a clear instruction) works for simple, well-defined tasks. Be specific about the output format.
```text
Convert this JSON to a markdown table:
{"name": "Alice", "age": 30, "role": "engineer"}
Output only the markdown, no explanation.
```
Few-shot prompting shows the model a handful of input/output examples. Use it when the task has a non-obvious format or style.
```text
Classify the sentiment. Output exactly one word: positive, negative, or neutral.
"The deploy went flawlessly." → positive
"The API is down again." → negative
"The PR is under review." → neutral
"Tests are passing but coverage dropped 5%." →
```
Chain-of-thought: add “think step by step” or “let’s reason through this” for complex logic. This forces the model to show its work before concluding.
```text
A user reports that their login works on mobile but not desktop.
Think step by step through what could cause this before suggesting a fix.
```
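If you need a machine-readable answer at the end, ask for the reasoning and the conclusion under separate labels so you can strip the reasoning afterwards. A sketch, with labels of our own choosing:

```js
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content: `A user reports that their login works on mobile but not desktop.
Think step by step under a "Reasoning:" heading, then give your best single fix under a "Fix:" heading.`,
    },
  ],
});

// Keep only the conclusion; the reasoning was for the model's benefit
const text = response.choices[0].message.content;
const fix = text.split("Fix:")[1]?.trim() ?? text;
```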
Combine a persona with explicit constraints to tighten outputs.
```text
You are a senior Go developer doing a code review.
- Flag only bugs and security issues, not style preferences
- Be concise — one sentence per issue
- If the code is fine, say "LGTM"
Review this function:
[paste code]
```
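Wired into the API, the persona and constraints go in the system prompt and the code under review goes in the user turn. A sketch, where `sourceCode` is a placeholder for your own code string:

```js
const review = await client.chat.completions.create({
  model: "gpt-4o",
  temperature: 0, // consistent reviews across runs
  messages: [
    {
      role: "system",
      content: [
        "You are a senior Go developer doing a code review.",
        "- Flag only bugs and security issues, not style preferences",
        "- Be concise: one sentence per issue",
        '- If the code is fine, say "LGTM"',
      ].join("\n"),
    },
    { role: "user", content: `Review this function:\n${sourceCode}` }, // sourceCode: your code to review
  ],
});
```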
Ask for JSON explicitly. OpenAI’s API supports response_format: { type: "json_object" }; Claude and Gemini have their own structured-output options.
```js
const response = await client.chat.completions.create({
  model: "gpt-4o",
  response_format: { type: "json_object" },
  messages: [
    {
      role: "user",
      content: `Extract name, email, and company from: "Hi, I'm Sara ([email protected]) from Acme Corp."
Return JSON with keys: name, email, company.`,
    },
  ],
});
```
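JSON mode guarantees syntactically valid JSON, not that the keys you asked for are present, so parse and check before trusting the output. A minimal sketch:

```js
const data = JSON.parse(response.choices[0].message.content);

// JSON mode guarantees valid JSON, not your schema: verify the keys yourself
for (const key of ["name", "email", "company"]) {
  if (!(key in data)) throw new Error(`Model response missing "${key}"`);
}
```

If you need a guaranteed shape, OpenAI also offers a stricter response_format of type "json_schema" that validates against a schema you supply.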
Retrieval-augmented generation (RAG): inject relevant documents into the prompt instead of relying on the model’s training data. Essential for fresh or proprietary information.
```text
Answer the question using only the context below. If the answer isn't in the context, say "I don't know."

Context:
---
[paste your document chunks here]
---
Question: What's the refund policy for annual subscriptions?
```
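In code, the grounding step is just string assembly: fetch your top chunks (from a vector store, a search index, whatever you use) and interpolate them into the template. A sketch, where `retrieve` is a hypothetical lookup helper of your own:

```js
async function answerWithContext(question) {
  const chunks = await retrieve(question, { topK: 5 }); // hypothetical retrieval helper
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    temperature: 0,
    messages: [
      {
        role: "user",
        content: `Answer the question using only the context below. If the answer isn't in the context, say "I don't know."

Context:
---
${chunks.join("\n---\n")}
---
Question: ${question}`,
      },
    ],
  });
  return response.choices[0].message.content;
}
```

When a document is too large to inject whole, a map-reduce pattern works: summarise each chunk with a cheaper model, then synthesise the summaries in a final pass.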
```js
async function summariseChunks(chunks) {
  const summaries = await Promise.all(
    chunks.map((chunk) =>
      client.chat.completions.create({
        model: "gpt-4o-mini", // cheaper for intermediate steps
        messages: [
          { role: "system", content: "Summarise the following text in 3 bullet points." },
          { role: "user", content: chunk },
        ],
      })
    )
  );

  // Final synthesis
  const combined = summaries.map((s) => s.choices[0].message.content).join("\n\n");
  return client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "Synthesise these summaries into one coherent summary." },
      { role: "user", content: combined },
    ],
  });
}
```
Tool use (function calling): let the model decide when to call your functions.
```js
const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: "gpt-4o",
  tools,
  messages: [{ role: "user", content: "Is it raining in Lisbon?" }],
});

// Check if model wants to call a tool
if (response.choices[0].finish_reason === "tool_calls") {
  const call = response.choices[0].message.tool_calls[0];
  const args = JSON.parse(call.function.arguments);
  const result = await getWeather(args.city); // your implementation
  // Send result back in next turn
}
```
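To close the loop (continuing inside the `if` block above), send the result back as a `tool` message alongside the assistant turn that requested it, then let the model compose the final answer:

```js
// Echo the assistant's tool call, attach your result, and ask again
const followUp = await client.chat.completions.create({
  model: "gpt-4o",
  tools,
  messages: [
    { role: "user", content: "Is it raining in Lisbon?" },
    response.choices[0].message, // the assistant turn containing tool_calls
    { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) },
  ],
});

console.log(followUp.choices[0].message.content); // natural-language answer using the tool result
```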
Self-review: ask the model to review its own output. This catches roughly 30-40% of errors without needing a second model.
```text
[First turn]
Write a regex to validate an email address.

[Second turn]
Review the regex you just wrote. List any edge cases it misses or false positives it would allow.
```
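Programmatically, the second turn is just the conversation extended with the model’s own answer. A sketch:

```js
const messages = [
  { role: "user", content: "Write a regex to validate an email address." },
];
const first = await client.chat.completions.create({ model: "gpt-4o", messages });

// Feed the model's own answer back and ask it to critique itself
messages.push(first.choices[0].message);
messages.push({
  role: "user",
  content: "Review the regex you just wrote. List any edge cases it misses or false positives it would allow.",
});
const critique = await client.chat.completions.create({ model: "gpt-4o", messages });
```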
- **Temperature 0 ≠ deterministic.** It’s close, but not guaranteed. For truly reproducible output, set `seed` (OpenAI) or use caching.
- **System prompt injection.** If you’re building a product and users can see or influence the system prompt area, attackers will try to override it. Always sanitize user input that gets interpolated into prompts.
- **Context poisoning.** In long conversations, early bad information stays in context. For multi-turn apps, trim or summarise old turns instead of sending the full history (see the trimming sketch at the end of this section).
- **Model differences matter.** Claude handles long documents better than GPT-4o at the same context length, and Gemini 1.5 Pro has a 2M-token context. Test your prompts on the model you’ll actually use.
- **Few-shot order matters.** The last example before the actual input has the most influence, so put your most representative example last.
- **Don’t over-engineer.** Start with a simple zero-shot prompt and add complexity only when it fails. Most tasks don’t need chain-of-thought.
- **Prompt caching.** Anthropic (Claude) and Google (Gemini) offer prompt caching for repeated system prompts; it can cut costs 80%+ on high-volume apps.
```python
# Claude prompt caching example
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert code reviewer...",
            "cache_control": {"type": "ephemeral"},  # cache this block
        }
    ],
    messages=[{"role": "user", "content": "Review this PR: ..."}],
)
```
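For the context-poisoning gotcha above, the cheapest mitigation is a sliding window over the history: always keep the system prompt, keep only the most recent turns, and optionally summarise what you drop. A minimal sketch:

```js
// Keep the system prompt plus only the N most recent turns; older turns are dropped
function trimHistory(messages, maxTurns = 10) {
  const [system, ...turns] = messages; // assumes messages[0] is the system prompt
  return [system, ...turns.slice(-maxTurns)];
}
```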
Source: zero2hero.run/cheatsheets/prompt-engineering-for-developers — Zero to Hero cheatsheets for developers.