intermediate concepts 2025 Edition · Updated April 2026

Prompt Engineering for Developers

Master prompt engineering in 2025: zero-shot, few-shot, chain-of-thought, RAG, and tool use patterns for ChatGPT, Claude, and Gemini.

· 12 min read · AI-reviewed
-->

Prompt Engineering for Developers

Quick Overview

Prompt engineering is the practice of structuring inputs to LLMs to get reliable, high-quality outputs. It’s less about “magic words” and more about giving the model the right context, constraints, and examples — the same way you’d brief a smart contractor. This guide covers patterns that work across the major models (GPT-4o, Claude 3.5+, Gemini 1.5+) as of 2025. No special libraries needed — just the API or chat interface.

Getting Started

# Install the OpenAI SDK (examples use this; patterns apply to any model)
npm install openai
# or
pip install openai
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain async/await in one paragraph." },
  ],
});
console.log(response.choices[0].message.content);

Core Concepts

ConceptWhat it means
System promptSets the model’s persona, constraints, and output format
User/assistant turnsThe conversation history the model uses as context
Temperature0 = deterministic, 1 = creative. Use 0 for structured output, 0.7 for prose
Context windowMax tokens the model “sees” — older messages get truncated first
GroundingGiving the model factual context (documents, data) to reason over
Tool use / function callingLetting the model call your code for real-time data or actions

Essential Patterns

Zero-shot — just ask

Works for simple, well-defined tasks. Be specific about format.

Convert this JSON to a markdown table:
{"name": "Alice", "age": 30, "role": "engineer"}

Output only the markdown, no explanation.

Few-shot — show examples

Use when the task has a non-obvious format or style.

Classify the sentiment. Output exactly one word: positive, negative, or neutral.

"The deploy went flawlessly." → positive
"The API is down again." → negative
"The PR is under review." → neutral

"Tests are passing but coverage dropped 5%." →

Chain-of-thought — reason step by step

Add “think step by step” or “let’s reason through this” for complex logic. Forces the model to show its work before concluding.

A user reports that their login works on mobile but not desktop.
Think step by step through what could cause this before suggesting a fix.

Role + constraints

Combine a persona with explicit constraints to tighten outputs.

You are a senior Go developer doing a code review.
- Flag only bugs and security issues, not style preferences
- Be concise — one sentence per issue
- If the code is fine, say "LGTM"

Review this function:
[paste code]

Structured output

Ask for JSON explicitly. Most models support response_format: { type: "json_object" } in the API.

const response = await client.chat.completions.create({
  model: "gpt-4o",
  response_format: { type: "json_object" },
  messages: [
    {
      role: "user",
      content: `Extract name, email, and company from: "Hi, I'm Sara ([email protected]) from Acme Corp."
Return JSON with keys: name, email, company.`,
    },
  ],
});

RAG — retrieval-augmented generation

Inject relevant documents into the prompt instead of relying on the model’s training data. Essential for fresh or proprietary information.

Answer the question using only the context below. If the answer isn't in the context, say "I don't know."

Context:
---
[paste your document chunks here]
---

Question: What's the refund policy for annual subscriptions?

Common Patterns

Summarise a long document in chunks

async function summariseChunks(chunks) {
  const summaries = await Promise.all(
    chunks.map((chunk) =>
      client.chat.completions.create({
        model: "gpt-4o-mini", // cheaper for intermediate steps
        messages: [
          { role: "system", content: "Summarise the following text in 3 bullet points." },
          { role: "user", content: chunk },
        ],
      })
    )
  );

  // Final synthesis
  const combined = summaries.map((s) => s.choices[0].message.content).join("\n\n");
  return client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "Synthesise these summaries into one coherent summary." },
      { role: "user", content: combined },
    ],
  });
}

Tool use / function calling

Let the model decide when to call your functions.

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: "gpt-4o",
  tools,
  messages: [{ role: "user", content: "Is it raining in Lisbon?" }],
});

// Check if model wants to call a tool
if (response.choices[0].finish_reason === "tool_calls") {
  const call = response.choices[0].message.tool_calls[0];
  const args = JSON.parse(call.function.arguments);
  const result = await getWeather(args.city); // your implementation
  // Send result back in next turn
}

Self-critique loop

Ask the model to review its own output. Catches ~30-40% of errors without a second model.

[First turn]
Write a regex to validate an email address.

[Second turn]
Review the regex you just wrote. List any edge cases it misses or false positives it would allow.

Gotchas & Tips

Temperature 0 ≠ deterministic — it’s close, but not guaranteed. For truly reproducible output, set seed (OpenAI) or use caching.

System prompt injection — if you’re building a product and users can see/influence the system prompt area, attackers will try to override it. Always sanitize user input that gets interpolated into prompts.

Context poisoning — in long conversations, early bad information stays in context. For multi-turn apps, trim or summarise old turns instead of sending the full history.

Model differences matter — Claude handles long documents better than GPT-4o at the same context length. Gemini 1.5 Pro has a 2M token context. Test your prompts on the model you’ll actually use.

Few-shot order matters — the last example before the actual input has the most influence. Put your most representative example last.

Don’t over-engineer — start with a simple zero-shot prompt. Add complexity only when it fails. Most tasks don’t need chain-of-thought.

Prompt caching — Anthropic (Claude) and Google (Gemini) offer prompt caching for repeated system prompts. Can cut costs 80%+ on high-volume apps.

# Claude prompt caching example
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert code reviewer...",
            "cache_control": {"type": "ephemeral"},  # cache this block
        }
    ],
    messages=[{"role": "user", "content": "Review this PR: ..."}],
)

Next Steps


Source: zero2hero.run/cheatsheets/prompt-engineering-for-developers — Zero to Hero cheatsheets for developers.