Chain of Thought Prompting: Techniques and Tips
Chain of Thought (CoT) prompting is a technique that instructs a language model to reason through a problem step by step before producing a final answer. Instead of jumping directly to a conclusion, the model externalises its reasoning process — making errors visible and correctable, and improving accuracy on complex tasks.
Introduced in the 2022 paper Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al.), CoT has since become a foundational prompt engineering primitive.
Standard vs Chain of Thought
In standard prompting, the model maps input directly to output. In CoT prompting, a reasoning trace sits between the input and the output — each intermediate step is part of the generation.
flowchart LR
    subgraph Standard["Standard Prompting"]
        direction LR
        A1["Question"] --> B1["Answer"]
    end
    subgraph CoT["Chain of Thought Prompting"]
        direction LR
        A2["Question"] --> B2["Step 1"] --> C2["Step 2"] --> D2["Step 3"] --> E2["Answer"]
    end
Standard prompt:
Q: A shop has 48 apples. They sell 3 bags of 6 apples each. How many apples remain?
A:
The model might output 30, which is correct, but you cannot verify how it got there.
CoT prompt:
Q: A shop has 48 apples. They sell 3 bags of 6 apples each. How many apples remain?
A: Let's think step by step.
Model output:
Each bag contains 6 apples. 3 bags × 6 apples = 18 apples sold.
48 − 18 = 30 apples remain.
Answer: 30
The reasoning trace makes every step auditable.
Core Techniques
Zero-Shot CoT
Zero-shot CoT requires no examples. You append a trigger phrase that instructs the model to reason before answering.
Common trigger phrases:
- Let's think step by step.
- Walk me through your reasoning.
- First, let's break this down.

Q: A train travels 90 km in 1.5 hours. What is its average speed in km/h?
A: Let's think step by step.
Zero-shot CoT works well with capable models (GPT-4, Claude 3+) on arithmetic, logic, and planning tasks. It requires almost no prompt engineering effort.
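Wired up in code, the technique is a single templated call. Here is a minimal sketch using the Anthropic Python SDK, the same client the self-consistency snippet later in this piece relies on; the function name, model string, and token limit are illustrative choices:

```python
import anthropic

client = anthropic.Anthropic()

def zero_shot_cot(question: str) -> str:
    # Append the trigger phrase so the model reasons before answering.
    prompt = f"Q: {question}\nA: Let's think step by step."
    response = client.messages.create(
        model="claude-sonnet-4-6",  # illustrative; any capable model works
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```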
Few-Shot CoT
Few-shot CoT provides one or more solved examples that demonstrate the reasoning format. The model learns the expected chain structure from these demonstrations.
Q: A box holds 12 bottles. If you have 5 boxes, how many bottles do you have?
A: Each box holds 12 bottles. 5 boxes × 12 bottles = 60 bottles. Answer: 60.
Q: A cyclist covers 15 km in 30 minutes. How far will they travel in 2 hours?
A: 30 minutes = 0.5 hours. Speed = 15 ÷ 0.5 = 30 km/h. In 2 hours: 30 × 2 = 60 km. Answer: 60 km.
Q: There are 7 rows of seats with 9 seats each. 14 seats are taken. How many are free?
A:
Each example anchors the format — chain length, notation style, and where to place the final answer.
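If you assemble few-shot prompts programmatically, keeping the demonstrations as data makes the format easy to audit and extend. A minimal sketch reusing the two worked examples above; the helper name and structure are our own, not from the CoT paper:

```python
# (question, worked answer) pairs; each demonstrates chain length,
# notation style, and where the final answer goes.
EXAMPLES = [
    ("A box holds 12 bottles. If you have 5 boxes, how many bottles do you have?",
     "Each box holds 12 bottles. 5 boxes × 12 bottles = 60 bottles. Answer: 60."),
    ("A cyclist covers 15 km in 30 minutes. How far will they travel in 2 hours?",
     "30 minutes = 0.5 hours. Speed = 15 ÷ 0.5 = 30 km/h. In 2 hours: 30 × 2 = 60 km. Answer: 60 km."),
]

def few_shot_prompt(question: str) -> str:
    demos = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
    # The trailing "A:" cues the model to continue in the demonstrated format.
    return f"{demos}\n\nQ: {question}\nA:"
```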
Self-Consistency
A single reasoning chain can be confidently wrong. Self-consistency addresses this by sampling multiple independent chains from the model and taking a majority vote on the final answer.
flowchart TD
    Q["Question"] --> C1["Chain 1 → Answer A"]
    Q --> C2["Chain 2 → Answer A"]
    Q --> C3["Chain 3 → Answer B"]
    C1 --> V["Majority Vote"]
    C2 --> V
    C3 --> V
    V --> F["Final Answer: A"]
How to apply it:

1. Sample the same CoT prompt several times (the snippet below uses 5 samples).
2. Use a temperature above zero (e.g. 0.7) so each sample is different.
3. Extract the final answer from each chain.
4. Take a majority vote across the extracted answers.

Self-consistency outperforms single-chain CoT on benchmarks such as GSM8K and MATH, trading cost (more tokens) for reliability. A minimal implementation:
import anthropic
from collections import Counter

client = anthropic.Anthropic()

def self_consistent_answer(question: str, samples: int = 5) -> str:
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = []
    for _ in range(samples):
        # Non-zero temperature makes each sampled chain diverge.
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=512,
            temperature=1,
            messages=[{"role": "user", "content": prompt}],
        )
        text = response.content[0].text
        # Extract the last non-empty line as the final answer.
        final = [line for line in text.strip().splitlines() if line][-1]
        answers.append(final)
    # Majority vote across the sampled final answers.
    most_common, _ = Counter(answers).most_common(1)[0]
    return most_common
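Calling the helper on the seating question from the few-shot section shows the mechanism (output varies run to run, since chains are sampled):

```python
question = "There are 7 rows of seats with 9 seats each. 14 seats are taken. How many are free?"
print(self_consistent_answer(question))
# 7 × 9 = 63 seats; 63 − 14 = 49 free. Occasional chains that slip on
# the arithmetic get outvoted, so the majority answer should be 49.
```

One practical caveat: the vote compares extracted strings exactly, so "49" and "Answer: 49" would split the tally. In practice you would normalise the extracted answers (for example, pull out the final number) so that differently worded chains still vote together.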
Tree of Thoughts
Tree of Thoughts (ToT) extends CoT from a single linear chain into a tree of branching reasoning paths. The model explores multiple continuations at each step and backtracks from dead ends — enabling deliberate, search-like problem solving.
flowchart TD
    Root["Problem"] --> T1["Approach A"]
    Root --> T2["Approach B"]
    Root --> T3["Approach C"]
    T1 --> T1a["Step A1 ✓"]
    T1 --> T1b["Step A2 ✗ dead end"]
    T2 --> T2a["Step B1 ✓"]
    T2a --> T2b["Step B2 ✓"]
    T2b --> T2c["Solution ✓"]
    T3 --> T3a["Step C1 ✗ dead end"]
ToT is suited for tasks with a clear success criterion and a large, structured search space — puzzles, planning problems, code architecture decisions.
Minimal ToT prompt structure:
Imagine three expert reasoners solving this problem collaboratively.
Each proposes their next step. If any expert finds their path is wrong,
they step back and try another direction. All experts discuss until they
reach the best solution.
Problem: [your problem here]
This single-prompt variant ("3 experts") approximates ToT without requiring a custom orchestration loop.
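When a problem does justify a real search loop, the tree can be driven from code: propose several next steps per path, score each partial path, keep the best few, repeat. The sketch below is a minimal breadth-first version built on the same Anthropic client as earlier; the propose and score prompts, and the breadth and depth values, are illustrative choices rather than the canonical ToT algorithm:

```python
import anthropic

client = anthropic.Anthropic()

def ask(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def tree_of_thoughts(problem: str, breadth: int = 3, depth: int = 3) -> str:
    # Each frontier entry is one partial reasoning path (a list of steps).
    frontier = [[]]
    for _ in range(depth):
        # Expansion: propose `breadth` candidate next steps for every path.
        candidates = []
        for path in frontier:
            so_far = "\n".join(path) or "(none yet)"
            for _ in range(breadth):
                step = ask(
                    f"Problem: {problem}\nSteps so far:\n{so_far}\n"
                    "Propose the single most promising next step."
                )
                candidates.append(path + [step.strip()])
        # Evaluation: score each candidate path, then prune to the best few.
        scored = []
        for path in candidates:
            verdict = ask(
                f"Problem: {problem}\nReasoning path:\n" + "\n".join(path) +
                "\nRate how promising this path is from 1 to 10. Reply with the number only."
            )
            try:
                score = int(verdict.strip().split()[0])
            except (ValueError, IndexError):
                score = 0  # an unparseable rating counts as a dead end
            scored.append((score, path))
        scored.sort(key=lambda sp: sp[0], reverse=True)
        frontier = [path for _, path in scored[:breadth]]
    return "\n".join(frontier[0])
```

Note the cost: each level issues up to breadth² proposal calls plus as many scoring calls, so this only pays off on problems where single-chain CoT demonstrably fails.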
When to Use CoT
| Task Type | Use CoT? | Reason |
|---|---|---|
| Multi-step math | Yes | Each step depends on the previous result |
| Logical deduction | Yes | Explicit steps expose hidden assumptions |
| Simple factual lookup | No | Reasoning chain adds noise without benefit |
| Code generation | Often | Planning the approach before writing code reduces errors |
| Creative writing | Rarely | Open-ended output does not benefit from rigid step chains |
| Debugging | Yes | Tracing through execution step-by-step surfaces root causes |
Rule of thumb: if the correct answer requires more than two dependent reasoning steps, CoT will improve accuracy. For single-step tasks, it adds cost without benefit.
Common Pitfalls
| Pitfall | Why It Hurts | Fix |
|---|---|---|
| Applying CoT to simple tasks | Adds verbosity with no quality gain | Reserve CoT for tasks that have multiple dependent steps |
| Vague "think step by step" with no structure | Model produces filler reasoning instead of real logic | Provide the first step or an example chain to anchor the model |
| Trusting a single chain blindly | One reasoning path can be confidently wrong | Use self-consistency: sample multiple chains and majority-vote |
| Overly long few-shot examples | Fills the context window; later examples get less attention | Keep each demonstration under 5 reasoning steps |
| Skipping verification of intermediate steps | An early error compounds across all later steps | Ask the model to double-check each step before continuing |
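For the last pitfall, the fix can be a single extra instruction in the prompt. One illustrative phrasing (the wording here is ours, not from the paper):

Q: [your question]
A: Let's think step by step. After each step, re-check that step's
arithmetic and assumptions before continuing. If a check fails,
correct the step before moving on.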
Summary
Chain of Thought prompting improves model accuracy on complex tasks by making reasoning explicit and auditable. The four techniques sit at different points on the effort/reliability curve:
quadrantChart
    title CoT Techniques — Effort vs Reliability
    x-axis Low Effort --> High Effort
    y-axis Low Reliability --> High Reliability
    Zero-Shot CoT: [0.2, 0.5]
    Few-Shot CoT: [0.5, 0.7]
    Self-Consistency: [0.7, 0.85]
    Tree of Thoughts: [0.9, 0.95]
Start with zero-shot CoT ("Let's think step by step."): zero setup, solid baseline. Escalate to few-shot when the task has a specific format. Use self-consistency when correctness matters more than cost. Reach for Tree of Thoughts only when the problem has a well-defined search space.
