Chain of Thought Prompting: Techniques and Tips
Chain of Thought (CoT) prompting is a technique that instructs a language model to reason through a problem step by step before producing a final answer. Instead of jumping directly to a conclusion, the model externalises its reasoning process — making errors visible and correctable, and improving accuracy on complex tasks.
Introduced in the 2022 paper Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al.), CoT has since become a foundational prompt engineering primitive.
Standard vs Chain of Thought
In standard prompting, the model maps input directly to output. In CoT prompting, a reasoning trace sits between the input and the output — each intermediate step is part of the generation.
flowchart LR
    subgraph Standard["Standard Prompting"]
        direction LR
        A1["Question"] --> B1["Answer"]
    end
    subgraph CoT["Chain of Thought Prompting"]
        direction LR
        A2["Question"] --> B2["Step 1"] --> C2["Step 2"] --> D2["Step 3"] --> E2["Answer"]
    end
Standard prompt:
Q: A shop has 48 apples. They sell 3 bags of 6 apples each. How many apples remain?
A:
The model might output 30, which is correct, but you cannot verify how it got there.
CoT prompt:
Q: A shop has 48 apples. They sell 3 bags of 6 apples each. How many apples remain?
A: Let's think step by step.
Model output:
Each bag contains 6 apples. 3 bags × 6 apples = 18 apples sold.
48 − 18 = 30 apples remain.
Answer: 30
The reasoning trace makes every step auditable.
Core Techniques
Zero-Shot CoT
Zero-shot CoT requires no examples. You append a trigger phrase that instructs the model to reason before answering.
Common trigger phrases:
- Let's think step by step.
- Walk me through your reasoning.
- First, let's break this down.

Q: A train travels 90 km in 1.5 hours. What is its average speed in km/h?
A: Let's think step by step.
Zero-shot CoT works well with capable models (GPT-4, Claude 3+) on arithmetic, logic, and planning tasks. It requires almost no prompt engineering effort.
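Wired up in code, the technique is a single templated call. Here is a minimal sketch using the Anthropic Python SDK, the same client the self-consistency snippet later in this piece relies on; the function name, model string, and token limit are illustrative choices:

```python
import anthropic

client = anthropic.Anthropic()

def zero_shot_cot(question: str) -> str:
    # Append the trigger phrase so the model reasons before answering.
    prompt = f"Q: {question}\nA: Let's think step by step."
    response = client.messages.create(
        model="claude-sonnet-4-6",  # illustrative; any capable model works
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```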
Few-Shot CoT
Few-shot CoT provides one or more solved examples that demonstrate the reasoning format. The model learns the expected chain structure from these demonstrations.
Q: A box holds 12 bottles. If you have 5 boxes, how many bottles do you have?
A: Each box holds 12 bottles. 5 boxes × 12 bottles = 60 bottles. Answer: 60.
Q: A cyclist covers 15 km in 30 minutes. How far will they travel in 2 hours?
A: 30 minutes = 0.5 hours. Speed = 15 ÷ 0.5 = 30 km/h. In 2 hours: 30 × 2 = 60 km. Answer: 60 km.
Q: There are 7 rows of seats with 9 seats each. 14 seats are taken. How many are free?
A:
Each example anchors the format — chain length, notation style, and where to place the final answer.
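If you assemble few-shot prompts programmatically, keeping the demonstrations as data makes the format easy to audit and extend. A minimal sketch reusing the two worked examples above; the helper name and structure are our own, not from the CoT paper:

```python
# (question, worked answer) pairs; each demonstrates chain length,
# notation style, and where the final answer goes.
EXAMPLES = [
    ("A box holds 12 bottles. If you have 5 boxes, how many bottles do you have?",
     "Each box holds 12 bottles. 5 boxes × 12 bottles = 60 bottles. Answer: 60."),
    ("A cyclist covers 15 km in 30 minutes. How far will they travel in 2 hours?",
     "30 minutes = 0.5 hours. Speed = 15 ÷ 0.5 = 30 km/h. In 2 hours: 30 × 2 = 60 km. Answer: 60 km."),
]

def few_shot_prompt(question: str) -> str:
    demos = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
    # The trailing "A:" cues the model to continue in the demonstrated format.
    return f"{demos}\n\nQ: {question}\nA:"
```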
Self-Consistency
A single reasoning chain can be confidently wrong. Self-consistency addresses this by sampling multiple independent chains from the model and taking a majority vote on the final answer.
flowchart TD
    Q["Question"] --> C1["Chain 1 → Answer A"]
    Q --> C2["Chain 2 → Answer A"]
    Q --> C3["Chain 3 → Answer B"]
    C1 --> V["Majority Vote"]
    C2 --> V
    C3 --> V
    V --> F["Final Answer: A"]
How to apply it:

1. Sample the same CoT prompt several times (the snippet below uses 5 samples).
2. Use a temperature above zero (e.g. 0.7) so each sample is different.
3. Extract the final answer from each chain.
4. Take a majority vote across the extracted answers.

Self-consistency outperforms single-chain CoT on benchmarks such as GSM8K and MATH, trading cost (more tokens) for reliability. A minimal implementation:
import anthropic
from collections import Counter

client = anthropic.Anthropic()

def self_consistent_answer(question: str, samples: int = 5) -> str:
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = []
    for _ in range(samples):
        # Non-zero temperature makes each sampled chain diverge.
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=512,
            temperature=1,
            messages=[{"role": "user", "content": prompt}],
        )
        text = response.content[0].text
        # Extract the last non-empty line as the final answer.
        final = [line for line in text.strip().splitlines() if line][-1]
        answers.append(final)
    # Majority vote across the sampled final answers.
    most_common, _ = Counter(answers).most_common(1)[0]
    return most_common
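Calling the helper on the seating question from the few-shot section shows the mechanism (output varies run to run, since chains are sampled):

```python
question = "There are 7 rows of seats with 9 seats each. 14 seats are taken. How many are free?"
print(self_consistent_answer(question))
# 7 × 9 = 63 seats; 63 − 14 = 49 free. Occasional chains that slip on
# the arithmetic get outvoted, so the majority answer should be 49.
```

One practical caveat: the vote compares extracted strings exactly, so "49" and "Answer: 49" would split the tally. In practice you would normalise the extracted answers (for example, pull out the final number) so that differently worded chains still vote together.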
Tree of Thoughts
Tree of Thoughts (ToT) extends CoT from a single linear chain into a tree of branching reasoning paths. The model explores multiple continuations at each step and backtracks from dead ends — enabling deliberate, search-like problem solving.
flowchart TD
    Root["Problem"] --> T1["Approach A"]
    Root --> T2["Approach B"]
    Root --> T3["Approach C"]
    T1 --> T1a["Step A1 ✓"]
    T1 --> T1b["Step A2 ✗ dead end"]
    T2 --> T2a["Step B1 ✓"]
    T2a --> T2b["Step B2 ✓"]
    T2b --> T2c["Solution ✓"]
    T3 --> T3a["Step C1 ✗ dead end"]
ToT is suited for tasks with a clear success criterion and a large, structured search space — puzzles, planning problems, code architecture decisions.
Minimal ToT prompt structure:
Imagine three expert reasoners solving this problem collaboratively.
Each proposes their next step. If any expert finds their path is wrong,
they step back and try another direction. All experts discuss until they
reach the best solution.
Problem: [your problem here]
This single-prompt variant ("3 experts") approximates ToT without requiring a custom orchestration loop.
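When a problem does justify a real search loop, the tree can be driven from code: propose several next steps per path, score each partial path, keep the best few, repeat. The sketch below is a minimal breadth-first version built on the same Anthropic client as earlier; the propose and score prompts, and the breadth and depth values, are illustrative choices rather than the canonical ToT algorithm:

```python
import anthropic

client = anthropic.Anthropic()

def ask(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def tree_of_thoughts(problem: str, breadth: int = 3, depth: int = 3) -> str:
    # Each frontier entry is one partial reasoning path (a list of steps).
    frontier = [[]]
    for _ in range(depth):
        # Expansion: propose `breadth` candidate next steps for every path.
        candidates = []
        for path in frontier:
            so_far = "\n".join(path) or "(none yet)"
            for _ in range(breadth):
                step = ask(
                    f"Problem: {problem}\nSteps so far:\n{so_far}\n"
                    "Propose the single most promising next step."
                )
                candidates.append(path + [step.strip()])
        # Evaluation: score each candidate path, then prune to the best few.
        scored = []
        for path in candidates:
            verdict = ask(
                f"Problem: {problem}\nReasoning path:\n" + "\n".join(path) +
                "\nRate how promising this path is from 1 to 10. Reply with the number only."
            )
            try:
                score = int(verdict.strip().split()[0])
            except (ValueError, IndexError):
                score = 0  # an unparseable rating counts as a dead end
            scored.append((score, path))
        scored.sort(key=lambda sp: sp[0], reverse=True)
        frontier = [path for _, path in scored[:breadth]]
    return "\n".join(frontier[0])
```

Note the cost: each level issues up to breadth² proposal calls plus as many scoring calls, so this only pays off on problems where single-chain CoT demonstrably fails.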
When to Use CoT
| Task Type | Use CoT? | Reason |
|---|---|---|
| Multi-step math | Yes | Each step depends on the previous result |
| Logical deduction | Yes | Explicit steps expose hidden assumptions |
| Simple factual lookup | No | Reasoning chain adds noise without benefit |
| Code generation | Often | Planning the approach before writing code reduces errors |
| Creative writing | Rarely | Open-ended output does not benefit from rigid step chains |
| Debugging | Yes | Tracing through execution step-by-step surfaces root causes |
Rule of thumb: if the correct answer requires more than two dependent reasoning steps, CoT will improve accuracy. For single-step tasks, it adds cost without benefit.
Common Pitfalls
| Pitfall | Why It Hurts | Fix |
|---|---|---|
| Applying CoT to simple tasks | Adds verbosity with no quality gain | Reserve CoT for tasks that have multiple dependent steps |
| Vague "think step by step" with no structure | Model produces filler reasoning instead of real logic | Provide the first step or an example chain to anchor the model |
| Trusting a single chain blindly | One reasoning path can be confidently wrong | Use self-consistency: sample multiple chains and majority-vote |
| Overly long few-shot examples | Fills the context window; later examples get less attention | Keep each demonstration under 5 reasoning steps |
| Skipping verification of intermediate steps | An early error compounds across all later steps | Ask the model to double-check each step before continuing |
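For the last pitfall, the fix can be a single extra instruction in the prompt. One illustrative phrasing (the wording here is ours, not from the paper):

Q: [your question]
A: Let's think step by step. After each step, re-check that step's
arithmetic and assumptions before continuing. If a check fails,
correct the step before moving on.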
Summary
Chain of Thought prompting improves model accuracy on complex tasks by making reasoning explicit and auditable. The four techniques sit at different points on the effort/reliability curve:
quadrantChart
    title CoT Techniques — Effort vs Reliability
    x-axis Low Effort --> High Effort
    y-axis Low Reliability --> High Reliability
    Zero-Shot CoT: [0.2, 0.5]
    Few-Shot CoT: [0.5, 0.7]
    Self-Consistency: [0.7, 0.85]
    Tree of Thoughts: [0.9, 0.95]
Start with zero-shot CoT ("Let's think step by step."): zero setup, solid baseline. Escalate to few-shot when the task has a specific format. Use self-consistency when correctness matters more than cost. Reach for Tree of Thoughts only when the problem has a well-defined search space.
