Leyton Internal Prep Β· BCG X Technical Interview Β· Checklist Β· DSA Β· Tonight

Mock Interview
BCG X

// 5 phases Β· 55 min format Β· full model answers included
πŸ“‹ Session β€” Candidat & Feedback Aucune session
β–Ά
Phase 01 Β· ~8 min

Warm-up β€” Question Bank

4
Mental Math
4
Fermi
4
Logic
Pick 1 from each category per session. Rotate across mock interviews so candidates don't share answers.
πŸ”’ Mental Math 4 questions Β· pick 1
β–Ά
Mental Math Β· M1
Compute 840 Γ· 1200. Show your decomposition step by step β€” no direct answer.
1

Γ·10 both sides: 840/1200 = 84/120

2

Γ·12 both sides: 84/120 = 7/10

3

7/10 = 0.7

Mental Math Β· M2
Compute 1260 Γ· 4200. Decompose step by step.
1

Γ·100 both sides: 1260/4200 = 12.6/42

2

Simpler: divide by 420 β†’ 3/10

3

3/10 = 0.3

Alternative path: 1260/4200 = 126/420 = 63/210 = 21/70 = 3/10 = 0.3. Multiple decomposition paths are valid β€” what matters is showing the work.
Mental Math Β· M3
Estimate √2700. Walk me through your reasoning using known squares.
1

Anchor: 50Β² = 2500, close to 2700.

2

Try 52: 52Β² = (50+2)Β² = 2500 + 200 + 4 = 2704. Off by only 4.

3

√2700 β‰ˆ 52

Memorize key squares: 40Β²=1600, 45Β²=2025, 50Β²=2500, 55Β²=3025, 60Β²=3600.
Mental Math Β· M4
What is 17% of 350? No calculator β€” show your breakdown.
1

10% of 350 = 35

2

5% of 350 = 17.5

3

2% of 350 = 7

4

17% = 10% + 5% + 2% = 35 + 17.5 + 7 = 59.5

Decompose into 10% + 5% + 2% increments. This pattern applies to any percentage question.
🌍 Fermi Estimation 4 questions · pick 1
β–Ά
Fermi Β· F1
How many taxis are there in Casablanca? Walk me through your reasoning.
1

Population: ~4M inhabitants, ~1M households.

2

Demand: ~20% use taxi once/week β†’ 200k trips/week β†’ ~30k trips/day.

3

Supply: 1 taxi does ~12 trips/day β†’ 30k/12 β‰ˆ 2,500 formal taxis. Add informal sector β†’ ~4,000–5,000 total.

Structure: population β†’ demand β†’ capacity β†’ result. Always sanity-check your number at the end.
Fermi Β· F2
How many software developers are there in Morocco?
1

Population: ~37M. Working age (~15–60): ~55% = ~20M people.

2

Workforce with higher education: ~15% = ~3M. STEM graduates: ~20% of that = ~600k.

3

CS/IT specifically: ~25% of STEM = ~150k. Active developers (not all find dev jobs): ~60% = ~90,000–100,000.

Cross-check: Morocco has ~50 universities, ~2,000 IT graduates/year Γ— 20 active years β‰ˆ 40k–80k. Both paths converge around 80–100k β€” good sign.
Fermi Β· F3
How many WhatsApp messages are sent per day in Morocco?
1

Smartphone users: ~37M population, ~70% smartphone penetration = ~26M users.

2

WhatsApp adoption: Morocco is one of the highest globally β€” ~90% of smartphone users = ~23M active users.

3

Messages per user per day: average ~30 messages/day (mix of personal, family groups, professional).

4

23M Γ— 30 = ~700 million messages/day.

Fermi Β· F4
How much data does a Netflix stream generate per hour, and how much total data does Netflix serve globally per day?
1

Per stream: HD = ~3 GB/hour, 4K = ~7 GB/hour. Use ~3 GB as the average (mix of HD/SD).

2

Concurrent streams globally: Netflix has ~260M subscribers. Peak concurrency ~15% = ~40M simultaneous streams.

3

Average watch time: ~2h/subscriber/day. Total: 260M Γ— 2h Γ— 3 GB/h = ~1.5 exabytes/day.

This tests whether the candidate can reason about scale. Netflix actually represents ~15% of global internet traffic β€” a good sanity check anchor.
🧩 Logic Puzzles 4 questions · pick 1
β–Ά
Logic Β· L1
You have 8 identical balls. One is slightly heavier. Using a balance scale, what's the minimum number of weighings to find it?
1

Split 3 / 3 / 2. Weigh the two groups of 3.

2

If one side heavier β†’ heavy ball is in that 3. If balanced β†’ in the group of 2.

3

Second weighing identifies the ball in either case.

Answer: 2 weighings. Key: divide into 3 groups not 2 β€” log₃ thinking.

Logic Β· L2
You have two ropes. Each takes exactly 60 minutes to burn, but they burn unevenly. How do you measure exactly 45 minutes?
1

Light rope A from both ends simultaneously, and light rope B from one end.

2

Rope A finishes in 30 min (burned from both ends). At that moment, light the other end of rope B.

3

Rope B had 30 min left. Burning from both ends now β†’ it finishes in 15 more min. Total: 30 + 15 = 45 minutes.

Classic BCG puzzle. The insight is: lighting both ends of a rope halves its remaining burn time, regardless of unevenness.
Logic Β· L3
A bat and a ball cost 1.10€ total. The bat costs 1€ more than the ball. How much does the ball cost?

The intuitive (wrong) answer is 0.10€. The correct answer is 0.05€.

1

Let ball = x. Bat = x + 1.00.

2

x + (x + 1.00) = 1.10 β†’ 2x = 0.10 β†’ x = 0.05.

3

Ball = 0.05€, Bat = 1.05€. Check: 1.05 - 0.05 = 1.00 βœ“

This is a System 1 vs System 2 thinking test. BCG uses it to see if the candidate pauses to verify their intuition rather than blurting out the first answer.
Logic Β· L4
You're in a room with 3 light switches. One controls a bulb in a closed room. You can only enter the room once. How do you identify which switch controls the bulb?
1

Turn on switch 1 and wait 5 minutes.

2

Turn switch 1 OFF, turn switch 2 ON.

3

Enter the room: if the bulb is on β†’ switch 2. If off but warm β†’ switch 1. If off and cold β†’ switch 3.

Exploits a physical property (heat) beyond just on/off. BCG values candidates who use all available information β€” not just the obvious binary signals.
Phase 02 Β· ~10 min

Fundamentals β€” Question Bank

🐳
DevOps / Git
🧱
Data Structures
πŸ€–
AI / LLM
βš™οΈ
Principles
Pick 2–3 questions across different categories per session. Mix categories based on the candidate's CV profile.
🐳 DevOps · Git · Docker 4 questions
β–Ά
DevOps Β· D1
What is the difference between a Docker Image and a Docker Container?

Image β€” static, read-only blueprint built at compile time. Like a class in OOP. Immutable, layered (OS + deps + app code).

Container β€” a running instance of an image. Like an object instantiated from a class. Has its own writable layer, process space, and network.

  • Multiple containers can run from the same image simultaneously
  • Analogy: Image = recipe, Container = dish being cooked
  • Containers are ephemeral β€” data is lost on stop unless mounted to a volume
DevOps Β· D2
Explain CI/CD. What's the difference between Continuous Delivery and Continuous Deployment?

CI (Continuous Integration): every push triggers automated tests + build. Catches integration issues early. Tools: GitHub Actions, Jenkins, CircleCI.

  • Continuous Delivery: build is always deployable, but a human approves the push to prod
  • Continuous Deployment: every green build auto-deploys to prod β€” no human gate
BCG context: a solid CI/CD pipeline reduces deployment risk and enables the rapid iteration speed BCG X expects.
DevOps Β· D3
What is the difference between git merge and git rebase? When would you use each?

git merge creates a merge commit that joins two branch histories. Original history preserved.

git rebase replays your commits on top of the target branch, rewriting history to appear linear.

mergerebase
HistoryPreserves divergenceLinear / cleaner
CommitsAdds a merge commitRewrites commits (new SHAs)
Use whenTeam branches, PRsLocal cleanup before push
Never rebase a branch already pushed and shared β€” you rewrite SHAs and break other devs' local copies.
DevOps Β· D4
What does git reset --hard do vs git revert? Which is safe on a shared branch?

git reset --hard HEAD~1 β€” moves the branch pointer back, discards the commit. Destructive. Rewrites history.

git revert <SHA> β€” creates a new commit that undoes a previous commit. Non-destructive. History preserved.

  • reset --hard β†’ safe only on local, unpublished commits
  • revert β†’ always safe, including on shared/main branches
  • In production: always use revert β€” it's auditable and reversible
🧱 Data Structures 4 questions
β–Ά
DS Β· S1
What is the difference between a List, a Set, and a Dictionary in Python? When would you choose each?
StructureOrderedDuplicatesLookupUse case
Listβœ…βœ…O(n)Ordered sequence, indexed access
Set❌❌O(1)Membership test, deduplication
Dictβœ… (3.7+)Keys: ❌O(1)Keyβ†’value mapping, frequency count
# List β€” ordered, indexable, allows duplicates
temps = [3, -5, 3, 0]

# Set β€” fast membership, no order, no dupes
visited = {"nodeA", "nodeB"}

# Dict — key→value, O(1) lookup
freq = {"python": 3, "data": 2}
DS Β· S2
What is the difference between a HashMap and a HashSet? How are they implemented internally?

HashMap stores key β†’ value pairs. HashSet stores only keys β€” internally it's a HashMap where the value is a dummy constant.

HashMapHashSet
Storeskey + valuekey only
LookupO(1) avgO(1) avg
Use whenCount, cache, indexVisited, dedup, membership

Both use a hash table. Hash function β†’ bucket index. Collisions: chaining (linked list per bucket) or open addressing (Python uses open addressing with pseudo-random probing).

Dict lookup is O(1) average, O(n) worst case with hash collisions β€” but negligible in practice with Python's built-in types.
DS Β· S3
What is the difference between a Stack and a Queue? Implement a Stack using only two Queues.

Stack β€” LIFO. push/pop from same end. Use: undo/redo, call stack, bracket validation.

Queue β€” FIFO. enqueue at back, dequeue from front. Use: BFS, task scheduling, Kafka.

from collections import deque

class StackFromQueues:
    def __init__(self): self.q = deque()

    def push(self, x):
        self.q.append(x)
        for _ in range(len(self.q) - 1):
            self.q.append(self.q.popleft())  # rotate newest to front

    def pop(self): return self.q.popleft()  # O(1)
# push O(n), pop O(1)
DS Β· S4
What is the difference between an Array and a Linked List? When is each preferred?
OperationArrayLinked List
Index accessO(1)O(n)
SearchO(n)O(n)
Insert at headO(n) shiftO(1)
Insert at tailO(1) amortizedO(n) / O(1) with tail ptr
MemoryContiguous, cache-friendlyScattered, pointer overhead
Python's list is a dynamic array, not a linked list. Use collections.deque for O(1) head insertions.
πŸ€– AI Β· LLM Β· LangChain Β· RAG Β· Agents 5 questions
β–Ά
AI Β· A1
What is LangChain and what problem does it solve vs calling the OpenAI API directly?

LangChain is an orchestration framework for LLM-powered apps. Direct API = single prompt β†’ single response. LangChain adds: Chains (multi-step pipelines), Agents (LLM-driven tool selection), Memory (conversation history), Tools (external integrations), Prompt templates.

Analogy: direct API = raw SQL. LangChain = ORM. Higher abstraction, faster to build, less fine-grained control.
AI Β· A2
Explain the ReAct pattern. What does the loop look like and why is it powerful?

ReAct (Reasoning + Acting): the LLM alternates Thought β†’ Action β†’ Observation until it can produce a Final Answer.

1

Thought: "I need to find the population of Casablanca"

2

Action: search("Casablanca population 2024")

3

Observation: "4.27 million"

4

Final Answer: generated from accumulated observations

  • Loop stops at max_iterations to prevent infinite loops
  • Agent chooses tools based on their text descriptions β€” no hardcoded flow
  • Powerful because it's dynamic: the LLM decides what info it needs at runtime
AI Β· A3
What is RAG? Describe the pipeline and name 3 limitations.

RAG retrieves relevant docs at query time and injects them into the LLM prompt β€” solving the knowledge cutoff problem.

1

Index (offline): chunk β†’ embed β†’ store in vector DB (FAISS, Pinecone, Chroma)

2

Retrieve (online): embed query β†’ cosine similarity β†’ top-k chunks

3

Generate: LLM answers using retrieved chunks as context

  • Chunk boundary splits lose cross-sentence context
  • Wrong retrieval β†’ hallucination (garbage in, garbage out)
  • Context window limit caps how many chunks can be injected
  • Index staleness when source documents update
"RAG replaces BM25 sparse retrieval with dense vector similarity β€” same architecture, different distance metric."
AI Β· A4
What is the difference between an LLM Agent and a simple LLM Chain? Give a concrete production example.
ChainAgent
FlowFixed at code timeLLM decides at runtime
LatencyLowerHigher (multi-step)
ReliabilityHighVariable
Use whenKnown, repeatable taskOpen-ended, multi-hop reasoning

Example: data analyst agent β€” "Why did revenue drop in Q3?" β†’ agent decides to: query DB, run correlation, search external news, then synthesize. A chain can't do this because the steps are unknown in advance.

AI Β· A5
What is TF-IDF, how does BM25 improve it, and why does log(1) = 0 matter in the formula?

TF-IDF: TF(t,d) Γ— log(N / (1 + df(t))). Scores term importance by frequency Γ— rarity.

BM25 adds: (1) TF saturation β€” diminishing returns on repeated terms. (2) Length normalization β€” penalizes long documents.

  • log(1) = 0 β†’ term in every doc gets IDF = 0. Correct: it's a stopword ("the", "is")
  • log(0) = βˆ’βˆž β†’ +1 in denominator prevents this when df = 0
BCG often shows the BM25 formula and asks candidates to spot edge cases β€” the +1 guard is one of them.
βš™οΈ SOLID Β· ACID Β· REST vs GraphQL Β· SQL vs NoSQL 4 questions
β–Ά
Principles Β· P1
What are the SOLID principles? One sentence each + one concrete code example.
LetterPrincipleOne-liner
SSingle ResponsibilityA class has only one reason to change
OOpen/ClosedOpen for extension, closed for modification
LLiskov SubstitutionSubclasses must be usable wherever the parent is used
IInterface SegregationDon't force classes to implement interfaces they don't use
DDependency InversionDepend on abstractions, not concrete implementations
# SRP violation β€” one class doing too much
class Report:
    def generate(self): ...
    def save_to_db(self): ...  # ← different responsibility
    def send_email(self): ...  # ← different responsibility

# SRP fix β€” one class per responsibility
class ReportGenerator:  def generate(self): ...
class ReportRepository: def save(self, r): ...
class ReportNotifier:   def send(self, r): ...
Principles Β· P2
What are the ACID properties? Give a real-world violation example for each.
PropertyMeaningViolation example
AtomicityAll or nothingBank transfer: debit succeeds, credit crashes β†’ money disappears
ConsistencyDB stays in valid stateOrder references a deleted customer β†’ orphaned FK record
IsolationConcurrent txns don't interfereDirty read: user sees another's uncommitted changes
DurabilityCommitted data survives crashesPower cut after commit β€” data gone if not persisted to disk
ACID = SQL databases (PostgreSQL, MySQL). NoSQL often trades ACID for availability + partition tolerance (CAP theorem).
Principles Β· P3
What is the difference between SQL and NoSQL? When would you choose each in a BCG X context?
SQLNoSQL
SchemaFixed, enforcedFlexible / schema-less
ScalingVerticalHorizontal
TransactionsFull ACIDEventual consistency
ExamplesPostgreSQL, MySQLMongoDB, Redis, Cassandra
  • SQL: financial data, structured reporting, complex joins, compliance
  • NoSQL: high-volume writes, flexible schemas, caching (Redis), real-time feeds, vector DBs
BCG X example: ESG platform uses PostgreSQL for structured company data, Redis to cache API responses, and FAISS (vector DB) for the RAG layer.
Principles Β· P4
What is the difference between REST and GraphQL? What problem does GraphQL solve?

REST β€” resource-based endpoints returning fixed shapes. Problems: over-fetching (too many fields) and under-fetching (need multiple requests).

GraphQL β€” single endpoint, client defines exactly which fields it needs.

RESTGraphQL
EndpointsMany (/users, /posts…)One (/graphql)
Over-fetchingCommonEliminated
Type systemOptional (OpenAPI)Built-in schema
CachingEasy (HTTP)Harder (POST requests)
Best forSimple CRUD APIsComplex nested data, mobile
Phase 03 Β· ~20 min

Live Coding β€” Problem Solving

14
Exercises
Pick 1
Per Candidate
20
Minutes
Always follow this sequence: Clarify assumptions β†’ Brute force β†’ Optimize β†’ Complexity β†’ Edge cases β†’ Narrate. Magic phrase: "Let me first start with a straightforward approach, then I'll optimize it."
Problem A Β· Top K Frequent Elements
Medium
Given an array of integers nums and an integer k, return the k most frequent elements. The result must be sorted in descending order of frequency. If two elements have the same frequency, sort them in ascending order of value.

Example: nums = [1,1,1,2,2,3], k = 2 β†’ [1, 2]
Example: nums = [4,4,2,2,3], k = 2 β†’ [2, 4]  (tie in freq β†’ ascending value)

Step 1 β€” Clarify Assumptions

  • Is k always valid? (1 ≀ k ≀ number of unique elements) β†’ assume yes
  • Can there be negative numbers? β†’ yes, hashmap handles it fine
  • What if multiple elements tie in frequency? β†’ sort them ascending by value (good edge case to raise proactively)
  • Should the output preserve original order? β†’ no, sort by frequency descending

Step 2 β€” Brute Force

Count frequencies naively with a loop, then sort all unique elements by frequency, take the first k.

def top_k_brute(nums, k):
    freq = {}
    for n in nums:
        freq[n] = freq.get(n, 0) + 1  # build manually

    # sort all unique elements by freq descending
    sorted_keys = sorted(freq.keys(),
                          key=lambda x: -freq[x])
    return sorted_keys[:k]
# Time: O(n + u log u)  u = unique elements  Space: O(u)

Step 3 β€” Optimize: HashMap + Sorted with Tuple Key

Use collections.Counter to build the frequency map in O(n), then sort with a tuple key to handle ties β€” (-freq, value) sorts by frequency descending, then value ascending.

from collections import Counter

def top_k_frequent(nums: list[int], k: int) -> list[int]:
    # Step 1: build frequency map β€” O(n)
    freq = Counter(nums)
    # freq = {1: 3, 2: 2, 3: 1} for [1,1,1,2,2,3]

    # Step 2: sort by (-freq, value) β€” handles ties cleanly
    sorted_elements = sorted(freq.keys(),
                              key=lambda x: (-freq[x], x))

    # Step 3: return top k
    return sorted_elements[:k]
# Time: O(n + u log u)  Space: O(u)

Step 4 β€” Dry Run

nums = [4,4,2,2,3], k = 2

freq = {4: 2, 2: 2, 3: 1}

sort key (-freq, value):
  4 β†’ (-2, 4)
  2 β†’ (-2, 2)
  3 β†’ (-1, 3)

sorted: [2, 4, 3]   ← 2 before 4 because (-2,2) < (-2,4)

return [:2] β†’ [2, 4]  βœ“  (tie broken by ascending value)

Step 5 β€” Complexity

ApproachTimeSpaceNotes
Brute force (manual loop)O(n + u log u)O(u)Same asymptotically but no tie handling
Counter + sorted (optimal)O(n + u log u)O(u)Clean, handles ties, Pythonic
Heap (most_common)O(n + u log k)O(u)Faster when k << u

u = number of unique elements. In the worst case u = n (all elements distinct).

Step 6 β€” Edge Cases

  • k = 1 β†’ returns the single most frequent element βœ“
  • All elements identical β†’ freq = {x: n}, k = 1 β†’ [x] βœ“
  • All elements unique β†’ every freq = 1, sort ascending by value, return first k βœ“
  • Negative numbers β†’ hashmap handles, tuple sort still works βœ“

Interviewer Follow-up Questions

Follow-up 1
You used sorted() which is O(u log u). Can you do better when k is much smaller than u?

Yes β€” use a min-heap of size k. Push elements one by one; if the heap exceeds k, pop the minimum. Final heap contains the top k.

import heapq

def top_k_heap(nums, k):
    freq = Counter(nums)
    # heap of (freq, value) β€” min-heap on freq
    heap = []
    for val, cnt in freq.items():
        heapq.heappush(heap, (cnt, val))
        if len(heap) > k:
            heapq.heappop(heap)  # remove least frequent
    return [val for cnt, val in
            sorted(heap, reverse=True)]
# Time: O(n + u log k)  β€” better when k << u
MethodTimeBest when
sorted()O(u log u)k close to u
Min-heapO(u log k)k << u
Follow-up 2
How would you sort the result in ascending order of frequency instead (least frequent first)?

Simply remove the minus sign from the sort key β€” (freq[x], x) instead of (-freq[x], x).

# Ascending frequency, ascending value on tie
sorted_elements = sorted(freq.keys(),
                          key=lambda x: (freq[x], x))

# Descending frequency, descending value on tie
sorted_elements = sorted(freq.keys(),
                          key=lambda x: (-freq[x], -x))
Tuple sort keys are the cleanest way to handle multi-level sorting in Python. Master this pattern β€” it comes up constantly.
Follow-up 3
What does Counter actually do internally? What's the complexity of building it?

Counter is a subclass of dict. It iterates through the list once and increments a count for each element β€” exactly equivalent to a manual hashmap loop.

  • Time: O(n) β€” one pass through the input
  • Space: O(u) β€” one entry per unique element
  • Each increment is O(1) average (hash table access)
Saying "Counter is O(n) because it's a single-pass hashmap" signals that you understand what's happening under the hood β€” exactly what BCG probes.
Pick an exercise for this candidate

Use the button below to reveal a different exercise. Rotate across mock sessions so candidates don't share answers.

Phase 04 Β· ~7 min

Complexity Deep-Dive

5
Follow-up Qs
Big-O
Focus
7
Minutes
Complexity β€” Q1
Is O(n log n) bigger or smaller than O(n)? Explain with an example.

O(n log n) is bigger β€” it grows faster than O(n).

For n = 1,000,000: O(n) = 1,000,000 operations. O(n log n) β‰ˆ 20,000,000 operations (logβ‚‚(10⁢) β‰ˆ 20).

  • So n log n is roughly 20x slower for large inputs in this case
  • However, O(n log n) is still very efficient β€” most sorting algorithms (merge sort, heap sort) are O(n log n)
  • Hierarchy: O(1) < O(log n) < O(n) < O(n log n) < O(nΒ²) < O(2ⁿ)
Complexity β€” Q2
Can you solve the anagram problem in O(n)? If yes, how? If no, why not?

Yes, using a frequency hashmap (dictionary).

def is_anagram(s1: str, s2: str) -> bool:
    if len(s1) != len(s2):
        return False

    count = {}
    for ch in s1:
        count[ch] = count.get(ch, 0) + 1  # build freq map
    for ch in s2:
        count[ch] = count.get(ch, 0) - 1  # decrement
        if count[ch] < 0:
            return False

    return True
# Time: O(n)  Space: O(1) β€” bounded by 26 letters
ApproachTimeWhy
sorted() comparisonO(n log n)Sorting is O(n log n)
Frequency hashmapO(n)Two linear passes, O(1) dict access
Key statement: "Dictionary lookup is O(1) on average β€” that's what enables us to go from O(n log n) to O(n)."
Complexity β€” Q3
What is the time complexity of binary search and why? When can you use it?

O(log n) β€” because each step halves the search space.

After k steps, remaining elements = n / 2^k. When n / 2^k = 1, we've found the answer β†’ k = logβ‚‚(n).

  • Prerequisite: the array must be sorted
  • For n = 1,000,000: binary search takes ~20 comparisons vs 1,000,000 for linear search
  • Use cases: sorted arrays, rotated sorted arrays, "find first/last occurrence", search in answer space (binary search on the answer)
def binary_search(nums, target):
    left, right = 0, len(nums) - 1
    while left <= right:
        mid = (left + right) // 2
        if nums[mid] == target: return mid
        elif nums[mid] < target: left = mid + 1
        else: right = mid - 1
    return -1
Complexity β€” Q4
You have a solution that runs in O(nΒ²). The interviewer asks you to optimize it. What patterns should you think of first?

Framework for optimization from O(nΒ²):

1

HashMap/HashSet β†’ eliminate inner loop by looking up complements in O(1). Two Sum, Anagram, duplicates detection.

2

Sliding Window β†’ when the problem involves a contiguous subarray/substring. Moves a window rather than re-scanning.

3

Two Pointers β†’ when working on a sorted array. Start from both ends and converge.

4

Sort first β†’ O(n log n) pre-processing sometimes unlocks an O(n) second pass. Better than O(nΒ²) overall.

5

Prefix sum / Memoization β†’ avoid recomputing sub-results. Dynamic programming territory.

BCG wants to hear you think in patterns β€” naming the pattern before coding signals strong algorithmic thinking.
Complexity β€” Q5
What is the space complexity of a recursive DFS on a graph with n nodes and e edges?

Space complexity = O(n) for the call stack in the worst case.

  • In the worst case (a path graph), DFS goes n levels deep before backtracking β†’ call stack depth = n
  • We also need O(n) for the visited set to avoid cycles
  • Total space: O(n)
  • Time complexity for DFS/BFS on a graph = O(n + e) β€” we visit each node once and each edge once
For very deep recursion on large graphs, Python's default recursion limit (~1000) can cause stack overflow β†’ prefer iterative DFS with an explicit stack in production.

Cheat Sheet β€” Patterns to Master
PatternTypical ComplexityKey Problems
HashMapO(n)Two Sum, Anagram, Group Anagrams
Sliding WindowO(n)Longest Substring, Max Subarray
StackO(n)Valid Parentheses, Next Greater Element
BFS/DFSO(n+e)Number of Islands, Shortest Path
Binary SearchO(log n)Sorted array search, rotated array
Heap / Priority QueueO(n log k)Top K Elements, K Closest Points
SortingO(n log n)Merge Intervals, Meeting Rooms
Two PointersO(n)Container With Most Water, 3Sum
Phase 05 Β· ~15 min

Scientific Paper β€” Read, Understand, Implement

5
Minutes Read
3
Questions
Impl.
From Scratch
The interviewer hands the candidate this paper extract. They have 5 minutes to read it silently, then questions follow. No Googling. No notes. Just comprehension, reasoning, and implementation.
πŸ“„ Paper Extract β€” Read for 5 minutes

TF-IDF & BM25 β€” Excerpt

TF-IDF (Term Frequency–Inverse Document Frequency) is a numerical statistic that reflects how important a word is to a document in a corpus.

TF(t, d) = count(t in d) / total_terms(d)
IDF(t, D) = log( N / (1 + df(t)) )
TF-IDF(t, d, D) = TF(t, d) Γ— IDF(t, D)

Where N is the total number of documents, and df(t) is the number of documents containing term t. The +1 in the denominator prevents division by zero for terms appearing in all documents.

BM25 (Best Match 25) improves on TF-IDF with two key mechanisms: term frequency saturation and document length normalization.

score(d, q) = Ξ£ IDF(qα΅’) Γ— [ tf(qα΅’,d) Γ— (k1+1) ] / [ tf(qα΅’,d) + k1 Γ— (1 - b + b Γ— |d|/avgdl) ]

IDF(qα΅’) = log( (N - df(qα΅’) + 0.5) / (df(qα΅’) + 0.5) + 1 )

k1 ∈ [1.2, 2.0] β€” controls TF saturation
b ∈ [0, 1] β€” controls length normalization (typically 0.75)
avgdl β€” average document length in the corpus

As tf grows, the numerator saturates: doubling term frequency no longer doubles the score. When b = 0, length normalization is disabled. When b = 1, full normalization is applied.

Paper Q1 β€” Comprehension
In TF-IDF, why do we apply a logarithm to the IDF component? What happens without it?

The log dampens the effect of rare terms. Without it, a term appearing in only 1 document out of 1,000,000 would get an IDF of 1,000,000 β€” completely dominating the score for any document that contains it, even once.

  • With log: IDF = log(1,000,000) β‰ˆ 14 β€” significant but not overwhelming
  • Without log: IDF = 1,000,000 β€” absurdly large, kills all balance
  • The log also makes IDF = 0 for terms appearing in every document (log(N/N) = log(1) = 0) β€” which is the correct behavior for stopwords like "the", "is"
Key insight to say out loud: "The log compresses the range so that rare terms are weighted meaningfully but not explosively."
Paper Q2 β€” Comparison
What specific problem does BM25 solve that TF-IDF does not? Explain term frequency saturation and length normalization in your own words.

Problem 1 β€” TF saturation: in TF-IDF, if a term appears 100 times instead of 10, the score is 10Γ— higher. But intuitively, a document mentioning "python" 100 times isn't 10Γ— more relevant than one mentioning it 10 times. BM25 applies a saturation curve β€” score grows fast at first then plateaus as tf increases. Controlled by k1.

Problem 2 β€” Length normalization: a long document will naturally contain more term occurrences just because it's longer. TF-IDF doesn't account for this. BM25 penalizes long documents via the |d|/avgdl ratio, controlled by b. A term in a 50-word document is more meaningful than the same term in a 5,000-word document.

  • k1 = 0 β†’ BM25 becomes binary (term present/absent only)
  • b = 0 β†’ no length normalization (same as raw TF behavior)
  • b = 1 β†’ full normalization (divides TF by document length)
  • Default k1=1.5, b=0.75 works well empirically across most IR tasks
Paper Q3 β€” Implementation
Implement TF-IDF scoring from scratch in Python. Given a corpus of documents and a query, return the top-3 most relevant documents ranked by their TF-IDF score. Then propose one concrete optimization.

Step 1 β€” TF-IDF Implementation

import math
from collections import Counter

def compute_tf(doc: list[str]) -> dict:
    """Term frequency: count(t) / total_terms"""
    counts = Counter(doc)
    total = len(doc)
    return {term: cnt / total for term, cnt in counts.items()}

def compute_idf(corpus: list[list[str]]) -> dict:
    """IDF: log(N / (1 + df(t))) for each term"""
    N = len(corpus)
    df = {}
    for doc in corpus:
        for term in set(doc):  # set β†’ count doc once per term
            df[term] = df.get(term, 0) + 1
    return {term: math.log(N / (1 + df[term])) for term in df}

def tfidf_score(query: list[str], doc: list[str],
               idf: dict) -> float:
    """Score a single doc against a query"""
    tf = compute_tf(doc)
    return sum(tf.get(t, 0) * idf.get(t, 0) for t in query)

def top_k_tfidf(query_str: str, corpus_strs: list[str],
               k: int = 3) -> list[tuple]:
    corpus = [doc.lower().split() for doc in corpus_strs]
    query  = query_str.lower().split()
    idf    = compute_idf(corpus)

    scores = [(i, tfidf_score(query, doc, idf))
              for i, doc in enumerate(corpus)]
    scores.sort(key=lambda x: -x[1])  # descending score
    return scores[:k]

# --- Demo ---
corpus = [
    "python is great for data science",
    "data science uses python and R",
    "java is used for backend development",
    "python python python everywhere"
]
print(top_k_tfidf("python data", corpus))
# β†’ [(1, 0.18), (0, 0.15), (3, 0.12)]  (approx)

Step 2 β€” Complexity

StepTimeSpace
Build IDF (offline)O(N Γ— L)O(V)
Score one docO(|query|)O(|doc|)
Score all docs + sortO(N Γ— L + N log N)O(N)

N = corpus size, L = avg doc length, V = vocabulary size.

Step 3 β€” Proposed Optimizations

  • Inverted index: instead of scanning all documents per query term, precompute a mapping term β†’ [doc_ids]. Query only hits documents that contain at least one query term β€” reduces N dramatically for large corpora.
  • Pre-normalize TF vectors: compute and cache TF at index time, not at query time. Score becomes a dot product lookup.
  • Sparse representation: store TF-IDF vectors as dicts (only non-zero terms) instead of dense arrays β€” memory O(V) β†’ O(L) per doc.
  • Switch to BM25: replace the TF component with the saturation formula β€” typically 5–15% better NDCG on standard IR benchmarks.
The interviewer will probe: "Why did you use set(doc) in compute_idf?" β†’ Answer: to count each document only once per term, regardless of how many times the term appears in that document. df(t) = number of documents containing t, not total occurrences.
Bonus β€” If time allows
Extend your implementation to BM25. Where exactly does the code change, and what new parameters do you need?

Only the scoring function changes β€” IDF computation and corpus indexing stay the same.

def bm25_score(query: list[str], doc: list[str],
              idf: dict, avgdl: float,
              k1: float = 1.5, b: float = 0.75) -> float:
    counts = Counter(doc)
    dl = len(doc)
    score = 0.0
    for term in query:
        tf = counts.get(term, 0)
        idf_val = idf.get(term, 0)
        # BM25 saturation + length norm
        numerator   = tf * (k1 + 1)
        denominator = tf + k1 * (1 - b + b * dl / avgdl)
        score += idf_val * (numerator / denominator)
    return score

# avgdl computed once at index time:
avgdl = sum(len(doc) for doc in corpus) / len(corpus)
  • 2 new hyperparameters: k1 (saturation), b (length norm)
  • 1 new corpus stat: avgdl (computed once, O(NΓ—L))
  • The IDF formula also changes slightly β€” uses (N - df + 0.5) / (df + 0.5) for better probabilistic calibration
Key phrase to say: "The only structural change is in the TF numerator/denominator β€” everything else (inverted index, IDF caching, top-k selection) remains identical."