Ch 2: Data Structures & Algorithms - Introduction¶
Read online or run locally
You can read this content here on the web. To run the code interactively, either use the Playground or clone the repo and open chapters/chapter-02-data-structures/notebooks/01_introduction.ipynb in Jupyter.
Chapter 2: Data Structures & Algorithms¶
Notebook 01 - Introduction¶
Understanding data structures and algorithms is fundamental to writing efficient AI code. This notebook covers arrays, complexity analysis, and searching.
What you'll learn:

- Why data structures matter for AI
- Arrays and dynamic arrays (Python lists)
- Big O notation and complexity analysis
- Linear search and binary search
- Two-pointer and sliding window techniques
Time estimate: 2 hours
Generated by Berta AI | Created by Luigi Pascal Rondanini
1. Why Data Structures Matter for AI¶
Why data structures matter: In real ML pipelines, you might process millions of training examples, look up vocabulary in embedding tables thousands of times per second, or manage queues of inference requests. Choosing the wrong structure—like using a list when you need fast lookups—can turn a 10-minute job into a 10-hour job. The right choice is invisible when things work; the wrong choice shows up as "why is my training so slow?" at 2am. Think of data structures as the foundation: get it right early, and everything built on top runs smoothly.
In AI/ML work, the choice of data structure directly impacts:

- How fast your data pipeline runs
- How much memory your models need
- Whether your training loop finishes in hours or days
| Operation | List | Dict | Set |
|---|---|---|---|
| Lookup by index | O(1) | - | - |
| Lookup by value (`in`) | O(n) | O(1) | O(1) |
| Insert at end | O(1)* | O(1)* | O(1)* |
| Insert at start | O(n) | - | - |
| Delete by value / key | O(n) | O(1) | O(1) |

* amortized
Let's prove it with a benchmark. The code below times how long it takes to look up an item in a list versus a dict versus a set. We're measuring lookup by value—"is this thing in my collection?"
import time
def benchmark(func, *args, runs=1000):
"""Time a function over multiple runs."""
start = time.perf_counter()
for _ in range(runs):
func(*args)
elapsed = (time.perf_counter() - start) / runs
return elapsed
# List vs Dict vs Set: lookup performance
n = 10_000
data_list = list(range(n))
data_dict = {i: True for i in range(n)}
data_set = set(range(n))
target = n - 1 # worst case for list
list_time = benchmark(lambda: target in data_list, runs=100)
dict_time = benchmark(lambda: target in data_dict, runs=100)
set_time = benchmark(lambda: target in data_set, runs=100)
print(f"Looking up element {target} in {n:,} items:")
print(f" List: {list_time*1e6:>10.1f} us")
print(f" Dict: {dict_time*1e6:>10.1f} us")
print(f" Set: {set_time*1e6:>10.1f} us")
print(f"\nDict is ~{list_time/max(dict_time,1e-10):.0f}x faster than list for lookup!")
What just happened¶
The benchmark ran the same lookup operation 100 times and averaged the result. The numbers show microseconds per lookup (1 us = 0.000001 seconds). A list must scan from the start until it finds the item—or reaches the end—so with 10,000 items and a target at the end, it checks all 10,000. A dict and set use hashing: they jump directly to the right "bucket," so lookup time is nearly constant no matter how many items. That's why dict is tens or hundreds of times faster for this operation. In ML, vocab lookups, feature hashing, and deduplication all rely on O(1) structures—this speedup scales with your data size.
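To see that O(1) membership in action, here's a minimal sketch of the deduplication pattern mentioned above. The function names (`dedup_slow`, `dedup_fast`) are illustrative, not from the chapter; both preserve first-seen order.

```python
def dedup_slow(items):
    """O(n^2): 'x in result' scans the list on every iteration."""
    result = []
    for x in items:
        if x not in result:  # O(n) membership test on a list
            result.append(x)
    return result

def dedup_fast(items):
    """O(n): 'x in seen' is an O(1) hash lookup."""
    seen = set()
    result = []
    for x in items:
        if x not in seen:  # O(1) membership test on a set
            seen.add(x)
            result.append(x)
    return result

print(dedup_fast([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

Both functions return the same result; only the membership test differs, and that single change takes the whole pass from quadratic to linear.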
2. Big O Notation¶
Big O is like speed ratings for algorithms. Imagine a restaurant: O(1) is like already knowing your order—you don't need to read the menu. O(n) is reading the whole menu, one item at a time—more items means more time. O(n²) is comparing every dish to every other dish—that gets slow fast. Big O tells you how the worst case scales: if you double the data, does your algorithm take twice as long (O(n)), four times as long (O(n²)), or the same (O(1))? For AI, this matters when your dataset grows from 1K to 1M samples.
Big O describes how an algorithm's time or space grows as input size increases. It's the language engineers use to discuss efficiency.
| Big O | Name | Example | 1K items | 1M items |
|---|---|---|---|---|
| O(1) | Constant | Dict lookup | 1 op | 1 op |
| O(log n) | Logarithmic | Binary search | 10 ops | 20 ops |
| O(n) | Linear | Linear search | 1K ops | 1M ops |
| O(n log n) | Linearithmic | Merge sort | 10K ops | 20M ops |
| O(n²) | Quadratic | Bubble sort | 1M ops | 1T ops |
| O(2ⁿ) | Exponential | Brute-force | Heat death | Heat death |
import math
def demonstrate_complexity():
"""Show how different complexities scale."""
sizes = [10, 100, 1_000, 10_000, 100_000]
print(f"{'n':>10} {'O(1)':>10} {'O(log n)':>10} {'O(n)':>10} {'O(n log n)':>12} {'O(n²)':>14}")
print("-" * 70)
for n in sizes:
o1 = 1
olog = math.ceil(math.log2(n))
on = n
onlogn = n * math.ceil(math.log2(n))
on2 = n * n
print(f"{n:>10,} {o1:>10,} {olog:>10,} {on:>10,} {onlogn:>12,} {on2:>14,}")
demonstrate_complexity()
What just happened¶
The table shows how many "operations" each complexity class requires as n grows. Notice O(n²) explodes: 100,000 items means 10 billion operations. O(n log n) stays manageable—merge sort on a million items is around 20 million ops. This is why we care: a poorly chosen algorithm on big data can mean the difference between seconds and days.
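You can feel this difference directly with a small timing experiment, a sketch assuming a worst-case input with no duplicates (the function names are illustrative):

```python
import time

def has_duplicate_quadratic(arr):
    """O(n^2): compare every pair of elements."""
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] == arr[j]:
                return True
    return False

def has_duplicate_linear(arr):
    """O(n): one pass, remembering what we've seen in a set."""
    seen = set()
    for x in arr:
        if x in seen:
            return True
        seen.add(x)
    return False

data = list(range(5_000))  # no duplicates: worst case for both
for fn in (has_duplicate_quadratic, has_duplicate_linear):
    start = time.perf_counter()
    fn(data)
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f} s")
```

On typical hardware the quadratic version is already hundreds of times slower at n = 5,000, and the gap widens as n grows, exactly as the table predicts.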
3. Searching Algorithms¶
Searching is one of the most fundamental operations. In AI you search through datasets, model parameters, hyperparameter spaces, and more.
Linear search (plain English): Imagine looking for your keys in your apartment. You check the kitchen, then the bedroom, then the couch—one place at a time until you find them. Linear search does exactly that: start at the beginning, check each element in order. It always works but can be slow if the item is at the end.
Binary search (plain English): Now imagine your keys could only be in drawers numbered 1 to 100, and the drawers are sorted. You open drawer 50: too high. So it must be in 1–49. Open 25: too low. So 26–49. You keep halving the range. Binary search works the same way—but it requires the data to be sorted. Each step cuts the search space in half, so you find the item in about log₂(n) steps instead of n.
def linear_search(arr, target):
"""O(n) search - check every element."""
comparisons = 0
for i, val in enumerate(arr):
comparisons += 1
if val == target:
return i, comparisons
return -1, comparisons
def binary_search(arr, target):
"""O(log n) search - requires sorted array."""
left, right = 0, len(arr) - 1
comparisons = 0
while left <= right:
comparisons += 1
mid = (left + right) // 2
if arr[mid] == target:
return mid, comparisons
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1, comparisons
# Compare on sorted data
import random
random.seed(42)
data = sorted(random.sample(range(1_000_000), 100_000))
target = data[99_000] # near the end
idx_lin, comp_lin = linear_search(data, target)
idx_bin, comp_bin = binary_search(data, target)
print(f"Searching for {target} in {len(data):,} sorted elements:")
print(f" Linear search: found at index {idx_lin:,}, {comp_lin:,} comparisons")
print(f" Binary search: found at index {idx_bin:,}, {comp_bin} comparisons")
print(f" Binary search was {comp_lin / comp_bin:.0f}x more efficient!")
What just happened¶
We searched for an element near the end of a sorted list of 100,000 numbers. Linear search had to check almost every element (about 99,000 comparisons). Binary search repeatedly halved the range, so it needed only around 17 comparisons. That's roughly 5,800× fewer checks—and it shows. For sorted data, binary search is almost always the right choice. In ML, this pattern appears in hyperparameter search (when you binary-search over learning rates) and in nearest-neighbor structures like KD-trees.
Try it yourself: Change target to data[0] (first element) or data[50_000] (middle). How do the comparison counts change for linear vs binary search?
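In practice you rarely hand-write binary search in Python: the standard library's `bisect` module implements it for you. A quick sketch of the same idea on a small sorted list:

```python
import bisect

data = [2, 5, 8, 8, 13, 21]

idx = bisect.bisect_left(data, 13)  # leftmost position where 13 belongs
print(idx, data[idx] == 13)          # 4 True -> 13 is present at index 4

print(bisect.bisect_left(data, 8))   # 2: first occurrence among duplicates
print(bisect.bisect_left(data, 9))   # 4: where 9 *would* be inserted
```

Note that `bisect_left` returns an insertion point, not a "found/not found" flag, so to test membership you check whether the element at that index equals the target.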
Two-pointer and sliding window¶
Two-pointer (plain English): You have a sorted list and need two numbers that sum to a target. Put one finger at the start and one at the end. If the sum is too small, move the left finger right (bigger numbers). If too big, move the right finger left (smaller numbers). You converge in one pass—O(n) instead of checking every pair O(n²). Used for feature selection, nearest-neighbor variants, and constraint problems.
Sliding window (plain English): Imagine a fixed-width frame sliding along a signal. At each position, you compute something (max, sum, etc.) inside the window. This is exactly what max-pooling does in CNNs: reduce spatial dimensions by taking the maximum in each local region. The code below shows both techniques.
# Two-pointer technique: finding pairs that sum to a target
# Used in: feature selection, nearest neighbor search, constraint satisfaction
def two_sum_sorted(arr, target):
"""Find two numbers in a sorted array that sum to target. O(n) time."""
left, right = 0, len(arr) - 1
while left < right:
current_sum = arr[left] + arr[right]
if current_sum == target:
return (left, right)
elif current_sum < target:
left += 1
else:
right -= 1
return None
def sliding_window_max(arr, window_size):
"""Find max value in each sliding window. Used in pooling layers."""
if not arr or window_size <= 0:
return []
results = []
for i in range(len(arr) - window_size + 1):
window = arr[i:i + window_size]
results.append(max(window))
return results
# Two-pointer demo
sorted_data = [1, 3, 5, 7, 11, 15, 20, 25]
target_sum = 22
result = two_sum_sorted(sorted_data, target_sum)
if result:
i, j = result
print(f"Two sum: {sorted_data[i]} + {sorted_data[j]} = {target_sum}")
# Sliding window demo (similar to CNN pooling)
signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
pooled = sliding_window_max(signal, 3)
print(f"\nSignal: {signal}")
print(f"Max pool (k=3): {pooled}")
What just happened¶
Two sum: We found that 7 + 15 = 22 in the sorted array by moving left and right inward—no nested loops.
Sliding window: For each 3-element window, we took the max. The result [4, 4, 5, 9, 9, 9, 6, 6, 5] is the max-pooled version of the signal (11 elements give 11 − 3 + 1 = 9 windows). In a CNN, this downsamples the feature map while preserving the strongest activations.
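The version above recomputes `max` over each window, which costs O(n·k). A well-known refinement brings this to O(n) with a deque of candidate indices; this sketch (function name ours, not from the chapter) shows the idea:

```python
from collections import deque

def sliding_window_max_deque(arr, k):
    """O(n) sliding-window max: a deque holds indices of candidates,
    with their values kept in decreasing order."""
    dq = deque()  # indices whose values are still potential window maxima
    out = []
    for i, x in enumerate(arr):
        if dq and dq[0] <= i - k:       # front index has left the window
            dq.popleft()
        while dq and arr[dq[-1]] <= x:  # smaller values can never be max again
            dq.pop()
        dq.append(i)
        if i >= k - 1:                  # first full window is complete
            out.append(arr[dq[0]])      # front of deque is the window max
    return out

signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(sliding_window_max_deque(signal, 3))  # [4, 4, 5, 9, 9, 9, 6, 6, 5]
```

Each index enters and leaves the deque at most once, which is why the total work is linear even though there is a nested `while` loop.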
Common mistake: Using two-pointer on unsorted data. The algorithm assumes sorted order so that moving left/right predictably changes the sum. If your data isn't sorted, sort it first—O(n log n)—or use a different approach like a hash set.
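For unsorted data, the hash-based alternative mentioned above looks like this, a minimal sketch (the name `two_sum_unsorted` is ours):

```python
def two_sum_unsorted(arr, target):
    """O(n) two-sum on unsorted data: remember each value's index,
    and check whether the needed complement was already seen."""
    seen = {}  # value -> index
    for j, x in enumerate(arr):
        if target - x in seen:
            return (seen[target - x], j)
        seen[x] = j
    return None

print(two_sum_unsorted([20, 3, 15, 7, 11], 22))  # (2, 3): 15 + 7 = 22
```

This trades O(n) extra memory for skipping the sort; with sorted input, the two-pointer version is usually preferable because it needs no auxiliary structure.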
What's Next?¶
In Notebook 02 (Intermediate), we'll cover:

- Stacks and queues (used in BFS/DFS, backtracking)
- Hash tables deep-dive
- Sorting algorithms (merge sort, quicksort)
- When to use which data structure