
Ch 2: Data Structures & Algorithms - Introduction


Read online or run locally

You can read this content here on the web. To run the code interactively, either use the Playground or clone the repo and open chapters/chapter-02-data-structures/notebooks/01_introduction.ipynb in Jupyter.


Chapter 2: Data Structures & Algorithms

Notebook 01 - Introduction

Understanding data structures and algorithms is fundamental to writing efficient AI code. This notebook covers arrays, complexity analysis, and searching.

What you'll learn:
- Why data structures matter for AI
- Arrays and dynamic arrays (Python lists)
- Big O notation and complexity analysis
- Linear search and binary search
- Two-pointer and sliding window techniques

Time estimate: 2 hours


Generated by Berta AI | Created by Luigi Pascal Rondanini

1. Why Data Structures Matter for AI

Why data structures matter: In real ML pipelines, you might process millions of training examples, look up vocabulary in embedding tables thousands of times per second, or manage queues of inference requests. Choosing the wrong structure—like using a list when you need fast lookups—can turn a 10-minute job into a 10-hour job. The right choice is invisible when things work; the wrong choice shows up as "why is my training so slow?" at 2am. Think of data structures as the foundation: get it right early, and everything built on top runs smoothly.

In AI/ML work, the choice of data structure directly impacts:
- How fast your data pipeline runs
- How much memory your models need
- Whether your training loop finishes in hours or days

Operation             List    Dict    Set
Lookup by index       O(1)    -       -
Lookup by value (in)  O(n)    O(1)    O(1)
Insert at end         O(1)*   O(1)*   O(1)*
Insert at start       O(n)    -       -
Delete by value       O(n)    O(1)    O(1)

* amortized: occasional resizes are averaged out over many operations
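Why is append only "amortized" O(1)? CPython over-allocates a list's backing array, so most appends fit in spare capacity and only occasional appends trigger a resize. We can watch this happen with sys.getsizeof; the exact sizes and resize counts below are CPython implementation details, not guarantees of the language.

```python
import sys

# Watch a list's allocated size change as we append.
# CPython over-allocates, so most appends need no resize,
# which is what makes append O(1) amortized.
lst = []
last_size = sys.getsizeof(lst)
resizes = 0

for i in range(1000):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:
        resizes += 1
        last_size = size

print(f"1,000 appends triggered only {resizes} reallocations")
```

On a typical CPython build this prints a resize count in the tens, not the thousands: the cost of the rare expensive appends is spread across all the cheap ones.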

Let's prove it with a benchmark. The code below times how long it takes to look up an item in a list versus a dict versus a set. We're measuring lookup by value—"is this thing in my collection?"

import time

def benchmark(func, *args, runs=1000):
    """Time a function over multiple runs."""
    start = time.perf_counter()
    for _ in range(runs):
        func(*args)
    elapsed = (time.perf_counter() - start) / runs
    return elapsed

# List vs Dict vs Set: lookup performance
n = 10_000
data_list = list(range(n))
data_dict = {i: True for i in range(n)}
data_set = set(range(n))

target = n - 1  # worst case for list

list_time = benchmark(lambda: target in data_list, runs=100)
dict_time = benchmark(lambda: target in data_dict, runs=100)
set_time = benchmark(lambda: target in data_set, runs=100)

print(f"Looking up element {target} in {n:,} items:")
print(f"  List: {list_time*1e6:>10.1f} us")
print(f"  Dict: {dict_time*1e6:>10.1f} us")
print(f"  Set:  {set_time*1e6:>10.1f} us")
print(f"\nDict is ~{list_time/max(dict_time,1e-10):.0f}x faster than list for lookup!")

What just happened

The benchmark ran the same lookup operation 100 times and averaged the result. The numbers show microseconds per lookup (1 us = 0.000001 seconds). A list must scan from the start until it finds the item—or reaches the end—so with 10,000 items and a target at the end, it checks all 10,000. A dict and set use hashing: they jump directly to the right "bucket," so lookup time is nearly constant no matter how many items. That's why dict is tens or hundreds of times faster for this operation. In ML, vocab lookups, feature hashing, and deduplication all rely on O(1) structures—this speedup scales with your data size.

2. Big O Notation

Big O is like speed ratings for algorithms. Imagine a restaurant: O(1) is like already knowing your order—you don't need to read the menu. O(n) is reading the whole menu, one item at a time—more items means more time. O(n²) is comparing every dish to every other dish—that gets slow fast. Big O tells you how the worst case scales: if you double the data, does your algorithm take twice as long (O(n)), four times as long (O(n²)), or the same (O(1))? For AI, this matters when your dataset grows from 1K to 1M samples.

Big O describes how an algorithm's time or space grows as input size increases. It's the language engineers use to discuss efficiency.

Big O       Name           Example         1K items     1M items
O(1)        Constant       Dict lookup     1 op         1 op
O(log n)    Logarithmic    Binary search   10 ops       20 ops
O(n)        Linear         Linear search   1K ops       1M ops
O(n log n)  Linearithmic   Merge sort      10K ops      20M ops
O(n²)       Quadratic      Bubble sort     1M ops       1T ops
O(2ⁿ)       Exponential    Brute force     Heat death   Heat death

import math
import math

def demonstrate_complexity():
    """Show how different complexities scale."""
    sizes = [10, 100, 1_000, 10_000, 100_000]

    print(f"{'n':>10} {'O(1)':>10} {'O(log n)':>10} {'O(n)':>10} {'O(n log n)':>12} {'O(n²)':>14}")
    print("-" * 70)

    for n in sizes:
        o1 = 1
        olog = math.ceil(math.log2(n))
        on = n
        onlogn = n * math.ceil(math.log2(n))
        on2 = n * n
        print(f"{n:>10,} {o1:>10,} {olog:>10,} {on:>10,} {onlogn:>12,} {on2:>14,}")

demonstrate_complexity()

What just happened

The table shows how many "operations" each complexity class requires as n grows. Notice O(n²) explodes: 100,000 items means 10 billion operations. O(n log n) stays manageable—merge sort on a million items is around 20 million ops. This is why we care: a poorly chosen algorithm on big data can mean the difference between seconds and days.
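To make the O(n) vs O(n²) gap concrete with real code rather than a projected table, here's a sketch comparing two ways to deduplicate a list while counting the work each one does. The helper names (dedup_naive, dedup_set) are ours, for illustration.

```python
def dedup_naive(items):
    """O(n²): for each item, scan the result list for a duplicate."""
    result = []
    checks = 0
    for item in items:
        found = False
        for seen in result:
            checks += 1
            if seen == item:
                found = True
                break
        if not found:
            result.append(item)
    return result, checks


def dedup_set(items):
    """O(n): one hash-set membership check per item."""
    result, seen = [], set()
    checks = 0
    for item in items:
        checks += 1  # one O(1) set lookup
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result, checks


data = list(range(2_000)) * 2  # 4,000 items, half duplicates
_, naive_checks = dedup_naive(data)
_, set_checks = dedup_set(data)
print(f"Naive: {naive_checks:,} comparisons; set-based: {set_checks:,} checks")
```

Both return the same deduplicated list, but the nested-scan version does around a thousand times more comparisons on just 4,000 items, and the gap widens as the data grows.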

3. Searching Algorithms

Searching is one of the most fundamental operations. In AI you search through datasets, model parameters, hyperparameter spaces, and more.

Linear search (plain English): Imagine looking for your keys in your apartment. You check the kitchen, then the bedroom, then the couch—one place at a time until you find them. Linear search does exactly that: start at the beginning, check each element in order. It always works but can be slow if the item is at the end.

Binary search (plain English): Now imagine your keys could only be in drawers numbered 1 to 100, and the drawers are sorted. You open drawer 50: too high. So it must be in 1–49. Open 25: too low. So 26–49. You keep halving the range. Binary search works the same way—but it requires the data to be sorted. Each step cuts the search space in half, so you find the item in about log₂(n) steps instead of n.

def linear_search(arr, target):
    """O(n) search - check every element."""
    comparisons = 0
    for i, val in enumerate(arr):
        comparisons += 1
        if val == target:
            return i, comparisons
    return -1, comparisons


def binary_search(arr, target):
    """O(log n) search - requires sorted array."""
    left, right = 0, len(arr) - 1
    comparisons = 0

    while left <= right:
        comparisons += 1
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid, comparisons
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1

    return -1, comparisons


# Compare on sorted data
import random
random.seed(42)
data = sorted(random.sample(range(1_000_000), 100_000))
target = data[99_000]  # near the end

idx_lin, comp_lin = linear_search(data, target)
idx_bin, comp_bin = binary_search(data, target)

print(f"Searching for {target} in {len(data):,} sorted elements:")
print(f"  Linear search: found at index {idx_lin:,}, {comp_lin:,} comparisons")
print(f"  Binary search: found at index {idx_bin:,}, {comp_bin} comparisons")
print(f"  Binary search was {comp_lin / comp_bin:.0f}x more efficient!")

What just happened

We searched for an element near the end of a sorted list of 100,000 numbers. Linear search had to check almost every element (about 99,000 comparisons). Binary search repeatedly halved the range, so it needed only around 17 comparisons. That's roughly 5,800× fewer checks—and it shows. For sorted data, binary search is almost always the right choice. In ML, this pattern appears in hyperparameter search (when you binary-search over learning rates) and in nearest-neighbor structures like KD-trees.

Try it yourself: Change target to data[0] (first element) or data[50_000] (middle). How do the comparison counts change for linear vs binary search?
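In practice you rarely hand-roll binary search in Python: the standard library's bisect module implements it for you. A quick sketch, with contains_sorted as our own illustrative wrapper:

```python
import bisect

data = [1, 3, 5, 7, 11, 15, 20, 25]

def contains_sorted(arr, target):
    """Membership test on a sorted list via binary search, O(log n)."""
    # bisect_left returns the leftmost index where target could be
    # inserted while keeping the list sorted; if the element at that
    # index equals target, the target is present.
    idx = bisect.bisect_left(arr, target)
    return idx < len(arr) and arr[idx] == target

print(contains_sorted(data, 11))     # True  (present)
print(contains_sorted(data, 12))     # False (absent)
print(bisect.bisect_left(data, 11))  # 4 (index of 11)
```

The same module powers insertion into sorted lists (bisect.insort), which keeps data binary-searchable as it grows.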

Two-pointer and sliding window

Two-pointer (plain English): You have a sorted list and need two numbers that sum to a target. Put one finger at the start and one at the end. If the sum is too small, move the left finger right (toward bigger numbers). If too big, move the right finger left (toward smaller numbers). You converge in one pass—O(n) instead of the O(n²) cost of checking every pair. Used for feature selection, nearest-neighbor variants, and constraint problems.

Sliding window (plain English): Imagine a fixed-width frame sliding along a signal. At each position, you compute something (max, sum, etc.) inside the window. This is exactly what max-pooling does in CNNs: reduce spatial dimensions by taking the maximum in each local region. The code below shows both techniques.

# Two-pointer technique: finding pairs that sum to a target
# Used in: feature selection, nearest neighbor search, constraint satisfaction

def two_sum_sorted(arr, target):
    """Find two numbers in a sorted array that sum to target. O(n) time."""
    left, right = 0, len(arr) - 1

    while left < right:
        current_sum = arr[left] + arr[right]
        if current_sum == target:
            return (left, right)
        elif current_sum < target:
            left += 1
        else:
            right -= 1

    return None


def sliding_window_max(arr, window_size):
    """Find max value in each sliding window. Used in pooling layers."""
    if not arr or window_size <= 0:
        return []
    results = []
    for i in range(len(arr) - window_size + 1):
        window = arr[i:i + window_size]
        results.append(max(window))
    return results


# Two-pointer demo
sorted_data = [1, 3, 5, 7, 11, 15, 20, 25]
target_sum = 22
result = two_sum_sorted(sorted_data, target_sum)
if result:
    i, j = result
    print(f"Two sum: {sorted_data[i]} + {sorted_data[j]} = {target_sum}")

# Sliding window demo (similar to CNN pooling)
signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
pooled = sliding_window_max(signal, 3)
print(f"\nSignal:         {signal}")
print(f"Max pool (k=3): {pooled}")

What just happened

Two sum: We found that 7 + 15 = 22 in the sorted array by moving left and right inward—no nested loops.

Sliding window: For each 3-element window, we took the max. The result [4, 4, 5, 9, 9, 9, 6, 6, 5] is the max-pooled version of the signal (11 elements with window size 3 give 11 − 3 + 1 = 9 windows). In a CNN, this downsamples the feature map while preserving the strongest activations.
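The implementation above recomputes max over each window, which is O(n·k) for window size k. For large windows, a monotonic deque from the standard library brings this down to O(n). A sketch (sliding_window_max_fast is our name, not from the notebook):

```python
from collections import deque

def sliding_window_max_fast(arr, window_size):
    """O(n) sliding-window max using a monotonic deque of indices.

    The deque holds indices whose values decrease from front to back,
    so the front is always the index of the current window's max.
    """
    if not arr or window_size <= 0:
        return []
    results = []
    dq = deque()
    for i, val in enumerate(arr):
        # Drop indices that have fallen out of the window.
        while dq and dq[0] <= i - window_size:
            dq.popleft()
        # Drop smaller values: they can never be a future max.
        while dq and arr[dq[-1]] <= val:
            dq.pop()
        dq.append(i)
        if i >= window_size - 1:
            results.append(arr[dq[0]])
    return results

signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(sliding_window_max_fast(signal, 3))
```

Each index enters and leaves the deque at most once, which is where the O(n) bound comes from. For the small signals in this notebook the simple slicing version is perfectly fine.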

Common mistake: Using two-pointer on unsorted data. The algorithm assumes sorted order so that moving left/right predictably changes the sum. If your data isn't sorted, sort it first—O(n log n)—or use a different approach like a hash set.
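For unsorted data, the hash-set alternative mentioned above looks like this sketch: one pass, O(n) time, at the cost of O(n) extra space (two_sum_unsorted is our illustrative name).

```python
def two_sum_unsorted(arr, target):
    """Find indices of two numbers that sum to target. O(n) time, O(n) space.

    Works on unsorted data: for each value, check whether its
    complement (target - value) has already been seen.
    """
    seen = {}  # value -> index
    for i, val in enumerate(arr):
        complement = target - val
        if complement in seen:
            return (seen[complement], i)
        seen[val] = i
    return None

unsorted_data = [15, 3, 25, 7, 1, 20, 11, 5]
result = two_sum_unsorted(unsorted_data, 22)
if result:
    i, j = result
    print(f"{unsorted_data[i]} + {unsorted_data[j]} = 22")
```

This trades the two-pointer version's O(1) space for the ability to skip sorting entirely, so it wins whenever the input arrives unsorted.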

What's Next?

In Notebook 02 (Intermediate), we'll cover:
- Stacks and queues (used in BFS/DFS, backtracking)
- Hash tables deep-dive
- Sorting algorithms (merge sort, quicksort)
- When to use which data structure



