Ch 1: Python Fundamentals for AI - Advanced¶
Read online or run locally
You can read this content here on the web. To run the code interactively, either use the Playground or clone the repo and open chapters/chapter-01-python-fundamentals/notebooks/03_advanced.ipynb in Jupyter.
Chapter 1: Python Fundamentals for AI¶
Notebook 03 — Advanced¶
The final notebook in Chapter 1. Here we cover the patterns and techniques that separate beginner Python from professional AI code.
What you’ll learn:

- Object-Oriented Programming for AI
- File I/O (CSV, JSON, text)
- Generators and iterators
- Decorators
- Capstone project: build a mini ML experiment tracker
Time estimate: 2 hours
Generated by Berta AI | Created by Luigi Pascal Rondanini
1. Object-Oriented Programming for AI¶
A class is a blueprint for creating objects, just like an architectural blueprint is a plan for building houses. The blueprint defines what every house will have (rooms, doors, windows), but each actual house built from that blueprint can have different specifics (paint color, number of bedrooms, furniture inside). In Python, the blueprint is the class, and each house built from it is called an instance (or object).
Why does this matter for AI? Because every major AI framework is built on classes. In PyTorch, your model is a class. Your dataset is a class. Your training loop often lives inside a class. When you write model = BertForSequenceClassification(...), you’re creating an instance from a blueprint that defines how BERT works. Understanding OOP is not optional for AI work — it’s a prerequisite.
Let’s start with the key concepts before writing any code:
- `__init__`: The constructor method. It runs automatically when you create a new instance. This is where you set up the object’s initial state. Think of it as the “move-in day” for your house: you set up furniture, connect utilities, etc.
- `self`: A reference to the current instance. Every method in a class receives `self` as its first argument. It’s how the object refers to its own data. When you write `self.weights`, you’re saying “the weights that belong to this particular layer.”
- Methods: Functions that belong to a class. They define what the object can do.
- Properties: Special methods that behave like attributes. They let you compute a value on the fly instead of storing it.
Let’s build a simplified neural network layer to see all of this in action.
```python
import random
import math
from datetime import datetime


class NeuralLayer:
    """A simplified neural network layer to demonstrate OOP concepts."""

    def __init__(self, input_size, output_size, activation="relu"):
        self.input_size = input_size
        self.output_size = output_size
        self.activation = activation
        # Xavier initialization (simplified)
        scale = math.sqrt(2.0 / (input_size + output_size))
        self.weights = [
            [random.gauss(0, scale) for _ in range(output_size)]
            for _ in range(input_size)
        ]
        self.bias = [0.0] * output_size

    def forward(self, inputs):
        """Compute the forward pass."""
        output = list(self.bias)
        for i, x in enumerate(inputs):
            for j in range(self.output_size):
                output[j] += x * self.weights[i][j]
        if self.activation == "relu":
            output = [max(0, v) for v in output]
        elif self.activation == "sigmoid":
            output = [1 / (1 + math.exp(-min(max(v, -500), 500))) for v in output]
        return output

    @property
    def num_parameters(self):
        return self.input_size * self.output_size + self.output_size

    def __repr__(self):
        return (f"NeuralLayer({self.input_size} -> {self.output_size}, "
                f"activation={self.activation}, params={self.num_parameters})")


# Build a small network
random.seed(42)
layer1 = NeuralLayer(4, 8, "relu")
layer2 = NeuralLayer(8, 3, "sigmoid")
print(f"Layer 1: {layer1}")
print(f"Layer 2: {layer2}")
print(f"Total params: {layer1.num_parameters + layer2.num_parameters}")

# Forward pass
sample_input = [0.5, -0.3, 0.8, 0.1]
hidden = layer1.forward(sample_input)
output = layer2.forward(hidden)
print(f"\nInput: {sample_input}")
print(f"Hidden: {[f'{h:.4f}' for h in hidden]}")
print(f"Output: {[f'{o:.4f}' for o in output]}")
print(f"Predicted class: {output.index(max(output))}")
```
What just happened?¶
Let’s walk through each part of the NeuralLayer class and understand what it represents in a real neural network:
- `__init__` sets up the layer with an `input_size` (how many values come in), an `output_size` (how many values go out), and an `activation` function. It also creates the weights — a 2D grid of numbers that the layer uses to transform its input. The weights are initialized using Xavier initialization, a technique that helps training start smoothly by keeping values in a reasonable range.
- `forward()` is the core computation. In a real neural network, the forward pass is where data flows through the network: each input value is multiplied by each weight, the results are summed up (with a bias added), and then an activation function is applied. The activation function introduces non-linearity — without it, stacking multiple layers would be mathematically equivalent to having just one layer.
- `@property` makes `num_parameters` behave like an attribute even though it’s actually computed. You access it as `layer1.num_parameters` (no parentheses), but behind the scenes Python is running the calculation. A layer with `input_size=4` and `output_size=8` has `4 * 8 + 8 = 40` parameters (32 weights + 8 biases).
- `__repr__` controls what gets printed when you display the object. Without it, you’d just see something like `<__main__.NeuralLayer object at 0x7f...>`.
The forward pass at the end shows data flowing through two layers: input → hidden → output. The output layer uses sigmoid activation, which squashes values between 0 and 1 (useful for classification). The predicted class is the index of the highest output value.
Building a Network from Layers¶
Now let’s see how classes can compose together. A SimpleNetwork contains multiple NeuralLayer instances — this is the same pattern PyTorch uses with nn.Sequential. Each layer’s output becomes the next layer’s input, forming a chain.
```python
class SimpleNetwork:
    """A simple sequential neural network."""

    def __init__(self, *layers):
        self.layers = list(layers)

    def forward(self, inputs):
        x = inputs
        for layer in self.layers:
            x = layer.forward(x)
        return x

    @property
    def total_parameters(self):
        return sum(layer.num_parameters for layer in self.layers)

    def summary(self):
        print(f"{'Layer':>10} {'Shape':>15} {'Activation':>12} {'Params':>8}")
        print("-" * 50)
        for i, layer in enumerate(self.layers):
            shape = f"{layer.input_size} -> {layer.output_size}"
            print(f"{'Layer ' + str(i + 1):>10} {shape:>15} "
                  f"{layer.activation:>12} {layer.num_parameters:>8}")
        print("-" * 50)
        print(f"{'Total':>10} {'':>15} {'':>12} {self.total_parameters:>8}")


random.seed(42)
net = SimpleNetwork(
    NeuralLayer(10, 64, "relu"),
    NeuralLayer(64, 32, "relu"),
    NeuralLayer(32, 5, "sigmoid"),
)
net.summary()

sample = [random.uniform(-1, 1) for _ in range(10)]
prediction = net.forward(sample)
print(f"\nPrediction: {[f'{p:.4f}' for p in prediction]}")
```
What just happened?¶
The SimpleNetwork class takes any number of layers (using *layers) and chains them together. The forward method feeds data through each layer in sequence — this is called a feed-forward network. The summary method prints a table showing the architecture, very similar to what Keras’s model.summary() shows (PyTorch has no built-in equivalent, but printing a model or using the torchinfo package gives a comparable view).
Notice the pattern: 10 -> 64 -> 32 -> 5. The data starts with 10 input features, gets expanded to 64 dimensions (the network learns a richer representation), then compressed to 32, and finally to 5 outputs (one for each class). This “funnel” shape is a common architecture pattern.
Common Mistake — Mismatched dimensions:
If Layer 1 outputs 64 values but Layer 2 expects 128 inputs, you’ll get an error. The `output_size` of one layer must match the `input_size` of the next. This is one of the most common bugs when building neural networks.
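As an illustration, a small hypothetical helper (not part of the classes above) can catch this bug before any forward pass runs:

```python
def check_dimensions(layer_shapes):
    """Validate that consecutive (input_size, output_size) pairs line up.

    `layer_shapes` is a list of (input_size, output_size) tuples.
    Returns a list of human-readable mismatch descriptions (empty if OK).
    """
    problems = []
    for i in range(len(layer_shapes) - 1):
        out_size = layer_shapes[i][1]
        next_in = layer_shapes[i + 1][0]
        if out_size != next_in:
            problems.append(
                f"Layer {i + 1} outputs {out_size} values, "
                f"but layer {i + 2} expects {next_in}"
            )
    return problems


print(check_dimensions([(10, 64), (64, 32), (32, 5)]))  # → [] (valid chain)
print(check_dimensions([(10, 64), (128, 32)]))          # reports the mismatch
```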
✍️ Try it yourself¶
Add a fourth layer to the network: NeuralLayer(5, 2, "sigmoid"). Update the SimpleNetwork creation and run net.summary() again. How many total parameters does the network have now?
2. File I/O¶
Reading and writing data files is a daily task in AI. You’ll work with three main formats:
- CSV (Comma-Separated Values): The go-to format for tabular data (rows and columns). Every spreadsheet can export CSV. It’s human-readable and universal, but doesn’t handle nested data well.
- JSON (JavaScript Object Notation): Perfect for structured data with nesting — model configurations, API responses, experiment logs. It maps directly to Python dictionaries.
- Plain text: Used for NLP datasets, log files, and any unstructured text data.
One critical Python pattern for file I/O is the with statement (also called a context manager). Always open files using with:
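For example, a minimal sketch (the `notes.txt` filename is just for illustration):

```python
# Write, then read, a small text file using context managers.
with open("notes.txt", "w") as f:
    f.write("epoch 1: loss=0.42\n")

with open("notes.txt") as f:  # mode defaults to "r"
    contents = f.read()

print(contents)  # the file is guaranteed to be closed by this point
```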
The with statement guarantees that the file is properly closed when you’re done, even if an error occurs inside the block. Without with, you’d need to manually call f.close(), and forgetting to do so can lead to data corruption or resource leaks. Think of with as a responsible adult who always locks the door behind you.
```python
import csv
import json
import os

# Generate and write a CSV dataset
random.seed(42)
data_dir = (os.path.join(os.path.dirname(os.getcwd()), "datasets")
            if os.path.exists("../datasets") else "/tmp")
os.makedirs(data_dir, exist_ok=True)
csv_path = os.path.join(data_dir, "sample_data.csv")

headers = ["id", "feature_1", "feature_2", "feature_3", "label"]
rows = []
for i in range(50):
    f1 = round(random.gauss(0, 1), 4)
    f2 = round(random.gauss(0.5, 0.8), 4)
    f3 = round(random.uniform(0, 10), 4)
    label = "positive" if (f1 + f2 * 0.5 + f3 * 0.1) > 0.5 else "negative"
    rows.append([i, f1, f2, f3, label])

with open(csv_path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(rows)
print(f"Wrote {len(rows)} rows to {csv_path}")

# Read it back
with open(csv_path, 'r') as f:
    reader = csv.DictReader(f)
    loaded_data = list(reader)

print(f"\nLoaded {len(loaded_data)} rows")
print(f"Columns: {list(loaded_data[0].keys())}")
print(f"First row: {loaded_data[0]}")
```
What just happened?¶
We created a synthetic dataset with three numerical features and a binary label, then saved it as a CSV file. The writing process uses csv.writer which handles all the formatting details (commas between values, quoting strings that contain commas, etc.).
Reading it back, we used csv.DictReader, which is usually better than csv.reader because it returns each row as a dictionary with column names as keys. This means you write row["label"] instead of row[4], which is much more readable and less error-prone.
Important: notice that all values come back as strings! `loaded_data[0]["feature_1"]` is a string such as `"0.4967"`, not a float. You’ll need to convert them before doing math — this connects back to the type conversion lesson in Notebook 01.
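A minimal sketch of that conversion, using a hypothetical row dictionary shaped like the ones `DictReader` returns:

```python
# Every value from csv.DictReader is a string; convert before doing math.
row = {"id": "0", "feature_1": "0.4967", "label": "positive"}

row_typed = {
    "id": int(row["id"]),
    "feature_1": float(row["feature_1"]),
    "label": row["label"],  # labels stay as strings
}

print(row_typed["feature_1"] * 2)  # → 0.9934
```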
JSON: Configurations and Experiment Logs¶
JSON is the standard format for structured data that doesn’t fit neatly into rows and columns. Model configurations, API responses, and experiment metadata are all typically stored as JSON. Python’s json module makes it trivially easy to convert between Python dictionaries and JSON files.
```python
# JSON: configs, API responses, experiment logs
experiment = {
    "name": "transformer-v1",
    "timestamp": datetime.now().isoformat(),
    "config": {
        "model": "bert-base",
        "learning_rate": 3e-5,
        "epochs": 5,
        "batch_size": 16,
    },
    "results": {
        "accuracy": 0.923,
        "f1_score": 0.917,
        "training_time_seconds": 3420,
    },
    "tags": ["nlp", "classification", "bert"],
}

json_path = os.path.join(data_dir, "experiment.json")
with open(json_path, 'w') as f:
    json.dump(experiment, f, indent=2)

print(f"Experiment saved to {json_path}")
print(json.dumps(experiment, indent=2))
```
What just happened?¶
We created a Python dictionary representing an ML experiment and saved it as JSON. Notice the nested structure: config and results are dictionaries inside the outer dictionary, and tags is a list. JSON handles all of this naturally.
Two key functions:

- `json.dump(data, file)`: Writes the data to a file.
- `json.dumps(data)`: Converts the data to a JSON string (the `s` stands for “string”).
The indent=2 argument makes the output human-readable with nice formatting. Without it, everything would be on one long line.
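The counterpart functions go the other way: `json.loads` parses a JSON string back into Python objects (and `json.load` does the same from an open file). A quick round-trip sketch:

```python
import json

config = {"model": "bert-base", "learning_rate": 3e-5, "epochs": 5}

as_text = json.dumps(config)    # Python dict -> JSON string
restored = json.loads(as_text)  # JSON string -> Python dict

# Unlike CSV, JSON preserves types: numbers come back as int/float.
print(type(restored["learning_rate"]))  # → <class 'float'>
print(restored == config)               # → True
```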
In real ML work, you’d save an experiment log like this after every training run. Over time, you build up a collection of JSON files that lets you compare experiments and reproduce results.
✍️ Try it yourself¶
Read the JSON file back using json.load() and access the learning rate: loaded_experiment["config"]["learning_rate"]. Does it come back as the correct type (float), or as a string like CSV does?
3. Generators and Iterators¶
Imagine you need to process a dataset of 10 million images. Loading all of them into memory at once would crash your program (or at least make your computer very unhappy). Generators solve this problem by producing values one at a time, on demand — a technique called lazy evaluation.
Think of it like a vending machine: it doesn’t keep all possible drinks on a table in front of you. Instead, when you press a button, it produces one drink. Generators work the same way — they produce one value each time you ask, using the yield keyword instead of return.
The key differences between `yield` and `return`:

- `return` sends a value back and ends the function. The function’s state is lost.
- `yield` sends a value back and pauses the function. The function’s state is preserved, and it resumes right where it left off the next time you ask for a value.
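A tiny sketch of that pause-and-resume behavior:

```python
def count_up(limit):
    """Yield numbers one at a time, pausing between each."""
    n = 1
    while n <= limit:
        yield n  # pause here; resume on the next request
        n += 1


gen = count_up(3)
print(next(gen))   # → 1 (runs until the first yield, then pauses)
print(next(gen))   # → 2 (resumes right after the yield)
print(list(gen))   # → [3] (the rest; the generator is now exhausted)
```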
This pattern is used everywhere in AI. PyTorch’s DataLoader is essentially a generator that yields batches of data. It never loads the entire dataset into memory — it reads one batch at a time from disk.
```python
def data_batch_generator(data, batch_size=8, shuffle=True):
    """Generate batches from a dataset (like PyTorch DataLoader)."""
    indices = list(range(len(data)))
    if shuffle:
        random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_indices = indices[start:start + batch_size]
        batch = [data[i] for i in batch_indices]
        yield batch


# Simulate a dataset of 25 items
random.seed(42)
dataset = [f"sample_{i}" for i in range(25)]

print("Iterating through batches:")
for batch_idx, batch in enumerate(data_batch_generator(dataset, batch_size=8)):
    print(f"  Batch {batch_idx}: {len(batch)} items -> {batch[:3]}...")

print(f"\nTotal items: {len(dataset)}")
print("Batch size: 8")
print(f"Number of batches: {math.ceil(len(dataset) / 8)}")
```
What just happened?¶
The `data_batch_generator` function uses `yield` instead of `return`. Each time the `for` loop asks for the next batch, the generator:

1. Resumes where it left off.
2. Computes the next batch of indices.
3. Gathers the corresponding data items.
4. Yields (produces) the batch and pauses again.
The crucial benefit is memory efficiency: if your dataset has 10 million items, only one batch (say, 32 items) is in memory at any given time. The generator is also lazy — it doesn’t compute any batches until you actually ask for them.
Notice that 25 items don’t divide evenly into batches of 8: the last batch has only 1 item. This is normal and expected — real DataLoaders have options to either drop the last incomplete batch or pad it.
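That option can be sketched with a hypothetical `drop_last` flag (mirroring the parameter of the same name on PyTorch’s `DataLoader`); `iter_batches` below is a simplified stand-in, not the generator defined above:

```python
def iter_batches(n_items, batch_size, drop_last=False):
    """Yield lists of indices, optionally dropping a short final batch."""
    for start in range(0, n_items, batch_size):
        batch = list(range(start, min(start + batch_size, n_items)))
        if drop_last and len(batch) < batch_size:
            return  # stop before yielding the incomplete final batch
        yield batch


print(len(list(iter_batches(25, 8))))                  # → 4 batches (8 + 8 + 8 + 1)
print(len(list(iter_batches(25, 8, drop_last=True))))  # → 3 batches (8 + 8 + 8)
```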
Common Mistake — Generators are single-use:
Once you’ve iterated through a generator, it’s exhausted — you can’t loop over it again. If you need multiple passes (like training for multiple epochs), create a new generator each time.
✍️ Try it yourself¶
Try saving the generator to a variable: gen = data_batch_generator(dataset). Then call list(gen) twice. The first time you’ll get all the batches; the second time you’ll get an empty list. This demonstrates the single-use nature of generators.
4. Decorators¶
A decorator adds behavior to a function without changing the function itself. Think of it like adding a phone case to your phone — the phone still works the same way, but now it has extra protection (or style). In Python, decorators are functions that wrap other functions.
The syntax uses the `@` symbol: writing `@timer` on the line directly above a function definition is exactly equivalent to defining the function normally and then rebinding it with `func = timer(func)`.
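Here is a minimal sketch of the two equivalent forms, using a hypothetical `log_calls` decorator:

```python
def log_calls(func):
    """A tiny decorator that announces each call, then delegates."""
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


@log_calls                 # decorator syntax...
def greet(name):
    return f"Hello, {name}"


def shout(name):
    return name.upper()

shout = log_calls(shout)   # ...is exactly this manual reassignment


print(greet("Ada"))  # prints "calling greet", then "Hello, Ada"
print(shout("ada"))  # prints "calling shout", then "ADA"
```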
Decorators are everywhere in real AI code:

- `@timer`: Measure how long a function takes (useful for profiling training steps).
- `@retry`: Automatically retry a function if it fails (essential for flaky API calls).
- `@torch.no_grad()`: Tell PyTorch not to track gradients during evaluation.
- `@functools.lru_cache`: Cache expensive computations so they only run once.
Let’s build up the concept step by step. First, a simple decorator that measures execution time:
```python
import time
import functools


def timer(func):
    """Measure execution time of a function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"  [{func.__name__}] completed in {elapsed:.4f}s")
        return result
    return wrapper


def retry(max_attempts=3, delay=0.1):
    """Retry a function if it raises an exception."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_attempts:
                        raise
                    print(f"  Attempt {attempt} failed: {e}. Retrying...")
                    time.sleep(delay)
        return wrapper
    return decorator


@timer
def process_data(n):
    """Simulate a data processing task."""
    return sum(i ** 2 for i in range(n))


@retry(max_attempts=3)
def unreliable_api_call():
    """Simulate a flaky API."""
    if random.random() < 0.6:
        raise ConnectionError("API timeout")
    return {"status": "success", "data": [1, 2, 3]}


print("Timer decorator:")
result = process_data(1_000_000)
print(f"  Result: {result:,}")

print("\nRetry decorator:")
random.seed(42)
try:
    response = unreliable_api_call()
    print(f"  Response: {response}")
except ConnectionError:
    print("  All attempts failed.")
```
What just happened?¶
We built two decorators:
The `timer` decorator wraps any function to measure its execution time. Here’s how it works step by step:

1. `timer` receives the original function `func`.
2. It creates a `wrapper` function that records the start time, calls `func`, records the end time, prints the elapsed time, and returns the result.
3. It returns the `wrapper`, which now replaces the original function.
So when you call process_data(1_000_000), you’re actually calling wrapper(1_000_000), which calls the original process_data inside and adds timing around it.
The retry decorator is slightly more complex because it takes arguments (max_attempts, delay). This requires an extra layer of nesting: retry() returns decorator, which returns wrapper. The inner function tries calling the original function up to max_attempts times, catching any exceptions along the way.
@functools.wraps(func) is a small but important detail: it preserves the original function’s name and docstring on the wrapper. Without it, process_data.__name__ would return "wrapper" instead of "process_data", which makes debugging confusing.
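You can see the difference with a pair of minimal pass-through decorators (the `train_step`/`eval_step` names below are just illustrative):

```python
import functools


def without_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def with_wraps(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@without_wraps
def train_step():
    """Run one training step."""


@with_wraps
def eval_step():
    """Run one evaluation step."""


print(train_step.__name__)  # → wrapper (metadata lost)
print(eval_step.__name__)   # → eval_step (metadata preserved)
```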
✍️ Try it yourself¶
Apply the @timer decorator to the unreliable_api_call function (you can stack decorators by putting @timer above @retry). Now it will print both retry messages AND the total time for all attempts.
5. Capstone: Mini Experiment Tracker¶
In real machine learning work, you run hundreds of experiments. You try different models, tweak hyperparameters, test on different data splits, and compare results. Without a systematic way to track all of this, you quickly lose track of what you’ve tried and what worked.
Professional tools like Weights & Biases, MLflow, and Neptune solve this at scale. But at their core, they all do the same thing: log experiments with their configurations, metrics, and metadata, then let you compare and find the best one.
Let’s build a mini version of an experiment tracker that combines everything from all three notebooks: classes (OOP), dictionaries (data storage), functions (computation), f-strings (display), sorting (comparison), and error handling (robustness).
```python
class ExperimentTracker:
    """Track ML experiments with logging, metrics, and comparison."""

    def __init__(self, project_name):
        self.project_name = project_name
        self.experiments = []
        self.created_at = datetime.now()

    def log_experiment(self, name, config, metrics, tags=None):
        """Log a new experiment."""
        experiment = {
            "id": len(self.experiments) + 1,
            "name": name,
            "config": config,
            "metrics": metrics,
            "tags": tags or [],
            "timestamp": datetime.now().isoformat(),
        }
        self.experiments.append(experiment)
        return experiment["id"]

    def get_best(self, metric="accuracy", higher_is_better=True):
        """Find the best experiment by a given metric."""
        if not self.experiments:
            return None
        key = lambda e: e["metrics"].get(metric, 0)
        return max(self.experiments, key=key) if higher_is_better else min(self.experiments, key=key)

    def compare(self, metric="accuracy"):
        """Print a comparison table of all experiments."""
        sorted_exps = sorted(
            self.experiments,
            key=lambda e: e["metrics"].get(metric, 0),
            reverse=True,
        )
        print(f"\n{'#':>3} {'Name':>20} {'Accuracy':>10} {'Loss':>10} {'F1':>10} {'Time(s)':>10}")
        print("-" * 67)
        for exp in sorted_exps:
            m = exp["metrics"]
            print(f"{exp['id']:>3} {exp['name']:>20} "
                  f"{m.get('accuracy', 0):>10.4f} "
                  f"{m.get('loss', 0):>10.4f} "
                  f"{m.get('f1_score', 0):>10.4f} "
                  f"{m.get('training_time', 0):>10.1f}")

    def summary(self):
        """Print project summary."""
        print(f"\nProject: {self.project_name}")
        print(f"Experiments: {len(self.experiments)}")
        if self.experiments:
            best = self.get_best()
            print(f"Best (accuracy): {best['name']} ({best['metrics']['accuracy']:.2%})")
        print()
```
Understanding the ExperimentTracker¶
Let’s break down each method:
- `__init__` sets up the tracker with a project name, an empty list of experiments, and a timestamp. Every new tracker starts as a blank slate.
- `log_experiment` records one experiment. It stores the model’s configuration (hyperparameters), its metrics (accuracy, loss, etc.), optional tags for categorization, and an auto-incrementing ID. The `tags or []` pattern means “use the provided tags, or default to an empty list if none were given.”
- `get_best` finds the experiment with the highest (or lowest) value of a given metric. The `higher_is_better` parameter lets you handle metrics like accuracy (higher is better) and loss (lower is better) with the same function.
- `compare` sorts all experiments by a metric and prints a formatted comparison table.
- `summary` gives a quick overview of the project.
Now let’s use it with some realistic experiments:
```python
# Use the tracker
tracker = ExperimentTracker("text-classification")

experiments = [
    ("logistic-regression", {"model": "sklearn-lr", "C": 1.0},
     {"accuracy": 0.82, "loss": 0.41, "f1_score": 0.79, "training_time": 2.1}),
    ("random-forest", {"model": "sklearn-rf", "n_estimators": 100},
     {"accuracy": 0.87, "loss": 0.32, "f1_score": 0.85, "training_time": 15.3}),
    ("bert-base-ft", {"model": "bert-base", "lr": 3e-5, "epochs": 3},
     {"accuracy": 0.93, "loss": 0.19, "f1_score": 0.91, "training_time": 1240.0}),
    ("distilbert-ft", {"model": "distilbert", "lr": 5e-5, "epochs": 5},
     {"accuracy": 0.91, "loss": 0.22, "f1_score": 0.89, "training_time": 680.0}),
    ("gpt4-zeroshot", {"model": "gpt-4", "prompt": "classify"},
     {"accuracy": 0.88, "loss": 0.28, "f1_score": 0.86, "training_time": 0.0}),
]

for name, config, metrics in experiments:
    tracker.log_experiment(name, config, metrics)

tracker.summary()
tracker.compare()

print("\n--- Best by different metrics ---")
best_acc = tracker.get_best("accuracy")
best_speed = tracker.get_best("training_time", higher_is_better=False)
print(f"Best accuracy: {best_acc['name']} ({best_acc['metrics']['accuracy']:.2%})")
print(f"Fastest: {best_speed['name']} ({best_speed['metrics']['training_time']:.1f}s)")
```
What just happened?¶
We logged five different approaches to text classification, spanning the full spectrum of modern ML:
- Logistic Regression: A simple, fast baseline. Low accuracy (82%) but trains in 2 seconds.
- Random Forest: A classical ML model. Better accuracy (87%) but still no deep learning.
- BERT fine-tuned: A pre-trained transformer fine-tuned on our data. Best accuracy (93%) but takes 20 minutes to train.
- DistilBERT fine-tuned: A smaller, faster version of BERT. Nearly as good (91%) in half the time.
- GPT-4 zero-shot: Using a large language model without any training. Good accuracy (88%) and no training time at all, but relies on expensive API calls.
The comparison table makes the tradeoffs clear: BERT is the most accurate, but GPT-4 zero-shot is the fastest to deploy. In real projects, the best choice depends on your constraints (budget, latency requirements, data availability).
Notice how the get_best method works for both “higher is better” (accuracy) and “lower is better” (training time) metrics. This kind of flexibility is what makes well-designed classes so valuable.
✍️ Try it yourself¶
Add a sixth experiment to the tracker — perhaps a fine-tuned LLaMA model or an ensemble approach. Then call tracker.compare("f1_score") to rank by F1 score instead of accuracy. Which model wins?
Chapter 1 Complete!¶
Congratulations! You’ve covered all the Python fundamentals needed for AI work:
- Notebook 01: Variables, types, strings, control flow, loops, comprehensions
- Notebook 02: Collections, functions, error handling, modules, data pipelines
- Notebook 03: OOP, file I/O, generators, decorators, experiment tracking
Ready for What’s Next?¶
Now try the exercises in exercises/exercises.py to solidify your skills.
Then move on to Chapter 2: Data Structures & Algorithms where you’ll learn about algorithmic thinking and complexity analysis.