
Chapter 10: Natural Language Processing Basics

Learn how to process and analyze text with AI—tokenization, TF-IDF, word embeddings, sentiment analysis, text classification, and named entity recognition.


Metadata

  • Track: Practitioner
  • Time: 8–10 hours
  • Prerequisites: Chapters 1–9 (especially Chapter 9: Deep Learning Fundamentals)

Learning Objectives

  • Understand text representation: tokenization, vectorization, and word embeddings
  • Master classic NLP techniques: TF-IDF, word2vec, GloVe
  • Build text classification and sentiment analysis pipelines
  • Implement named entity recognition with spaCy
  • Use sequence models (RNNs, LSTMs) for NLP
  • Choose the appropriate technique for a given task and deploy simple NLP models
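Several of the objectives above (tokenization, TF-IDF) can be illustrated from scratch. The following is a minimal standard-library sketch for intuition only, not the chapter's own implementation; the notebooks use proper tooling such as scikit-learn:

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphanumeric word characters
    return re.findall(r"[a-z0-9']+", text.lower())

def tf_idf(docs):
    """TF-IDF weights per document, with a smoothed idf term."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(t for doc in tokenized for t in set(doc))
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({
            t: (count / len(doc)) * math.log((1 + n) / (1 + df[t]))
            for t, count in tf.items()
        })
    return weights

docs = ["the cat sat on the mat", "the dog sat on the log"]
w = tf_idf(docs)
# 'cat' is distinctive to the first document and scores > 0;
# 'the' appears in every document, so its smoothed idf is 0
```

Terms shared by all documents are down-weighted to zero, which is exactly the behaviour the TF-IDF sections of the notebooks build on.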

What's Included

Notebooks

  • 01_nlp_fundamentals.ipynb — Tokenization, preprocessing, BoW/TF-IDF, word embeddings, first sentiment model
  • 02_nlp_classification.ipynb — Deep learning for text, multi-class classification, NER, similarity and clustering
  • 03_nlp_advanced.ipynb — Attention, seq2seq, transfer learning, production considerations, capstone

Scripts

  • text_preprocessing.py — Tokenization, stopwords, lemmatization, vocabulary, TextPreprocessor class
  • embedding_utils.py — Load embeddings, similarity, analogies, EmbeddingIndex
  • nlp_models.py — SentimentAnalyzer, TextClassifier, NERModel, TextSimilarity
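To give a feel for what a preprocessing helper of this kind does, here is a hypothetical stdlib-only sketch; the method names and stopword list below are assumptions, and the real TextPreprocessor in text_preprocessing.py may expose a different API:

```python
import re

class TextPreprocessor:
    """Hypothetical sketch of a text-preprocessing pipeline
    (illustrative only; not the repo's actual class)."""

    def __init__(self, stopwords=None):
        # A tiny default stopword list for demonstration
        self.stopwords = set(stopwords or {"the", "a", "an", "is", "on"})

    def tokenize(self, text):
        # Lowercase and keep alphanumeric word characters
        return re.findall(r"[a-z0-9']+", text.lower())

    def remove_stopwords(self, tokens):
        return [t for t in tokens if t not in self.stopwords]

    def build_vocabulary(self, docs):
        # Map each non-stopword token to a stable integer id
        vocab = {}
        for doc in docs:
            for tok in self.remove_stopwords(self.tokenize(doc)):
                vocab.setdefault(tok, len(vocab))
        return vocab

pre = TextPreprocessor()
vocab = pre.build_vocabulary(["The cat sat on the mat", "A dog sat"])
# vocab: {'cat': 0, 'sat': 1, 'mat': 2, 'dog': 3}
```

Mapping tokens to integer ids like this is the step that feeds the vectorization and embedding work in the notebooks.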

Exercises

  • Problem Set 1 (notebook) — Tokenization, TF-IDF, word similarity, sentiment, vocabulary
  • Problem Set 2 (notebook) — LSTM classification, NER, clustering, multi-task, BERT preview
  • Solutions — In exercises/solutions/ (notebooks and solutions.py for CI)

Diagrams (Mermaid)

  • NLP pipeline, text representation methods, LSTM architecture

Read Online

  • 10.1 Introduction — NLP fundamentals, preprocessing, TF-IDF, embeddings, sentiment intro
  • 10.2 Intermediate — Deep learning for text, classification, NER, clustering
  • 10.3 Advanced — Attention, transfer learning, production, capstone

Or try the code in the Playground.

How to Use This Chapter

Quick Start

Follow these steps to get coding in minutes.

1. Clone and install dependencies

git clone https://github.com/luigipascal/berta-chapters.git
cd berta-chapters
pip install -r requirements.txt

2. Navigate to the chapter

cd chapters/chapter-10-natural-language-processing-basics
pip install -r requirements.txt
python -m spacy download en_core_web_sm

3. Download NLTK data (in Python or a notebook)

import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

4. Launch Jupyter

jupyter notebook notebooks/01_nlp_fundamentals.ipynb

GitHub Folder

All chapter materials live in: chapters/chapter-10-natural-language-processing-basics/


Created by Luigi Pascal Rondanini | Generated by Berta AI