Natural Language Processing (NLP) enables computers to understand, interpret, and generate human language. Python offers powerful NLP libraries that make it accessible to developers and data scientists. This guide covers the fundamentals using NLTK and spaCy.
Setting Up
pip install nltk spacy textblob
python -m spacy download en_core_web_sm
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('averaged_perceptron_tagger')"
Tokenization
import nltk
import spacy
text = "Natural Language Processing is fascinating. It helps computers understand human language."
# NLTK tokenization
from nltk.tokenize import word_tokenize, sent_tokenize
words = word_tokenize(text)
sentences = sent_tokenize(text)
print(f"Words: {words}")
print(f"Sentences: {sentences}")
# spaCy tokenization
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
tokens = [token.text for token in doc]
print(f"spaCy tokens: {tokens}")
Stop Words and Text Cleaning
from nltk.corpus import stopwords
import re
stop_words = set(stopwords.words("english"))
def clean_text(text):
    """Lowercase, strip punctuation and digits, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-zA-Z\s]", "", text)
    words = word_tokenize(text)
    filtered = [w for w in words if w not in stop_words]
    return " ".join(filtered)
Part-of-Speech Tagging
# spaCy POS tagging
doc = nlp("The quick brown fox jumps over the lazy dog")
for token in doc:
    print(f"{token.text}: {token.pos_} ({token.tag_})")
# Output:
# The: DET (DT)
# quick: ADJ (JJ)
# brown: ADJ (JJ)
# fox: NOUN (NN)
# jumps: VERB (VBZ)
# ...
Named Entity Recognition
doc = nlp("Apple Inc. was founded by Steve Jobs in Cupertino, California in 1976.")
for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")
# Output:
# Apple Inc.: ORG
# Steve Jobs: PERSON
# Cupertino: GPE
# California: GPE
# 1976: DATE
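Downstream code typically groups entities by label rather than printing them; a plain dict is enough. The pairs below are copied from the sample output above.

```python
from collections import defaultdict

# (text, label) pairs, as produced by iterating doc.ents above
entities = [("Apple Inc.", "ORG"), ("Steve Jobs", "PERSON"),
            ("Cupertino", "GPE"), ("California", "GPE"), ("1976", "DATE")]

by_label = defaultdict(list)
for text, label in entities:
    by_label[label].append(text)

print(dict(by_label))
# {'ORG': ['Apple Inc.'], 'PERSON': ['Steve Jobs'],
#  'GPE': ['Cupertino', 'California'], 'DATE': ['1976']}
```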
Sentiment Analysis
from textblob import TextBlob
texts = [
    "This product is amazing and works perfectly!",
    "Terrible experience, I want a refund.",
    "The service was okay, nothing special."
]
for text in texts:
    blob = TextBlob(text)
    sentiment = blob.sentiment
    print(f"Text: {text}")
    print(f"  Polarity: {sentiment.polarity:.2f} (-1 to 1)")
    print(f"  Subjectivity: {sentiment.subjectivity:.2f} (0 to 1)")
Text Similarity
# Note: en_core_web_sm ships without real word vectors, so similarity scores
# from it are approximate (and spaCy emits a warning). Use en_core_web_md or
# en_core_web_lg for meaningful similarity comparisons.
doc1 = nlp("I love programming in Python")
doc2 = nlp("Python programming is my favorite hobby")
doc3 = nlp("The weather is sunny today")
print(f"doc1 vs doc2: {doc1.similarity(doc2):.4f}")
print(f"doc1 vs doc3: {doc1.similarity(doc3):.4f}")
TF-IDF for Text Classification
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
texts = ["great product love it", "terrible waste of money", ...]
labels = [1, 0, ...] # 1 = positive, 0 = negative
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)
classifier = MultinomialNB()
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")
Next Steps
- Explore Hugging Face Transformers for state-of-the-art NLP
- Learn about word embeddings (Word2Vec, GloVe, BERT)
- Practice with real datasets from Kaggle
- Build projects: chatbot, text summarizer, sentiment dashboard
NLP is one of the most exciting and practical areas of artificial intelligence. Start with these fundamentals, build small projects, and gradually explore more advanced techniques with transformers and large language models.