Natural Language Processing with Python: Getting Started with NLTK and spaCy

Natural Language Processing (NLP) enables computers to understand, interpret, and generate human language. Python offers mature NLP libraries that make these techniques accessible to developers and data scientists. This guide covers the fundamentals with NLTK and spaCy, and also uses TextBlob for sentiment analysis and scikit-learn for text classification.

Setting Up

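pip install nltk spacy textblob scikit-learn
python -m spacy download en_core_web_sm
# en_core_web_md ships word vectors, which the Text Similarity section needs
python -m spacy download en_core_web_md
# recent NLTK releases load the Punkt tokenizer from 'punkt_tab' instead of 'punkt'
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab'); nltk.download('stopwords'); nltk.download('averaged_perceptron_tagger')"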

Tokenization

import nltk
import spacy

text = "Natural Language Processing is fascinating. It helps computers understand human language."

# NLTK tokenization
from nltk.tokenize import word_tokenize, sent_tokenize
words = word_tokenize(text)
sentences = sent_tokenize(text)
print(f"Words: {words}")
print(f"Sentences: {sentences}")

# spaCy tokenization
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
tokens = [token.text for token in doc]
print(f"spaCy tokens: {tokens}")
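To see what a tokenizer does at its simplest, here is a minimal regex-based sketch using only the standard library. It is a deliberate simplification, not how NLTK or spaCy work internally: it assumes a sentence ends after ".", "!" or "?" followed by whitespace, and treats a word as a run of word characters or a single punctuation mark. It will therefore mishandle abbreviations such as "Inc." that the real libraries deal with. The function names are hypothetical.

```python
import re

# Simplified illustration only; not NLTK's or spaCy's algorithm.
SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")  # whitespace after ., ! or ?
TOKEN = re.compile(r"\w+|[^\w\s]")            # a word, or one punctuation mark

def simple_sent_tokenize(text):
    """Split text into sentences at whitespace following '.', '!' or '?'."""
    return [s for s in SENT_BOUNDARY.split(text.strip()) if s]

def simple_word_tokenize(text):
    """Return runs of word characters and single punctuation marks."""
    return TOKEN.findall(text)

text = "Natural Language Processing is fascinating. It helps computers understand human language."
print(simple_sent_tokenize(text))
print(simple_word_tokenize(text))
```

Comparing this output with NLTK's and spaCy's on text containing abbreviations, URLs, or contractions is a quick way to see why the trained tokenizers are worth using.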

Stop Words and Text Cleaning

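from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import re

stop_words = set(stopwords.words("english"))

def clean_text(text):
    text = text.lower()                                   # normalize case
    text = re.sub(r"[^a-zA-Z\s]", "", text)               # drop digits and punctuation
    words = word_tokenize(text)                           # split into word tokens
    filtered = [w for w in words if w not in stop_words]  # remove stop words
    return " ".join(filtered)

print(clean_text("The quick brown fox jumps over the lazy dog!"))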
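The same cleaning pipeline can be traced end to end without any downloads. This sketch swaps NLTK's stop-word list for a tiny hand-picked set (an assumption; NLTK's English list is much larger) and uses `str.split()` in place of `word_tokenize`, which is sufficient once punctuation has already been stripped. The function name is hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "it", "on", "over", "and", "of", "to"}

def clean_text_stdlib(text):
    text = text.lower()                    # 1. normalize case
    text = re.sub(r"[^a-z\s]", "", text)   # 2. drop digits and punctuation
    words = text.split()                   # 3. whitespace tokenization
    # 4. keep only words that are not in the stop list
    return " ".join(w for w in words if w not in STOP_WORDS)

print(clean_text_stdlib("The quick brown fox jumps over the lazy dog!"))
```

Because the text is lowercased first, the pattern `[^a-z\s]` removes the same characters as the original `[^a-zA-Z\s]`.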

Part-of-Speech Tagging

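# spaCy POS tagging
# pos_ is the coarse Universal POS tag; tag_ is the fine-grained Penn Treebank tag
doc = nlp("The quick brown fox jumps over the lazy dog")
for token in doc:
    print(f"{token.text}: {token.pos_} ({token.tag_})")

# Output (exact tags can vary between model versions):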
# The: DET (DT)
# quick: ADJ (JJ)
# brown: ADJ (JJ)
# fox: NOUN (NN)
# jumps: VERB (VBZ)
# ...
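The two columns in that output are related: `tag_` holds the fine-grained Penn Treebank tag and `pos_` the coarse Universal POS tag. In spaCy's English pipelines the `attribute_ruler` component sets `pos_` from `tag_` using a much fuller mapping. The lookup table below covers only a handful of tags as an illustration; the dictionary and function names are hypothetical.

```python
# Illustrative subset of the Penn Treebank -> Universal POS mapping.
PTB_TO_UPOS = {
    "DT": "DET",    # determiner: the, a
    "JJ": "ADJ",    # adjective: quick, lazy
    "NN": "NOUN",   # singular noun: fox, dog
    "NNS": "NOUN",  # plural noun
    "VBZ": "VERB",  # 3rd-person singular present verb (spaCy tags auxiliaries like "is" as AUX)
    "IN": "ADP",    # preposition: over
    "RB": "ADV",    # adverb
}

def coarse_tag(fine_tag):
    """Return the Universal tag for a Penn Treebank tag, or 'X' (other) if unknown."""
    return PTB_TO_UPOS.get(fine_tag, "X")

tagged = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ"), ("over", "IN")]
for word, tag in tagged:
    print(f"{word}: {coarse_tag(tag)} ({tag})")
```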

Named Entity Recognition

doc = nlp("Apple Inc. was founded by Steve Jobs in Cupertino, California in 1976.")

for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")

# Output (exact labels can vary between model versions):
# Apple Inc.: ORG
# Steve Jobs: PERSON
# Cupertino: GPE
# California: GPE
# 1976: DATE
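A common next step is to group the recognized entities by label, for example to collect every organization or place mentioned in a document. This sketch works on plain `(text, label)` pairs copied from the output above so that it runs without a model; with spaCy you would build the pairs as `[(ent.text, ent.label_) for ent in doc.ents]`. The function name is hypothetical.

```python
from collections import defaultdict

def group_entities(pairs):
    """Map each entity label to its entity texts, in document order, without duplicates."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:  # keep each entity text once per label
            grouped[label].append(text)
    return dict(grouped)

pairs = [("Apple Inc.", "ORG"), ("Steve Jobs", "PERSON"),
         ("Cupertino", "GPE"), ("California", "GPE"), ("1976", "DATE")]
print(group_entities(pairs))
```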

Sentiment Analysis

from textblob import TextBlob

texts = [
    "This product is amazing and works perfectly!",
    "Terrible experience, I want a refund.",
    "The service was okay, nothing special."
]

for text in texts:
    blob = TextBlob(text)
    sentiment = blob.sentiment
    print(f"Text: {text}")
    print(f"  Polarity: {sentiment.polarity:.2f} (-1 to 1)")
    print(f"  Subjectivity: {sentiment.subjectivity:.2f} (0 to 1)")
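Polarity is a continuous score, so applications usually turn it into a label. The cut-off of 0.1 below is an arbitrary assumption for illustration, not a TextBlob default, and should be tuned on your own data. The sketch takes plain floats so it runs without TextBlob; in practice you would pass `TextBlob(text).sentiment.polarity`. The function name is hypothetical.

```python
def polarity_label(polarity, threshold=0.1):
    """Bucket a polarity score in [-1, 1] into positive, negative, or neutral.

    `threshold` is a hypothetical cut-off: scores within +/- threshold count as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be between -1 and 1")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

for score in (0.8, -0.6, 0.05):
    print(score, polarity_label(score))
```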

Text Similarity

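# Similarity needs a model with word vectors, such as en_core_web_md
# (python -m spacy download en_core_web_md). en_core_web_sm has no static word
# vectors, so spaCy warns (W007) and its similarity scores are less meaningful.
nlp_md = spacy.load("en_core_web_md")

doc1 = nlp_md("I love programming in Python")
doc2 = nlp_md("Python programming is my favorite hobby")
doc3 = nlp_md("The weather is sunny today")

# Doc.similarity returns the cosine similarity of the averaged word vectors
print(f"doc1 vs doc2: {doc1.similarity(doc2):.4f}")
print(f"doc1 vs doc3: {doc1.similarity(doc3):.4f}")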
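By default, `Doc.similarity` is the cosine similarity of the two documents' vectors, and a `Doc`'s vector is the average of its token vectors. The sketch below reproduces that computation with made-up three-dimensional word vectors (an assumption; the `en_core_web_md` vectors have 300 dimensions) so the arithmetic is visible. Words without a vector are skipped; because cosine similarity ignores vector length, this gives the same score as averaging in zero vectors. All names here are hypothetical.

```python
import math

# Made-up 3-d word vectors for illustration only.
VECTORS = {
    "python":      [0.9, 0.1, 0.0],
    "programming": [0.8, 0.2, 0.1],
    "love":        [0.1, 0.9, 0.2],
    "weather":     [0.0, 0.1, 0.9],
    "sunny":       [0.1, 0.2, 0.8],
}

def doc_vector(words):
    """Average the vectors of the words that have one."""
    known = [VECTORS[w] for w in words if w in VECTORS]
    if not known:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(known) for dim in zip(*known)]

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

doc1 = doc_vector("i love programming in python".split())
doc2 = doc_vector("python programming is my favorite hobby".split())
doc3 = doc_vector("the weather is sunny today".split())
print(f"doc1 vs doc2: {cosine(doc1, doc2):.4f}")
print(f"doc1 vs doc3: {cosine(doc1, doc3):.4f}")
```

With these toy vectors the two programming sentences score far higher than the weather sentence, which is the pattern the spaCy example is meant to show.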

TF-IDF for Text Classification

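from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

# Placeholder data: replace "..." with your own labeled examples
texts = ["great product love it", "terrible waste of money", ...]
labels = [1, 0, ...]  # 1 = positive, 0 = negative

# Split the raw texts first so the vectorizer never sees the test set
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(X_train_text)  # learn vocabulary and IDF on training data only
X_test = vectorizer.transform(X_test_text)        # reuse them for the test data

classifier = MultinomialNB()
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)  # mean accuracy on the held-out set
print(f"Accuracy: {accuracy:.2f}")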
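`TfidfVectorizer`'s default weighting is worth computing by hand. With the defaults (`smooth_idf=True`, `sublinear_tf=False`, `norm="l2"`), a term's weight in a document is its raw count times idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing t; each document vector is then scaled to unit length. This standard-library sketch implements that formula, but as a simplifying assumption it tokenizes on whitespace rather than with the vectorizer's default token pattern, which keeps only tokens of two or more word characters. The function name is hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, following TfidfVectorizer's default formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))  # document frequency
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}   # smoothed IDF

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)                                # raw term frequency
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))  # L2 length
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors

docs = ["great product love it", "terrible waste of money", "love this great price"]
for vec in tfidf(docs):
    print({t: round(w, 3) for t, w in sorted(vec.items())})
```

Terms that appear in many documents ("great", "love") get a lower IDF than terms unique to one document ("product"), which is what lets TF-IDF downweight common words. Since every token in these three sentences has at least two characters, `TfidfVectorizer().fit_transform(docs)` should produce the same weights, with columns in alphabetical vocabulary order.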

Next Steps

  1. Explore Hugging Face Transformers for state-of-the-art NLP
  2. Learn about static word embeddings (Word2Vec, GloVe) and contextual embeddings (BERT)
  3. Practice with real datasets from Kaggle
  4. Build projects: chatbot, text summarizer, sentiment dashboard

NLP is one of the most exciting and practical areas of artificial intelligence. Start with these fundamentals, build small projects, and gradually explore more advanced techniques with transformers and large language models.

Bas van den Berg
About the Author

IT Administrator, Security Architect, Infrastructure Security Specialist, Technical Author

Bas van den Berg is an experienced IT Administrator and Security Architect specializing in the design, protection, and long-term operation of secure IT infrastructures.

With a strong background in system administration and cybersecurity, he has worked extensively with enterprise environments, focusing on access control...

IT Administration, Security Architecture, Network Security, System Hardening, Access Control
