Linux is the most widely used operating system for machine learning (ML) and artificial intelligence development. Its stability, performance, and mature GPU driver support make it a common choice for data scientists and ML engineers. This guide walks you through setting up a complete ML environment from scratch on a Debian- or Ubuntu-based system (the commands below use apt).
Why Linux for Machine Learning?
- GPU Support: First-class NVIDIA driver and CUDA support
- Package Management: Easy installation of scientific computing libraries
- Performance: Typically lower overhead than Windows for computation-heavy tasks
- Docker: Native container support for reproducible environments
- Server Deployment: Most ML production servers run Linux
Setting Up Python for ML
# Install Python and pip
sudo apt update
sudo apt install python3 python3-pip python3-venv
# Create a virtual environment
python3 -m venv ~/ml-env
source ~/ml-env/bin/activate
# Install core ML libraries
pip install numpy pandas matplotlib scikit-learn
pip install jupyter notebook
pip install seaborn plotly
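Once the packages are installed, a quick check from Python can confirm that the virtual environment is active and actually contains them. The sketch below relies only on the standard library; the helper names `in_virtualenv` and `installed_versions` and the `CORE_PACKAGES` list are illustrative choices for this guide, not part of any library.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (illustrative list taken from this guide).
CORE_PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn", "seaborn", "plotly"]


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    print(f"Virtual environment active: {in_virtualenv()}")
    for name, ver in installed_versions(CORE_PACKAGES).items():
        print(f"{name:15s} {ver or 'NOT INSTALLED'}")
```

Run it with the environment activated; any line that reports NOT INSTALLED points to a package that pip did not install into `~/ml-env`.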
Installing TensorFlow
# CPU version
pip install tensorflow
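# GPU version (requires an NVIDIA driver; the and-cuda extra installs the CUDA libraries through pip)
# Quote the argument so shells such as zsh do not treat the brackets as a glob pattern
pip install 'tensorflow[and-cuda]'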
# Verify installation
python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.config.list_physical_devices('GPU'))"
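The one-line check above can be expanded into a small script that also copes with TensorFlow being absent. This is a minimal sketch: the function name `tensorflow_gpu_summary` is hypothetical, and enabling memory growth is an optional extra that this guide does not otherwise require.

```python
def tensorflow_gpu_summary():
    """Return TensorFlow's version and the GPUs it can see, or report that it is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"installed": False, "version": None, "gpus": []}

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            # Optional (assumption, not from the guide): allocate GPU memory on demand
            # instead of reserving nearly all of it at start-up.
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError:
            # Memory growth can only be set before the GPU has been initialized.
            pass
    return {"installed": True, "version": tf.__version__, "gpus": [g.name for g in gpus]}


if __name__ == "__main__":
    summary = tensorflow_gpu_summary()
    if not summary["installed"]:
        print("TensorFlow is not installed in this environment.")
    else:
        print(f"TensorFlow {summary['version']}, GPUs: {summary['gpus'] or 'none detected'}")
```

An empty GPU list with the GPU build installed usually means the NVIDIA driver is missing or not loaded, which is what `nvidia-smi` checks.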
Installing PyTorch
# CPU version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# GPU version (CUDA 12.1)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Verify
python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
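In PyTorch code, the result of `torch.cuda.is_available()` is typically used to pick a device once and then place tensors and models on it. The sketch below shows that pattern under the assumption that falling back to the CPU is acceptable; `pick_device` is a hypothetical helper name.

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch is installed and sees a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so there is nothing to place on a GPU
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"Selected device: {device}")
    if importlib.util.find_spec("torch") is not None:
        import torch

        # Create a tensor directly on the chosen device and run a small computation.
        x = torch.rand(1024, 1024, device=device)
        print(f"Matrix product computed on {x.device}: shape {tuple((x @ x).shape)}")
```

Writing scripts against a `device` variable rather than hard-coding "cuda" lets the same code run on machines with and without a GPU.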
NVIDIA GPU Setup
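# Install NVIDIA drivers (550 is one driver branch; on Ubuntu, `ubuntu-drivers devices` lists the recommended one)
sudo apt install nvidia-driver-550
# Install CUDA Toolkit (needed to compile CUDA code; the pip packages for TensorFlow and PyTorch ship their own CUDA libraries)
sudo apt install nvidia-cuda-toolkit
# Reboot so the new driver is loaded
sudo reboot
# Verify GPU detection
nvidia-smi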
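The same information can be read from Python, which is handy in setup scripts. The sketch below calls `nvidia-smi` with its CSV query options and parses the result; the function names and the choice of fields are assumptions made for illustration, and the script returns an empty list when `nvidia-smi` is not on the PATH.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi (an illustrative selection).
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_rows(csv_text):
    """Turn `--format=csv,noheader` output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        if len(values) == len(QUERY_FIELDS):
            rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi and return the parsed rows, or [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    completed = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if completed.returncode != 0:
        return []
    return parse_gpu_rows(completed.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No NVIDIA GPU detected (or nvidia-smi is unavailable).")
    for gpu in gpus:
        print(f"{gpu['name']}: driver {gpu['driver_version']}, {gpu['memory.total']}")
```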
Jupyter Notebook Setup
# Install JupyterLab
pip install jupyterlab
# Generate a config file for JupyterLab
jupyter lab --generate-config
# Start Jupyter (0.0.0.0 listens on all network interfaces; use this only on a trusted network)
jupyter lab --ip=0.0.0.0 --port=8888 --no-browser
Your First ML Project: Classification
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load dataset
from sklearn.datasets import load_iris
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Evaluate
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions):.2f}")
print(classification_report(y_test, predictions, target_names=data.target_names))
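To make the numbers in the report concrete, the sketch below computes accuracy, precision, and recall by hand in plain Python. It is a simplified illustration rather than scikit-learn's implementation, and the label lists are made-up toy data, not output from the iris model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted as label
    fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
    fn = sum(t == label and p != label for t, p in pairs)  # label that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy labels (hypothetical): one sample of class 1 is misclassified as class 2.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2]
    print(f"Accuracy: {accuracy(y_true, y_pred):.2f}")
    for label in sorted(set(y_true)):
        p, r = precision_recall(y_true, y_pred, label)
        print(f"class {label}: precision={p:.2f} recall={r:.2f}")
```

Precision answers "of the samples predicted as this class, how many were right?", while recall answers "of the samples that really belong to this class, how many were found?".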
Essential ML Libraries Checklist
- NumPy: Numerical computing and array operations
- Pandas: Data manipulation and analysis
- Scikit-learn: Classical ML algorithms
- TensorFlow/PyTorch: Deep learning frameworks
- Matplotlib/Seaborn: Data visualization
- Jupyter: Interactive notebook environment
- XGBoost: Gradient boosting for structured data
- Hugging Face Transformers: Pre-trained models for NLP and related tasks
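With this setup, you have what you need to start your machine learning journey on Linux. XGBoost and Hugging Face Transformers from the checklist are not installed by the steps above; add them with pip install xgboost transformers when you need them. Begin with simple projects using scikit-learn, then progress to deep learning with TensorFlow or PyTorch as your skills grow.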