How to Create and Manage Virtual Environments in Python: A Complete Guide to venv, pipenv, and conda
Introduction
Virtual environments are one of the most crucial concepts in Python development, yet they're often overlooked by beginners. If you've ever encountered the dreaded "it works on my machine" problem or struggled with conflicting package versions across different projects, virtual environments are your solution.
A virtual environment is an isolated Python environment that allows you to install packages and dependencies specific to a particular project without affecting your system-wide Python installation or other projects. Think of it as creating a separate workspace for each of your Python projects, complete with its own set of tools and libraries.
In this comprehensive guide, we'll explore three popular tools for managing virtual environments: venv (Python's built-in solution), pipenv (a higher-level tool that combines pip and virtualenv), and conda (Anaconda's package and environment manager). By the end of this article, you'll understand when and how to use each tool, complete with practical project examples.
Why Virtual Environments Matter
Before diving into the tools, let's understand why virtual environments are essential:
Dependency Isolation
Different projects often require different versions of the same package. Without virtual environments, you might install Django 3.2 for one project, only to find that another project requires Django 4.1. Virtual environments solve this by keeping each project's dependencies separate.

System Protection

Installing packages globally can break system tools that depend on specific Python packages. Virtual environments protect your system Python installation from unintended modifications.

Reproducible Development

Virtual environments make it easier to share your project with others or deploy it to production, ensuring that everyone works with the same package versions.

Clean Project Management

Each project gets its own clean slate, making it easier to track exactly which packages your project needs.

Python's Built-in venv Module
The venv module is Python's standard library solution for creating virtual environments. It's been included with Python since version 3.3, making it the most accessible option for most developers.
Installing and Setting Up venv
Since venv comes with Python 3.3+, you likely already have it installed. You can verify this by running:
```bash
python -m venv --help
```
If you're using an older version of Python or a system where venv isn't available, you can install virtualenv as an alternative:
```bash
pip install virtualenv
```
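Once installed, virtualenv works much like venv. A minimal sketch of its usage (the `-p` flag, which selects a specific interpreter, is optional):

```bash
# Create a virtual environment with virtualenv
virtualenv myproject_env

# Optionally target a specific Python interpreter
virtualenv -p python3.9 myproject_env
```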
Creating Your First Virtual Environment
Creating a virtual environment with venv is straightforward:
```bash
# Create a virtual environment named 'myproject_env'
python -m venv myproject_env

# On some systems, you might need to use python3
python3 -m venv myproject_env
```

This command creates a new directory called myproject_env containing:
- bin/ (or Scripts/ on Windows): Executable files, including the Python interpreter
- lib/: Python packages and dependencies
- pyvenv.cfg: Configuration file with environment settings (see the example below)
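For reference, pyvenv.cfg is just a handful of key = value lines. The exact keys and paths vary by platform and Python version, so treat this as an illustrative example:

```
home = /usr/bin
include-system-site-packages = false
version = 3.11.4
```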
Activating and Deactivating Virtual Environments
To start using your virtual environment, you need to activate it:
```bash
# On Linux/macOS
source myproject_env/bin/activate

# On Windows (Command Prompt)
myproject_env\Scripts\activate

# On Windows (PowerShell)
myproject_env\Scripts\Activate.ps1
```

Once activated, your command prompt will show the environment name in parentheses:
```bash
(myproject_env) user@computer:~/project$
```
To deactivate the environment:
```bash
deactivate
```
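If you're ever unsure whether an environment is active, check which interpreter is actually being used. A quick sanity check (paths will differ on your machine):

```bash
# Should point inside myproject_env while activated
which python   # Linux/macOS; use 'where python' on Windows

# Confirm from Python itself
python -c "import sys; print(sys.prefix)"
```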
Project Example: Building a Web Scraper with venv
Let's create a practical project using venv. We'll build a simple web scraper that extracts headlines from a news website.
Step 1: Set up the project structure
```bash
mkdir news_scraper
cd news_scraper
python -m venv scraper_env
source scraper_env/bin/activate  # On Windows: scraper_env\Scripts\activate
```
Step 2: Install required packages
```bash
pip install requests beautifulsoup4 lxml
```
Step 3: Create the scraper script
Create a file called scraper.py:
```python
import requests
from bs4 import BeautifulSoup
import json
from datetime import datetime


class NewsScraper:
    def __init__(self, base_url):
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })

    def scrape_headlines(self, selector):
        """Scrape headlines using a CSS selector."""
        try:
            response = self.session.get(self.base_url)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')
            headlines = soup.select(selector)
            return [headline.get_text().strip() for headline in headlines]
        except requests.RequestException as e:
            print(f"Error fetching data: {e}")
            return []

    def save_headlines(self, headlines, filename=None):
        """Save headlines to a JSON file."""
        if not filename:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"headlines_{timestamp}.json"
        data = {
            'timestamp': datetime.now().isoformat(),
            'source': self.base_url,
            'headlines': headlines,
            'count': len(headlines)
        }
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        print(f"Saved {len(headlines)} headlines to {filename}")


# Example usage
if __name__ == "__main__":
    # Example with BBC News (adjust the selector as needed)
    scraper = NewsScraper("https://www.bbc.com/news")
    headlines = scraper.scrape_headlines("h3")
    if headlines:
        scraper.save_headlines(headlines)
        print(f"\nFound {len(headlines)} headlines:")
        for i, headline in enumerate(headlines[:5], 1):
            print(f"{i}. {headline}")
    else:
        print("No headlines found")
```

Step 4: Create requirements file
```bash
pip freeze > requirements.txt
```
Your requirements.txt will look something like this (exact versions will vary):

```
beautifulsoup4==4.12.2
certifi==2023.7.22
charset-normalizer==3.3.0
idna==3.4
lxml==4.9.3
requests==2.31.0
soupsieve==2.5
urllib3==2.0.6
```
Step 5: Test the scraper
```bash
python scraper.py
```
This project demonstrates how venv provides a clean environment for your dependencies. Anyone can recreate your environment by running:
```bash
python -m venv scraper_env
source scraper_env/bin/activate
pip install -r requirements.txt
```
Advanced venv Usage
Custom Python versions:
```bash
# Use a specific Python version
/usr/bin/python3.9 -m venv myenv_39
```

Without pip:

```bash
# Create an environment without pip
python -m venv myenv --without-pip
```

System site packages:

```bash
# Allow access to system-wide packages
python -m venv myenv --system-site-packages
```

Pros and Cons of venv
Pros:
- Built into Python 3.3+
- Lightweight and fast
- Simple and straightforward
- No external dependencies
- Good for basic project isolation
Cons:
- Manual dependency management
- No built-in support for different Python versions
- Requires separate tools for advanced features
- Can become cumbersome for complex workflows
Pipenv: Python Development Workflow for Humans
Pipenv aims to bring the best of all packaging worlds to Python. It automatically creates and manages virtual environments while providing advanced dependency management through Pipfile and Pipfile.lock.
Installing Pipenv
Install pipenv using pip:
```bash
pip install pipenv
```
Or using your system package manager:
```bash
# On macOS with Homebrew
brew install pipenv

# On Ubuntu/Debian
sudo apt install pipenv
```

Basic Pipenv Commands
Pipenv simplifies virtual environment management with intuitive commands:
```bash
# Install a package and create a Pipfile
pipenv install requests

# Install development dependencies
pipenv install pytest --dev

# Install from requirements.txt
pipenv install -r requirements.txt

# Activate the shell
pipenv shell

# Run commands in the environment
pipenv run python script.py

# Install all dependencies from the Pipfile
pipenv install

# Install including dev dependencies
pipenv install --dev

# Generate requirements.txt
pipenv requirements > requirements.txt
```

Understanding Pipfile and Pipfile.lock
Pipenv uses two files for dependency management:
Pipfile - Human-readable dependency specification:
```toml
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"
flask = ">=1.0"
django = "~=3.2.0"

[dev-packages]
pytest = "*"
black = "*"
flake8 = "*"

[requires]
python_version = "3.9"
```
Pipfile.lock - Exact versions for reproducible builds:
```json
{
"_meta": {
"hash": {
"sha256": "..."
},
"pipfile-spec": 6,
"requires": {
"python_version": "3.9"
}
},
"default": {
"requests": {
"hashes": ["..."],
"index": "pypi",
"version": "==2.31.0"
}
}
}
```
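You rarely edit Pipfile.lock by hand; you regenerate and consume it through pipenv. A few commands worth knowing (available in recent pipenv releases):

```bash
# Regenerate the lock file after editing the Pipfile
pipenv lock

# Install exactly what the lock file specifies
pipenv sync

# Check that Pipfile.lock is up to date with the Pipfile
pipenv verify
```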
Project Example: Building a REST API with Flask using Pipenv
Let's create a REST API for a simple task management system using Flask and Pipenv.
Step 1: Initialize the project
```bash
mkdir task_api
cd task_api
pipenv install flask flask-sqlalchemy flask-migrate
pipenv install --dev pytest flask-testing
```
This automatically creates your virtual environment and Pipfile.
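By default, pipenv stores the environment outside your project directory. If you need its location, for example to point your IDE at the right interpreter, these flags will tell you:

```bash
# Path to the virtual environment pipenv created
pipenv --venv

# Path to the Python interpreter inside it
pipenv --py
```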
Step 2: Create the Flask application
Create app.py:
```python
from flask import Flask, request, jsonify
from flask_sqlalchemy import SQLAlchemy
from flask_migrate import Migrate
from datetime import datetime
import os

app = Flask(__name__)

# Configuration
app.config['SQLALCHEMY_DATABASE_URI'] = os.environ.get('DATABASE_URL', 'sqlite:///tasks.db')
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

db = SQLAlchemy(app)
migrate = Migrate(app, db)

# Models
class Task(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(100), nullable=False)
    description = db.Column(db.Text)
    completed = db.Column(db.Boolean, default=False)
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
    updated_at = db.Column(db.DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

    def to_dict(self):
        return {
            'id': self.id,
            'title': self.title,
            'description': self.description,
            'completed': self.completed,
            'created_at': self.created_at.isoformat(),
            'updated_at': self.updated_at.isoformat()
        }

# Routes
@app.route('/api/tasks', methods=['GET'])
def get_tasks():
    tasks = Task.query.all()
    return jsonify([task.to_dict() for task in tasks])

@app.route('/api/tasks', methods=['POST'])
def create_task():
    data = request.get_json()
    if not data or 'title' not in data:
        return jsonify({'error': 'Title is required'}), 400
    task = Task(
        title=data['title'],
        description=data.get('description', '')
    )
    db.session.add(task)
    db.session.commit()
    return jsonify(task.to_dict()), 201

@app.route('/api/tasks/<int:task_id>', methods=['GET'])
def get_task(task_id):
    task = Task.query.get_or_404(task_id)
    return jsonify(task.to_dict())

@app.route('/api/tasks/<int:task_id>', methods=['PUT'])
def update_task(task_id):
    task = Task.query.get_or_404(task_id)
    data = request.get_json() or {}
    task.title = data.get('title', task.title)
    task.description = data.get('description', task.description)
    task.completed = data.get('completed', task.completed)
    db.session.commit()
    return jsonify(task.to_dict())

@app.route('/api/tasks/<int:task_id>', methods=['DELETE'])
def delete_task(task_id):
    task = Task.query.get_or_404(task_id)
    db.session.delete(task)
    db.session.commit()
    return '', 204

@app.route('/api/health', methods=['GET'])
def health_check():
    return jsonify({'status': 'healthy', 'timestamp': datetime.utcnow().isoformat()})

if __name__ == '__main__':
    with app.app_context():
        db.create_all()
    app.run(debug=True)
```
Step 3: Create test files
Create tests/test_api.py:
```python
import unittest
import json
from app import app, db, Task


class TaskAPITestCase(unittest.TestCase):
    def setUp(self):
        app.config['TESTING'] = True
        app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///:memory:'
        self.app = app.test_client()
        with app.app_context():
            db.create_all()

    def tearDown(self):
        with app.app_context():
            db.session.remove()
            db.drop_all()

    def test_health_check(self):
        response = self.app.get('/api/health')
        self.assertEqual(response.status_code, 200)
        data = json.loads(response.data)
        self.assertEqual(data['status'], 'healthy')

    def test_create_task(self):
        task_data = {
            'title': 'Test Task',
            'description': 'This is a test task'
        }
        response = self.app.post('/api/tasks',
                                 data=json.dumps(task_data),
                                 content_type='application/json')
        self.assertEqual(response.status_code, 201)
        data = json.loads(response.data)
        self.assertEqual(data['title'], 'Test Task')
        self.assertEqual(data['completed'], False)

    def test_get_tasks(self):
        # Create a task first
        with app.app_context():
            task = Task(title='Test Task', description='Test Description')
            db.session.add(task)
            db.session.commit()
        response = self.app.get('/api/tasks')
        self.assertEqual(response.status_code, 200)
        data = json.loads(response.data)
        self.assertEqual(len(data), 1)
        self.assertEqual(data[0]['title'], 'Test Task')


if __name__ == '__main__':
    unittest.main()
```
Step 4: Create a startup script
Create run.py:
```python
from app import app, db

if __name__ == '__main__':
    with app.app_context():
        db.create_all()
    print("Starting Task API server...")
    print("API endpoints:")
    print("  GET    /api/tasks")
    print("  POST   /api/tasks")
    print("  GET    /api/tasks/<id>")
    print("  PUT    /api/tasks/<id>")
    print("  DELETE /api/tasks/<id>")
    app.run(debug=True)
```
Step 5: Run the application
```bash
# Activate the pipenv shell
pipenv shell

# Run the application
python run.py

# Or run directly with pipenv
pipenv run python run.py

# Run tests
pipenv run python -m pytest tests/
```

Step 6: Test the API
```bash
# Create a task
curl -X POST http://localhost:5000/api/tasks \
  -H "Content-Type: application/json" \
  -d '{"title": "Learn Pipenv", "description": "Master virtual environments"}'

# Get all tasks
curl http://localhost:5000/api/tasks

# Update a task
curl -X PUT http://localhost:5000/api/tasks/1 \
  -H "Content-Type: application/json" \
  -d '{"completed": true}'
```

Advanced Pipenv Features
Environment variables:
Create a .env file:
```
DATABASE_URL=postgresql://user:pass@localhost/taskdb
SECRET_KEY=your-secret-key
DEBUG=True
```
Pipenv automatically loads these variables.
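For example, inside `pipenv shell` or a `pipenv run` command, the values from .env show up in os.environ like any other environment variable (the names below match the .env file above):

```python
import os

# Loaded by pipenv from the project's .env file
print(os.environ.get('DATABASE_URL'))
print(os.environ.get('DEBUG'))
```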
Scripts in Pipfile:
```toml
[scripts]
start = "python run.py"
test = "python -m pytest tests/"
lint = "flake8 app.py tests/"
format = "black app.py tests/"
```
Run scripts with:
```bash
pipenv run start
pipenv run test
```
Pros and Cons of Pipenv
Pros:
- Combines pip and virtualenv functionality
- Automatic virtual environment management
- Advanced dependency resolution
- Built-in support for environment variables
- Pipfile is more readable than requirements.txt
- Deterministic builds with Pipfile.lock
Cons:
- Can be slower than pip for large projects
- Additional dependency to install
- Learning curve for teams used to pip/venv
- Some compatibility issues with certain CI/CD systems
Conda: The Scientific Python Environment Manager
Conda is both a package manager and an environment management system that comes with the Anaconda and Miniconda distributions. It's particularly popular in data science and scientific computing communities.
Installing Conda
You have several options:
Anaconda (full distribution):
- Download from https://www.anaconda.com/products/distribution
- Includes 250+ pre-installed packages
- Larger download (~500MB)

Miniconda (minimal installation):
- Download from https://docs.conda.io/en/latest/miniconda.html
- Minimal conda installation
- Smaller download (~50MB)

Mambaforge (faster alternative):
- Uses mamba as the default package manager
- Faster dependency solving
- Community-driven
Basic Conda Commands
```bash
# Create a new environment
conda create --name myenv python=3.9

# Create an environment with packages
conda create --name dataenv python=3.9 numpy pandas matplotlib

# Activate an environment
conda activate myenv

# Deactivate the environment
conda deactivate

# List environments
conda env list

# Install packages
conda install numpy scipy matplotlib

# Install from conda-forge
conda install -c conda-forge seaborn

# Install with pip inside a conda environment
pip install some-package

# Export the environment
conda env export > environment.yml

# Create an environment from a file
conda env create -f environment.yml

# Remove an environment
conda env remove --name myenv

# Update conda
conda update conda

# Search for packages
conda search numpy
```

Understanding environment.yml
Conda uses environment.yml files to define environments:
```yaml
name: data_analysis
channels:
- conda-forge
- defaults
dependencies:
- python=3.9
- numpy>=1.20
- pandas>=1.3
- matplotlib>=3.4
- seaborn>=0.11
- jupyter
- scikit-learn
- pip
- pip:
- streamlit
- plotly>=5.0
```
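One caveat: `conda env export` captures every transitive dependency, including build strings, which often isn't portable across operating systems. For a leaner file that lists only the packages you explicitly requested, you can use:

```bash
# Export only explicitly requested packages (more portable across platforms)
conda env export --from-history > environment.yml
```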
Project Example: Building a Data Analysis Dashboard with Conda
Let's create a comprehensive data analysis project using conda, featuring a Streamlit dashboard for exploring COVID-19 data.
Step 1: Set up the conda environment
Create environment.yml:
```yaml
name: covid_dashboard
channels:
- conda-forge
- defaults
dependencies:
- python=3.9
- pandas>=1.3
- numpy>=1.20
- matplotlib>=3.4
- seaborn>=0.11
- plotly>=5.0
- requests>=2.25
- jupyter
- scikit-learn
- pip
- pip:
- streamlit>=1.0
- altair>=4.0
```
Create the environment:
```bash
conda env create -f environment.yml
conda activate covid_dashboard
```
Step 2: Create data processing module
Create data_processor.py:
```python
import pandas as pd
import numpy as np
import requests
from datetime import datetime, timedelta
import os


class CovidDataProcessor:
    def __init__(self):
        self.base_url = "https://disease.sh/v3/covid-19"
        self.data_cache = {}
        self.cache_timeout = 3600  # 1 hour

    def fetch_global_data(self):
        """Fetch global COVID-19 statistics"""
        try:
            response = requests.get(f"{self.base_url}/all")
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            print(f"Error fetching global data: {e}")
            return None

    def fetch_countries_data(self):
        """Fetch COVID-19 data for all countries"""
        cache_key = "countries_data"
        # Check cache
        if self._is_cached(cache_key):
            return self.data_cache[cache_key]["data"]
        try:
            response = requests.get(f"{self.base_url}/countries")
            response.raise_for_status()
            data = response.json()
            # Cache the data
            self.data_cache[cache_key] = {
                "data": data,
                "timestamp": datetime.now()
            }
            return data
        except requests.RequestException as e:
            print(f"Error fetching countries data: {e}")
            return []

    def fetch_historical_data(self, country="all", days=30):
        """Fetch historical data for a country"""
        cache_key = f"historical_{country}_{days}"
        if self._is_cached(cache_key):
            return self.data_cache[cache_key]["data"]
        try:
            if country.lower() == "all":
                url = f"{self.base_url}/historical/all?lastdays={days}"
            else:
                url = f"{self.base_url}/historical/{country}?lastdays={days}"
            response = requests.get(url)
            response.raise_for_status()
            data = response.json()
            self.data_cache[cache_key] = {
                "data": data,
                "timestamp": datetime.now()
            }
            return data
        except requests.RequestException as e:
            print(f"Error fetching historical data: {e}")
            return None

    def process_countries_dataframe(self):
        """Convert countries data to pandas DataFrame"""
        data = self.fetch_countries_data()
        if not data:
            return pd.DataFrame()
        df = pd.DataFrame(data)
        # Clean and process data
        df['casesPerMillion'] = df['casesPerOneMillion'].fillna(0)
        df['deathsPerMillion'] = df['deathsPerOneMillion'].fillna(0)
        df['testsPerMillion'] = df['testsPerOneMillion'].fillna(0)
        # Calculate additional metrics
        df['mortalityRate'] = (df['deaths'] / df['cases'] * 100).fillna(0)
        df['recoveryRate'] = (df['recovered'] / df['cases'] * 100).fillna(0)
        return df

    def process_historical_dataframe(self, country="all"):
        """Convert historical data to pandas DataFrame"""
        data = self.fetch_historical_data(country)
        if not data:
            return pd.DataFrame()
        if country.lower() == "all":
            cases = data.get('cases', {})
            deaths = data.get('deaths', {})
            recovered = data.get('recovered', {})
        else:
            timeline = data.get('timeline', {})
            cases = timeline.get('cases', {})
            deaths = timeline.get('deaths', {})
            recovered = timeline.get('recovered', {})
        # Create DataFrame
        dates = list(cases.keys())
        df = pd.DataFrame({
            'date': dates,
            'cases': [cases[date] for date in dates],
            'deaths': [deaths.get(date, 0) for date in dates],
            'recovered': [recovered.get(date, 0) for date in dates]
        })
        df['date'] = pd.to_datetime(df['date'])
        df = df.sort_values('date')
        # Calculate daily changes
        df['daily_cases'] = df['cases'].diff().fillna(0)
        df['daily_deaths'] = df['deaths'].diff().fillna(0)
        df['daily_recovered'] = df['recovered'].diff().fillna(0)
        return df

    def _is_cached(self, key):
        """Check if data is cached and not expired"""
        if key not in self.data_cache:
            return False
        cached_time = self.data_cache[key]["timestamp"]
        return (datetime.now() - cached_time).seconds < self.cache_timeout


# Utility functions for analysis
def calculate_growth_rate(series, periods=7):
    """Calculate growth rate over specified periods"""
    return ((series / series.shift(periods)) - 1) * 100


def get_top_countries(df, metric, n=10):
    """Get top N countries by specified metric"""
    return df.nlargest(n, metric)[['country', metric]]
```
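Before wiring this into the dashboard, you can smoke-test the module on its own. A quick check (this hits the live disease.sh API, so output depends on its availability):

```python
from data_processor import CovidDataProcessor

processor = CovidDataProcessor()
countries_df = processor.process_countries_dataframe()

# Show the five countries with the most recorded cases
print(countries_df.nlargest(5, 'cases')[['country', 'cases', 'deaths']])
```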
Step 3: Create the Streamlit dashboard
Create dashboard.py:
```python
import streamlit as st
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
from data_processor import CovidDataProcessor, calculate_growth_rate, get_top_countries

# Configure page
st.set_page_config(
    page_title="COVID-19 Dashboard",
    page_icon="🦠",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Initialize data processor
@st.cache_data(ttl=3600)  # Cache for 1 hour
def load_data():
    processor = CovidDataProcessor()
    return processor

# A leading underscore tells Streamlit not to hash the processor argument
@st.cache_data(ttl=3600)
def get_global_stats(_processor):
    return _processor.fetch_global_data()

@st.cache_data(ttl=3600)
def get_countries_df(_processor):
    return _processor.process_countries_dataframe()

@st.cache_data(ttl=3600)
def get_historical_df(_processor, country="all"):
    return _processor.process_historical_dataframe(country)

def main():
    st.title("🦠 COVID-19 Global Dashboard")
    st.markdown("Real-time COVID-19 statistics and analysis")

    # Load data
    processor = load_data()

    # Sidebar
    st.sidebar.header("Dashboard Controls")

    # Global statistics
    global_data = get_global_stats(processor)
    if global_data:
        st.header("📊 Global Statistics")
        col1, col2, col3, col4 = st.columns(4)
        with col1:
            st.metric(
                label="Total Cases",
                value=f"{global_data['cases']:,}",
                delta=f"+{global_data['todayCases']:,} today"
            )
        with col2:
            st.metric(
                label="Total Deaths",
                value=f"{global_data['deaths']:,}",
                delta=f"+{global_data['todayDeaths']:,} today"
            )
        with col3:
            st.metric(
                label="Total Recovered",
                value=f"{global_data['recovered']:,}",
                delta=f"+{global_data['todayRecovered']:,} today"
            )
        with col4:
            mortality_rate = (global_data['deaths'] / global_data['cases']) * 100
            st.metric(
                label="Mortality Rate",
                value=f"{mortality_rate:.2f}%"
            )

    # Countries data
    countries_df = get_countries_df(processor)
    if not countries_df.empty:
        st.header("🌍 Country Analysis")

        # Country selection
        countries = ['Global'] + sorted(countries_df['country'].tolist())
        selected_country = st.sidebar.selectbox("Select Country", countries)

        # Metrics selection
        metric_options = {
            'cases': 'Total Cases',
            'deaths': 'Total Deaths',
            'recovered': 'Total Recovered',
            'casesPerMillion': 'Cases per Million',
            'deathsPerMillion': 'Deaths per Million',
            'mortalityRate': 'Mortality Rate (%)'
        }
        selected_metric = st.sidebar.selectbox(
            "Select Metric for Analysis",
            list(metric_options.keys()),
            format_func=lambda x: metric_options[x]
        )

        # Top countries chart
        st.subheader(f"Top 15 Countries by {metric_options[selected_metric]}")
        top_countries = get_top_countries(countries_df, selected_metric, 15)
        fig_bar = px.bar(
            top_countries,
            x=selected_metric,
            y='country',
            orientation='h',
            title=f"Top 15 Countries - {metric_options[selected_metric]}",
            color=selected_metric,
            color_continuous_scale='Viridis'
        )
        fig_bar.update_layout(height=500)
        st.plotly_chart(fig_bar, use_container_width=True)

        # Historical trends
        st.header("📈 Historical Trends")
        if selected_country == 'Global':
            historical_df = get_historical_df(processor, "all")
        else:
            historical_df = get_historical_df(processor, selected_country)

        if not historical_df.empty:
            # Create subplots
            fig_trends = make_subplots(
                rows=2, cols=2,
                subplot_titles=('Cumulative Cases', 'Daily New Cases',
                                'Cumulative Deaths', 'Daily New Deaths'),
                specs=[[{"secondary_y": False}, {"secondary_y": False}],
                       [{"secondary_y": False}, {"secondary_y": False}]]
            )

            # Add traces
            fig_trends.add_trace(
                go.Scatter(x=historical_df['date'], y=historical_df['cases'],
                           name='Cumulative Cases', line=dict(color='blue')),
                row=1, col=1
            )
            fig_trends.add_trace(
                go.Scatter(x=historical_df['date'], y=historical_df['daily_cases'],
                           name='Daily Cases', line=dict(color='lightblue')),
                row=1, col=2
            )
            fig_trends.add_trace(
                go.Scatter(x=historical_df['date'], y=historical_df['deaths'],
                           name='Cumulative Deaths', line=dict(color='red')),
                row=2, col=1
            )
            fig_trends.add_trace(
                go.Scatter(x=historical_df['date'], y=historical_df['daily_deaths'],
                           name='Daily Deaths', line=dict(color='lightcoral')),
                row=2, col=2
            )
            fig_trends.update_layout(height=600, showlegend=False)
            st.plotly_chart(fig_trends, use_container_width=True)

            # Growth rate analysis
            if len(historical_df) >= 14:
                st.subheader("📉 Growth Rate Analysis (7-day)")
                historical_df['cases_growth_rate'] = calculate_growth_rate(historical_df['cases'], 7)
                historical_df['deaths_growth_rate'] = calculate_growth_rate(historical_df['deaths'], 7)

                fig_growth = go.Figure()
                fig_growth.add_trace(
                    go.Scatter(
                        x=historical_df['date'],
                        y=historical_df['cases_growth_rate'],
                        name='Cases Growth Rate (%)',
                        line=dict(color='blue')
                    )
                )
                fig_growth.add_trace(
                    go.Scatter(
                        x=historical_df['date'],
                        y=historical_df['deaths_growth_rate'],
                        name='Deaths Growth Rate (%)',
                        line=dict(color='red')
                    )
                )
                fig_growth.update_layout(
                    title="7-Day Growth Rate Trends",
                    xaxis_title="Date",
                    yaxis_title="Growth Rate (%)",
                    height=400
                )
                st.plotly_chart(fig_growth, use_container_width=True)

        # Data table
        st.header("📋 Raw Data")
        if st.checkbox("Show Countries Data"):
            st.dataframe(
                countries_df[['country', 'cases', 'deaths', 'recovered',
                              'casesPerMillion', 'deathsPerMillion', 'mortalityRate']].head(20)
            )
        if st.checkbox("Show Historical Data") and not historical_df.empty:
            st.dataframe(historical_df.tail(14))

if __name__ == "__main__":
    main()
```
Step 4: Create analysis notebooks
Create analysis.ipynb:
```python
# Jupyter notebook for detailed analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from data_processor import CovidDataProcessor
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Initialize processor
processor = CovidDataProcessor()

# Load data
print("Loading COVID-19 data...")
countries_df = processor.process_countries_dataframe()
global_historical = processor.process_historical_dataframe("all")

print(f"Loaded data for {len(countries_df)} countries")
print(f"Historical data covers {len(global_historical)} days")

# Basic statistics
print("\n=== Global Statistics ===")
print(f"Total cases: {countries_df['cases'].sum():,}")
print(f"Total deaths: {countries_df['deaths'].sum():,}")
print(f"Total recovered: {countries_df['recovered'].sum():,}")

# Correlation analysis
numeric_columns = ['cases', 'deaths', 'recovered', 'casesPerMillion',
                   'deathsPerMillion', 'testsPerMillion', 'mortalityRate']
correlation_matrix = countries_df[numeric_columns].corr()

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('COVID-19 Metrics Correlation Matrix')
plt.tight_layout()
plt.show()
```
Step 5: Create startup scripts
Create run_dashboard.py:
```python
import subprocess
import sys

def main():
    print("🦠 Starting COVID-19 Dashboard...")
    print("Make sure you have activated the conda environment:")
    print("conda activate covid_dashboard")
    print("\nStarting Streamlit server...")
    try:
        subprocess.run([
            sys.executable, "-m", "streamlit", "run",
            "dashboard.py",
            "--server.port", "8501",
            "--server.address", "localhost"
        ])
    except KeyboardInterrupt:
        print("\n👋 Dashboard stopped by user")
    except Exception as e:
        print(f"❌ Error starting dashboard: {e}")

if __name__ == "__main__":
    main()
```
Step 6: Run the project
```bash
# Activate the environment
conda activate covid_dashboard

# Run the dashboard
python run_dashboard.py

# Or run directly
streamlit run dashboard.py

# Run the Jupyter notebook for analysis
jupyter notebook analysis.ipynb
```

Advanced Conda Features
Multiple channels:
```bash
# Add the conda-forge channel
conda config --add channels conda-forge

# Install from a specific channel
conda install -c bioconda biopython
```

Environment management:
```bash
# Clone an environment
conda create --name newenv --clone oldenv

# Update all packages
conda update --all

# Clean up caches and unused packages
conda clean --all
```

Conda-pack for deployment:
```bash
# Install conda-pack
conda install conda-pack

# Pack the environment
conda pack -n myenv -o myenv.tar.gz
```
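On the target machine, the archive can be extracted and used without conda installed there. Roughly, following conda-pack's documented workflow (paths are illustrative):

```bash
# Unpack on the deployment target
mkdir -p myenv
tar -xzf myenv.tar.gz -C myenv

# Activate the unpacked environment
source myenv/bin/activate

# Rewrite prefix paths inside the environment
conda-unpack
```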
Pros and Cons of Conda

Pros:
- Excellent for data science and scientific computing
- Manages both Python and system dependencies
- Cross-platform compatibility
- Large ecosystem of scientific packages
- Can handle complex dependency conflicts
- Supports multiple programming languages
Cons:
- Larger disk space requirements
- Can be slower than pip for pure Python packages
- Learning curve for developers used to pip
- Occasional conflicts between conda and pip packages
- The default solver can be slow (though mamba solves this)
Comparing Virtual Environment Tools
| Feature | venv | pipenv | conda |
|---------|------|--------|-------|
| Installation | Built-in Python 3.3+ | pip install pipenv | Separate distribution |
| Dependency Management | requirements.txt | Pipfile/Pipfile.lock | environment.yml |
| Package Sources | PyPI only | PyPI | Multiple channels |
| System Dependencies | No | No | Yes |
| Multiple Python Versions | Manual | Yes | Yes |
| Reproducible Builds | Manual | Automatic | Automatic |
| Learning Curve | Low | Medium | Medium-High |
| Performance | Fast | Medium | Medium-Fast |
| Ecosystem | Universal | Python-focused | Data Science focused |
| Disk Space | Minimal | Small | Large |
Best Practices for Virtual Environment Management
1. Project Structure
Organize your projects consistently:

```
my_project/
├── src/
├── tests/
├── docs/
├── requirements.txt (or Pipfile/environment.yml)
├── .env
├── .gitignore
└── README.md
```

2. Naming Conventions
- Use descriptive environment names
- Include the project name or purpose
- Consider version suffixes when you keep multiple versions

3. Documentation
Always document your environment setup:

````markdown
## Setup

### Using venv
```bash
python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
pip install -r requirements.txt
```

### Using pipenv
```bash
pipenv install
pipenv shell
```

### Using conda
```bash
conda env create -f environment.yml
conda activate myproject
```
````

4. Version Control
Include in version control:
- requirements.txt
- Pipfile and Pipfile.lock
- environment.yml
- .env.example (a template for environment variables)

Exclude from version control (a sample .gitignore follows below):
- Virtual environment directories
- .env (actual environment variables)
- __pycache__/
- .pytest_cache/
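A minimal .gitignore covering the items above might look like this (adjust the environment directory names to whatever you actually use):

```
# Virtual environment directories
venv/
env/
.venv/

# Environment variables
.env

# Python caches
__pycache__/
.pytest_cache/
```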
5. Environment Variables
Use environment variables for configuration:

```python
import os
from dotenv import load_dotenv

load_dotenv()

DATABASE_URL = os.getenv('DATABASE_URL', 'sqlite:///default.db')
SECRET_KEY = os.getenv('SECRET_KEY', 'dev-key')
DEBUG = os.getenv('DEBUG', 'False').lower() == 'true'
```
6. Testing Across Environments
Test your application in a clean environment before releasing:

```bash
# Test with a fresh environment
python -m venv test_env
source test_env/bin/activate
pip install -r requirements.txt
python -m pytest
deactivate
rm -rf test_env
```

Troubleshooting Common Issues
Virtual Environment Not Activating
Problem: The environment doesn't activate properly.

Solutions:

```bash
# Check that the environment exists
ls -la myenv/

# Try the absolute path
source /full/path/to/myenv/bin/activate

# On Windows (PowerShell), check the execution policy
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

Package Installation Failures
Problem: Packages fail to install.

Solutions:

```bash
# Upgrade pip
python -m pip install --upgrade pip

# Clear the pip cache
pip cache purge

# Install with verbose output
pip install -v package_name

# Use a different index
pip install -i https://pypi.org/simple/ package_name
```

Conflicting Dependencies
Problem: Dependency conflicts between packages.

Solutions:

```bash
# With pip-tools
pip install pip-tools
pip-compile requirements.in

# With pipenv
pipenv install --skip-lock
pipenv lock --clear

# With conda
conda update --all
conda clean --all
```

Environment Size Issues
Problem: The environment takes too much disk space.

Solutions:

```bash
# Clean conda caches
conda clean --all

# Remove unused packages
pip uninstall package_name

# Use conda-pack for deployment
conda pack -n myenv
```

Conclusion
Virtual environments are essential for professional Python development. Each tool we've covered serves different needs:
- Use venv when: You want simplicity, are working on basic Python projects, or need the lightest solution
- Use pipenv when: You want modern dependency management, work primarily with Python web applications, or need deterministic builds
- Use conda when: You're doing data science, need system-level dependencies, or work with scientific computing packages
The key is understanding your project requirements and choosing the right tool for the job. Many developers use different tools for different projects, and that's perfectly fine.
Remember these essential practices:
1. Always use virtual environments for projects
2. Document your environment setup clearly
3. Keep dependency files in version control
4. Test in clean environments before deployment
5. Choose the tool that best fits your workflow and team
By mastering virtual environments, you'll avoid dependency conflicts, create reproducible development environments, and maintain cleaner, more professional Python projects. Whether you're building web applications, analyzing data, or creating command-line tools, virtual environments will make your development process smoother and more reliable.
Start with the tool that feels most comfortable for your current project, and gradually explore the others as your needs evolve. The investment in learning proper virtual environment management will pay dividends throughout your Python development career.