# DevOps Best Practices: From Continuous Integration to Continuous Delivery

## Table of Contents

1. [Introduction to DevOps](#introduction)
2. [Continuous Integration (CI) Best Practices](#continuous-integration)
3. [Continuous Delivery (CD) Implementation](#continuous-delivery)
4. [Automation Strategies](#automation)
5. [Infrastructure as Code (IaC)](#infrastructure-as-code)
6. [Monitoring and Observability](#monitoring)
7. [DevOps Culture and Team Dynamics](#culture)
8. [Tool Implementation Examples](#tool-examples)
9. [Advanced Workflows](#advanced-workflows)
10. [Conclusion](#conclusion)

## Introduction to DevOps {#introduction}
DevOps represents a cultural and technical transformation that bridges the gap between development and operations teams. By implementing DevOps best practices, organizations can achieve faster delivery cycles, improved software quality, and enhanced collaboration across teams.
The core principles of DevOps include:

- **Collaboration**: Breaking down silos between development and operations
- **Automation**: Reducing manual processes and human error
- **Continuous Integration**: Frequently merging code changes
- **Continuous Delivery**: Maintaining software in a deployable state
- **Monitoring**: Gaining visibility into system performance and user experience
- **Feedback Loops**: Learning from failures and continuously improving
This comprehensive guide will walk you through implementing these principles with practical examples and proven strategies.
## Continuous Integration (CI) Best Practices {#continuous-integration}

### Understanding Continuous Integration
Continuous Integration is the practice of frequently integrating code changes into a shared repository. Each integration is automatically verified through builds and tests, enabling teams to detect problems early and resolve them quickly.
### Core CI Principles

**1. Commit Early and Often**

- Make small, frequent commits rather than large, infrequent ones
- Each commit should represent a logical unit of work
- Write meaningful commit messages that explain the "why" behind changes

**2. Automated Testing Strategy**

```yaml
# Example testing pyramid structure
Unit Tests: 70%         # Fast, isolated tests
Integration Tests: 20%  # Component interaction tests
End-to-End Tests: 10%   # Full system tests
```

**3. Build Automation**

Every code commit should trigger an automated build process that:

- Compiles the code
- Runs automated tests
- Performs static code analysis
- Generates artifacts
### CI Pipeline Implementation

#### Step 1: Repository Setup

```bash
# Initialize repository with proper structure
project-root/
├── src/
├── tests/
├── .github/workflows/    # For GitHub Actions
├── Jenkinsfile           # For Jenkins
├── Dockerfile
├── docker-compose.yml
└── README.md
```

#### Step 2: Automated Testing Configuration

```javascript
// Example Jest configuration for Node.js
module.exports = {
testEnvironment: 'node',
collectCoverage: true,
coverageThreshold: {
global: {
branches: 80,
functions: 80,
lines: 80,
statements: 80
}
},
  testMatch: ['**/__tests__/**/*.js', '**/?(*.)+(spec|test).js']
};
```

#### Step 3: Quality Gates

Implement quality gates that prevent poor code from advancing:

- Code coverage thresholds
- Static analysis rules
- Security vulnerability scans
- Performance benchmarks

### GitHub Actions CI Example

```yaml
name: CI Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [14.x, 16.x, 18.x]
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linting
        run: npm run lint
      - name: Run tests
        run: npm test
      - name: Upload coverage reports
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage/lcov.info
      - name: Build application
        run: npm run build
      - name: Run security audit
        run: npm audit --audit-level moderate
```

### Jenkins CI Pipeline

```groovy
pipeline {
agent any
environment {
NODE_VERSION = '16'
DOCKER_REGISTRY = 'your-registry.com'
}
stages {
stage('Checkout') {
steps {
checkout scm
}
}
stage('Setup') {
steps {
sh 'nvm use ${NODE_VERSION}'
sh 'npm ci'
}
}
stage('Code Quality') {
parallel {
stage('Lint') {
steps {
sh 'npm run lint'
}
}
stage('Security Scan') {
steps {
sh 'npm audit --audit-level moderate'
}
}
}
}
stage('Test') {
steps {
sh 'npm test'
}
post {
always {
junit 'test-results.xml'  // requires the JUnit plugin
publishCoverage adapters: [coberturaAdapter('coverage/cobertura-coverage.xml')]  // Code Coverage API plugin
}
}
}
stage('Build') {
steps {
sh 'npm run build'
archiveArtifacts artifacts: 'dist/**/*', allowEmptyArchive: false
}
}
stage('Docker Build') {
when {
branch 'main'
}
steps {
script {
def image = docker.build("${DOCKER_REGISTRY}/myapp:${BUILD_NUMBER}")
docker.withRegistry('https://your-registry.com', 'registry-credentials') {
image.push()
image.push('latest')
}
}
}
}
}
post {
failure {
emailext (
subject: "Build Failed: ${env.JOB_NAME} - ${env.BUILD_NUMBER}",
body: "Build failed. Check console output at ${env.BUILD_URL}",
to: "${env.CHANGE_AUTHOR_EMAIL}"
)
}
}
}
```

## Continuous Delivery (CD) Implementation {#continuous-delivery}

### Understanding Continuous Delivery
Continuous Delivery extends CI by ensuring that code changes are automatically prepared for release to production. The key difference from Continuous Deployment is that releases to production are triggered manually, providing control over when features reach end users.
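To make that manual trigger concrete, here is a minimal sketch of a release workflow, assuming GitHub Actions; the `workflow_dispatch` trigger and a protected `production` environment stand in for whatever approval mechanism your platform provides, and the `deploy.sh` script is purely illustrative:

```yaml
# Hypothetical release workflow: CI keeps every build deployable,
# but the production rollout only runs when a person triggers it.
name: Release to Production

on:
  workflow_dispatch:        # manual trigger from the Actions UI
    inputs:
      version:
        description: 'Artifact version to release (e.g. a tag or image digest)'
        required: true

jobs:
  release:
    runs-on: ubuntu-latest
    # Assumes a protected "production" environment with required reviewers
    # configured in repository settings, which adds an approval step.
    environment: production
    steps:
      - uses: actions/checkout@v3
      - name: Deploy selected version
        run: ./scripts/deploy.sh "${{ github.event.inputs.version }}"  # illustrative script
```

With required reviewers on the environment, the job pauses until someone approves the release, which is exactly the control point that distinguishes Continuous Delivery from Continuous Deployment.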
### CD Pipeline Architecture

#### Environment Strategy

```
Development  →  Testing            →  Staging       →  Production
     ↓               ↓                    ↓                ↓
 Unit Tests    Integration Tests    System Tests    Manual Approval
```
### Deployment Strategies

**1. Blue-Green Deployment**

```yaml
# Blue-Green deployment with Kubernetes
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  selector:
    app: myapp
    version: blue  # Switch to 'green' for deployment
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: myapp
          image: myapp:v1.0.0
          ports:
            - containerPort: 8080
```

**2. Canary Deployment**

```yaml
# Canary deployment configuration
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp-rollout
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 10m}
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:latest
```

### GitOps Workflow

#### Step 1: Repository Structure

```
infrastructure/
├── environments/
│ ├── dev/
│ ├── staging/
│ └── prod/
├── applications/
│ ├── frontend/
│ ├── backend/
│ └── database/
└── shared/
├── monitoring/
└── networking/
```

#### Step 2: ArgoCD Application Configuration

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/company/k8s-configs
    targetRevision: HEAD
    path: applications/myapp/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

## Automation Strategies {#automation}

### Infrastructure Automation

#### Terraform Example for AWS

```hcl
# main.tf
provider "aws" {
  region = var.aws_region
}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "${var.project_name}-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["${var.aws_region}a", "${var.aws_region}b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
  enable_vpn_gateway = false

  tags = {
    Environment = var.environment
    Project     = var.project_name
  }
}
module "eks" {
source = "terraform-aws-modules/eks/aws"
cluster_name = "${var.project_name}-cluster"
cluster_version = "1.21"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
node_groups = {
main = {
desired_capacity = 2
max_capacity = 4
min_capacity = 1
instance_types = ["t3.medium"]
k8s_labels = {
Environment = var.environment
}
}
}
}
```

### Testing Automation

#### Automated Testing Pipeline

```python
# test_automation.py
import pytest
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By


class TestAPI:
    def setup_method(self):
        self.base_url = "https://api.example.com"

    def test_health_check(self):
        response = requests.get(f"{self.base_url}/health")
        assert response.status_code == 200
        assert response.json()["status"] == "healthy"

    def test_user_creation(self):
        user_data = {
            "name": "Test User",
            "email": "test@example.com"
        }
        response = requests.post(f"{self.base_url}/users", json=user_data)
        assert response.status_code == 201
        assert response.json()["email"] == user_data["email"]


class TestUI:
    def setup_method(self):
        self.driver = webdriver.Chrome()
        self.driver.get("https://app.example.com")

    def teardown_method(self):
        self.driver.quit()

    def test_login_flow(self):
        # Login test
        self.driver.find_element(By.ID, "email").send_keys("user@example.com")
        self.driver.find_element(By.ID, "password").send_keys("password123")
        self.driver.find_element(By.ID, "login-button").click()

        # Verify successful login
        assert "dashboard" in self.driver.current_url
        assert self.driver.find_element(By.CLASS_NAME, "user-menu").is_displayed()
```
### Security Automation

#### Security Scanning Integration

```yaml
# security-scan.yml
name: Security Scan

on:
  push:
    branches: [ main ]
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'
      - name: Upload Trivy scan results
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: 'trivy-results.sarif'
      - name: OWASP ZAP Baseline Scan
        uses: zaproxy/action-baseline@v0.7.0
        with:
          target: 'https://staging.example.com'
          rules_file_name: '.zap/rules.tsv'
          cmd_options: '-a'
```
## Infrastructure as Code (IaC) {#infrastructure-as-code}

### Principles of Infrastructure as Code
Infrastructure as Code treats infrastructure configuration as software code, enabling version control, testing, and automated deployment of infrastructure components.
**Key Benefits:**

- **Consistency**: Identical environments across development, staging, and production
- **Version Control**: Track changes and roll back when necessary
- **Automation**: Reduce manual configuration errors
- **Documentation**: Infrastructure becomes self-documenting
- **Cost Management**: Easier to spin up/down environments
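Because everything lives in version control, infrastructure changes can be reviewed and validated like application code. The following is a hedged sketch of such a check, assuming GitHub Actions and a Terraform configuration under `terraform/`; the workflow name, paths, and backend/credential setup are illustrative rather than part of this guide's reference setup:

```yaml
# Hypothetical pull-request check that treats infrastructure changes like code changes.
name: Terraform Plan

on:
  pull_request:
    paths:
      - 'terraform/**'

jobs:
  plan:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: terraform
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
      - name: Check formatting
        run: terraform fmt -check -recursive
      - name: Validate configuration
        run: |
          terraform init -backend=false
          terraform validate
      - name: Show planned changes
        run: terraform plan -input=false  # assumes backend state and cloud credentials are configured separately
```

Pairing a check like this with required status checks keeps the "Version Control" and "Consistency" benefits above enforceable rather than aspirational.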
### Terraform Best Practices

#### Project Structure

```
terraform/
├── modules/
│ ├── networking/
│ ├── compute/
│ └── database/
├── environments/
│ ├── dev/
│ ├── staging/
│ └── prod/
├── shared/
│ └── remote-state/
└── scripts/
└── deploy.sh
```

#### Module Example: Networking

```hcl
# modules/networking/main.tf
variable "environment" { description = "Environment name" type = string }variable "cidr_block" { description = "CIDR block for VPC" type = string default = "10.0.0.0/16" }
resource "aws_vpc" "main" { cidr_block = var.cidr_block enable_dns_hostnames = true enable_dns_support = true tags = { Name = "${var.environment}-vpc" Environment = var.environment } }
resource "aws_internet_gateway" "main" { vpc_id = aws_vpc.main.id tags = { Name = "${var.environment}-igw" Environment = var.environment } }
resource "aws_subnet" "public" { count = length(data.aws_availability_zones.available.names) vpc_id = aws_vpc.main.id cidr_block = cidrsubnet(var.cidr_block, 8, count.index) availability_zone = data.aws_availability_zones.available.names[count.index] map_public_ip_on_launch = true tags = { Name = "${var.environment}-public-${count.index + 1}" Environment = var.environment Type = "public" } }
data "aws_availability_zones" "available" { state = "available" }
outputs.tf
output "vpc_id" { value = aws_vpc.main.id }output "public_subnet_ids" {
value = aws_subnet.public[*].id
}
```

### Kubernetes Manifests with Kustomize

#### Base Configuration

```yaml
# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - deployment.yaml
  - service.yaml
  - configmap.yaml

commonLabels:
  app: myapp

# base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          ports:
            - containerPort: 8080
          env:
            - name: DATABASE_URL
              valueFrom:
                configMapKeyRef:
                  name: myapp-config
                  key: database-url
          resources:
            requests:
              memory: "64Mi"
              cpu: "250m"
            limits:
              memory: "128Mi"
              cpu: "500m"
```

#### Environment Overlays
```yaml
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

bases:
  - ../../base

patchesStrategicMerge:
  - deployment-patch.yaml

replicas:
  - name: myapp
    count: 3

images:
  - name: myapp
    newTag: v1.2.3

# overlays/production/deployment-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  template:
    spec:
      containers:
        - name: myapp
          resources:
            requests:
              memory: "256Mi"
              cpu: "500m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
```

### Helm Charts for Complex Applications
```yaml
# Chart.yaml
apiVersion: v2
name: myapp
description: A Helm chart for MyApp
version: 0.1.0
appVersion: "1.0"

# values.yaml
replicaCount: 1

image:
  repository: myapp
  pullPolicy: IfNotPresent
  tag: "latest"

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: false
  className: ""
  annotations: {}
  hosts:
    - host: chart-example.local
      paths:
        - path: /
          pathType: Prefix
  tls: []

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "myapp.fullname" . }}
  labels:
    {{- include "myapp.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "myapp.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "myapp.selectorLabels" . | nindent 8 }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```

## Monitoring and Observability {#monitoring}
### The Three Pillars of Observability

1. **Metrics**: Quantitative measurements of system behavior
2. **Logs**: Discrete events that occurred in the system
3. **Traces**: Request flow through distributed systems

### Prometheus and Grafana Setup

#### Prometheus Configuration

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp:8080']
    metrics_path: '/metrics'
    scrape_interval: 10s
```
#### Application Metrics in Node.js

```javascript
// metrics.js
const promClient = require('prom-client');
// Create a Registry to register the metrics
const register = new promClient.Registry();

// Add default metrics
promClient.collectDefaultMetrics({
  app: 'myapp',
  timeout: 10000,
  gcDurationBuckets: [0.001, 0.01, 0.1, 1, 2, 5],
  register
});

// Custom metrics
const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
});

const httpRequestTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

const activeConnections = new promClient.Gauge({
  name: 'active_connections',
  help: 'Number of active connections'
});

register.registerMetric(httpRequestDuration);
register.registerMetric(httpRequestTotal);
register.registerMetric(activeConnections);

// Middleware to collect metrics
function metricsMiddleware(req, res, next) {
  const start = Date.now();

  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const route = req.route ? req.route.path : req.path;

    httpRequestDuration
      .labels(req.method, route, res.statusCode)
      .observe(duration);

    httpRequestTotal
      .labels(req.method, route, res.statusCode)
      .inc();
  });

  next();
}
module.exports = {
register,
metricsMiddleware,
activeConnections
};
```

### Structured Logging

#### Winston Configuration

```javascript
// logger.js
const winston = require('winston');
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: {
    service: 'myapp',
    version: process.env.APP_VERSION || '1.0.0'
  },
  transports: [
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    new winston.transports.File({ filename: 'combined.log' }),
    new winston.transports.Console({
      format: winston.format.combine(
        winston.format.colorize(),
        winston.format.simple()
      )
    })
  ]
});
// Request logging middleware
function requestLogger(req, res, next) {
const start = Date.now();
res.on('finish', () => {
const duration = Date.now() - start;
logger.info('HTTP Request', {
method: req.method,
url: req.url,
statusCode: res.statusCode,
duration: `${duration}ms`,
userAgent: req.get('User-Agent'),
ip: req.ip,
requestId: req.headers['x-request-id']
});
});
next();
}
module.exports = { logger, requestLogger };
```

### Distributed Tracing with Jaeger

```javascript
// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');
const jaegerExporter = new JaegerExporter({
  endpoint: process.env.JAEGER_ENDPOINT || 'http://localhost:14268/api/traces',
});

const sdk = new NodeSDK({
  traceExporter: jaegerExporter,
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: 'myapp',
  serviceVersion: process.env.APP_VERSION || '1.0.0'
});
sdk.start();
module.exports = sdk;
```

### Alert Rules

```yaml
# alert_rules.yml
groups:
  - name: myapp_alerts
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status_code=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors per second"

      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
          description: "95th percentile latency is {{ $value }} seconds"

      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
          description: "{{ $labels.instance }} has been down for more than 1 minute"
```

## DevOps Culture and Team Dynamics {#culture}

### Building a DevOps Culture
**1. Shared Responsibility**

- Development teams own their code in production
- Operations teams become platform enablers
- Quality is everyone's responsibility

**2. Continuous Learning**

- Regular post-mortems without blame
- Knowledge sharing sessions
- Cross-training between teams

**3. Automation First**

- Automate repetitive tasks (one concrete sketch follows below)
- Reduce toil and manual interventions
- Focus human effort on high-value activities
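As one small illustration of the "Automation First" principle, a scheduled workflow can take over a repetitive chore such as stale-issue triage. This is a sketch only, assuming GitHub Actions and the `actions/stale` action; the schedule, thresholds, and message are arbitrary:

```yaml
# Hypothetical scheduled workflow that automates a repetitive chore:
# flagging and closing inactive issues instead of triaging them by hand.
name: Stale Issue Triage

on:
  schedule:
    - cron: '0 3 * * 1'  # every Monday at 03:00 UTC

jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v8
        with:
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          days-before-stale: 30
          days-before-close: 7
          stale-issue-message: 'This issue has been inactive for 30 days and will be closed in a week unless there is new activity.'
```

The specifics matter less than the habit: any task a human repeats on a calendar is a candidate for this treatment.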
### Team Structure Models

**1. Cross-Functional Teams**

```
Product Team
├── Product Owner
├── Frontend Developers
├── Backend Developers
├── DevOps Engineer
├── QA Engineer
└── UX Designer
```

**2. Platform Team Model**

```
Platform Team              Product Teams
├── Infrastructure         ├── Team A
├── CI/CD Pipelines        ├── Team B
├── Monitoring             └── Team C
├── Security
└── Developer Tools
```
### Communication and Collaboration Tools

#### ChatOps Implementation

```javascript
// slack-bot.js
const { App } = require('@slack/bolt');
const app = new App({
  token: process.env.SLACK_BOT_TOKEN,
  signingSecret: process.env.SLACK_SIGNING_SECRET
});
// Deploy command
app.command('/deploy', async ({ command, ack, respond }) => {
await ack();
const [environment, version] = command.text.split(' ');
if (!environment || !version) {
await respond('Usage: /deploy <environment> <version>');
return;
}
await respond(`Deploying version ${version} to ${environment}...`);
try {
// Trigger deployment pipeline
const result = await triggerDeployment(environment, version);
await respond(`✅ Deployment successful: ${result.deploymentUrl}`);
} catch (error) {
await respond(`❌ Deployment failed: ${error.message}`);
}
});
// Status command
app.command('/status', async ({ command, ack, respond }) => {
await ack();
const services = await getServiceStatus();
const statusMessage = services.map(service =>
`${service.name}: ${service.status === 'healthy' ? '✅' : '❌'} ${service.status}`
).join('\n');
await respond(`Service Status:\n${statusMessage}`);
});
async function triggerDeployment(environment, version) {
// Integration with CI/CD pipeline
const response = await fetch(`${process.env.JENKINS_URL}/job/deploy/buildWithParameters`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.JENKINS_TOKEN}`,
'Content-Type': 'application/x-www-form-urlencoded'
},
body: `environment=${environment}&version=${version}`
});
return response.json();
}
```

### Incident Response Process

**1. Incident Detection**

- Automated alerting
- Monitoring dashboards
- User reports

**2. Response Workflow**

```mermaid
graph TD
A[Incident Detected] --> B[Create Incident Channel]
B --> C[Assign Incident Commander]
C --> D[Assess Severity]
D --> E[Form Response Team]
E --> F[Implement Fix]
F --> G[Monitor Resolution]
G --> H[Post-Mortem]
```

**3. Post-Mortem Template**

```markdown
# Post-Mortem: [Incident Title]

## Summary
Brief description of the incident and its impact.

## Timeline
- **Detection Time**: When the incident was first detected
- **Response Time**: When the response team was assembled
- **Resolution Time**: When the incident was resolved
- **Duration**: Total incident duration

## Root Cause Analysis
What caused the incident and why it wasn't caught earlier.

## Impact
- Users affected
- Services impacted
- Revenue impact (if applicable)

## Action Items
- [ ] Immediate fixes (Owner, Due Date)
- [ ] Long-term improvements (Owner, Due Date)
- [ ] Process improvements (Owner, Due Date)

## Lessons Learned
What we learned and how we can prevent similar incidents.
```

## Tool Implementation Examples {#tool-examples}
### Complete CI/CD Pipeline with Multiple Tools

#### GitHub Actions + Docker + Kubernetes

```yaml
name: Complete CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '16'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: SonarCloud Scan
        uses: SonarSource/sonarcloud-github-action@master
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3
      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix=sha-
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v3
      - name: Setup kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'v1.24.0'
      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBE_CONFIG_STAGING }}" | base64 -d > kubeconfig
          export KUBECONFIG=kubeconfig
      - name: Deploy to staging
        run: |
          export KUBECONFIG=kubeconfig
          envsubst < k8s/staging/deployment.yaml | kubectl apply -f -
          kubectl rollout status deployment/myapp -n staging
        env:
          IMAGE_TAG: sha-${{ github.sha }}

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3
      - name: Setup kubectl
        uses: azure/setup-kubectl@v3
        with:
          version: 'v1.24.0'
      - name: Configure kubectl
        run: |
          echo "${{ secrets.KUBE_CONFIG_PRODUCTION }}" | base64 -d > kubeconfig
          export KUBECONFIG=kubeconfig
      - name: Deploy to production
        run: |
          export KUBECONFIG=kubeconfig
          envsubst < k8s/production/deployment.yaml | kubectl apply -f -
          kubectl rollout status deployment/myapp -n production
        env:
          IMAGE_TAG: sha-${{ github.sha }}
      - name: Run smoke tests
        run: |
          npm run test:smoke -- --url https://api.production.com
      - name: Notify Slack
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          channel: '#deployments'
          webhook_url: ${{ secrets.SLACK_WEBHOOK_URL }}
```
### Jenkins Pipeline with Shared Libraries

#### Shared Library Structure

```
jenkins-shared-library/
├── vars/
│ ├── deployToK8s.groovy
│ ├── buildDockerImage.groovy
│ └── runTests.groovy
└── src/
└── com/
└── company/
└── pipeline/
├── Docker.groovy
└── Kubernetes.groovy
```

#### Shared Library Implementation

```groovy
// vars/buildDockerImage.groovy
def call(Map config) {
def imageName = config.imageName
def tag = config.tag ?: env.BUILD_NUMBER
def dockerfile = config.dockerfile ?: 'Dockerfile'
def context = config.context ?: '.'
script {
def image = docker.build("${imageName}:${tag}", "-f ${dockerfile} ${context}")
if (config.registry) {
docker.withRegistry(config.registry.url, config.registry.credentialsId) {
image.push()
if (config.pushLatest) {
image.push('latest')
}
}
}
return image
}
}
// vars/deployToK8s.groovy
def call(Map config) {
    def namespace = config.namespace
    def deployment = config.deployment
    def image = config.image
    def kubeconfig = config.kubeconfig

    withCredentials([file(credentialsId: kubeconfig, variable: 'KUBECONFIG')]) {
        sh """
            kubectl set image deployment/${deployment} ${deployment}=${image} -n ${namespace}
            kubectl rollout status deployment/${deployment} -n ${namespace} --timeout=300s
        """
    }
}

// Jenkinsfile using shared libraries
@Library('jenkins-shared-library') _
pipeline {
agent any
environment {
DOCKER_REGISTRY = 'your-registry.com'
IMAGE_NAME = "${DOCKER_REGISTRY}/myapp"
}
stages {
stage('Test') {
steps {
runTests([
testCommand: 'npm test',
coverageThreshold: 80
])
}
}
stage('Build') {
steps {
script {
buildDockerImage([
imageName: env.IMAGE_NAME,
tag: env.BUILD_NUMBER,
registry: [
url: "https://${DOCKER_REGISTRY}",
credentialsId: 'docker-registry-creds'
],
pushLatest: env.BRANCH_NAME == 'main'
])
}
}
}
stage('Deploy to Staging') {
when {
branch 'main'
}
steps {
deployToK8s([
namespace: 'staging',
deployment: 'myapp',
image: "${env.IMAGE_NAME}:${env.BUILD_NUMBER}",
kubeconfig: 'k8s-staging-config'
])
}
}
stage('Deploy to Production') {
when {
branch 'main'
}
input {
message "Deploy to production?"
ok "Deploy"
parameters {
choice(name: 'DEPLOYMENT_TYPE', choices: ['rolling', 'blue-green'], description: 'Deployment strategy')
}
}
steps {
deployToK8s([
namespace: 'production',
deployment: 'myapp',
image: "${env.IMAGE_NAME}:${env.BUILD_NUMBER}",
kubeconfig: 'k8s-production-config'
])
}
}
}
}
```

## Advanced Workflows {#advanced-workflows}

### Multi-Cloud Deployment Strategy

#### Terraform Multi-Cloud Configuration

```hcl
# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

provider "azurerm" {
  features {}
}

provider "google" {
  project = var.gcp_project
  region  = var.gcp_region
}

# main.tf
module "aws_infrastructure" {
  source      = "./modules/aws"
  environment = var.environment
  vpc_cidr    = "10.0.0.0/16"
}

module "azure_infrastructure" {
  source        = "./modules/azure"
  environment   = var.environment
  location      = "East US"
  address_space = ["10.1.0.0/16"]
}

module "gcp_infrastructure" {
source = "./modules/gcp"
environment = var.environment
region = var.gcp_region
cidr_range = "10.2.0.0/16"
}
```

### Feature Flag Integration

```javascript
// feature-flags.js
const LaunchDarkly = require('launchdarkly-node-server-sdk');
class FeatureFlags {
  constructor() {
    this.client = LaunchDarkly.init(process.env.LAUNCHDARKLY_SDK_KEY);
  }

  async isEnabled(flagKey, user, defaultValue = false) {
    try {
      await this.client.waitForInitialization();
      return await this.client.variation(flagKey, user, defaultValue);
    } catch (error) {
      console.error('Feature flag error:', error);
      return defaultValue;
    }
  }

  async getVariation(flagKey, user, defaultValue) {
    try {
      await this.client.waitForInitialization();
      return await this.client.variation(flagKey, user, defaultValue);
    } catch (error) {
      console.error('Feature flag error:', error);
      return defaultValue;
    }
  }
}

// Usage in application
const featureFlags = new FeatureFlags();
app.get('/api/users', async (req, res) => {
const user = {
key: req.user.id,
email: req.user.email,
custom: {
plan: req.user.plan
}
};
const useNewUserAPI = await featureFlags.isEnabled('new-user-api', user);
if (useNewUserAPI) {
return res.json(await getUsersV2());
} else {
return res.json(await getUsersV1());
}
});
```

### Chaos Engineering

```yaml
# chaos-experiment.yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: default
spec:
  engineState: 'active'
  appinfo:
    appns: 'default'
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '30'
            - name: CHAOS_INTERVAL
              value: '10'
            - name: FORCE
              value: 'false'
```

### Progressive Delivery with Flagger

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  progressDeadlineSeconds: 60
  service:
    port: 80
    targetPort: 8080
    gateways:
      - myapp-gateway
    hosts:
      - app.example.com
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 30s
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://myapp-canary/token | grep token"
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary/"
```
## Conclusion {#conclusion}
DevOps is more than just a set of tools and practices—it's a cultural transformation that enables organizations to deliver software faster, more reliably, and with higher quality. The key to successful DevOps implementation lies in:
### Key Takeaways

1. **Start Small and Iterate**: Begin with basic CI/CD pipelines and gradually add more sophisticated practices
2. **Automate Everything**: From testing to deployment to infrastructure provisioning
3. **Measure and Monitor**: Use metrics to drive decisions and continuous improvement
4. **Foster Collaboration**: Break down silos between development and operations teams
5. **Embrace Failure**: Learn from failures and build resilience into your systems
### Implementation Roadmap

**Phase 1: Foundation (Months 1-3)**

- Set up version control and basic CI pipelines
- Implement automated testing
- Establish monitoring and logging

**Phase 2: Automation (Months 4-6)**

- Infrastructure as Code implementation
- Automated deployments to staging
- Security scanning integration

**Phase 3: Advanced Practices (Months 7-12)**

- Production deployments with advanced strategies
- Comprehensive monitoring and alerting
- Chaos engineering and resilience testing

**Phase 4: Optimization (Ongoing)**

- Performance optimization
- Cost management
- Advanced deployment patterns
### Best Practices Summary

- **Version Control Everything**: Code, infrastructure, configurations, and documentation
- **Test Early and Often**: Unit tests, integration tests, security scans
- **Deploy Frequently**: Small, incremental changes reduce risk
- **Monitor Continuously**: Metrics, logs, and traces provide visibility
- **Automate Toil**: Focus human effort on high-value activities
- **Learn from Incidents**: Post-mortems without blame improve resilience
The journey to DevOps excellence is continuous. Technology evolves, practices improve, and organizational needs change. The most successful DevOps implementations are those that remain adaptable and committed to continuous learning and improvement.
Remember that DevOps is ultimately about enabling your organization to deliver value to customers more effectively. Keep this goal in mind as you implement these practices, and don't hesitate to adapt them to your specific context and requirements.
By following the practices and examples outlined in this guide, you'll be well-equipped to build a robust DevOps culture and technical foundation that supports your organization's goals for software delivery and operational excellence.