# Integrating Email Alerts with System Monitoring
## Table of Contents

1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Email Configuration Methods](#email-configuration-methods)
4. [Monitoring Tools Integration](#monitoring-tools-integration)
5. [Alert Configuration](#alert-configuration)
6. [Script-Based Monitoring](#script-based-monitoring)
7. [Advanced Configuration](#advanced-configuration)
8. [Troubleshooting](#troubleshooting)
9. [Best Practices](#best-practices)
## Introduction
Email alerts in system monitoring provide immediate notification when critical system events occur, performance thresholds are exceeded, or services become unavailable. This integration ensures administrators can respond quickly to issues before they impact users or cause system downtime.
System monitoring with email alerts typically involves:

- Monitoring system resources (CPU, memory, disk space, network)
- Service availability monitoring
- Log file analysis for errors
- Performance threshold monitoring
- Security event detection
## Prerequisites
Before implementing email alerts with system monitoring, ensure you have:
| Requirement | Description | Example |
|-------------|-------------|---------|
| SMTP Server | Mail server for sending emails | Gmail SMTP, company mail server |
| Monitoring Tools | System monitoring software | Nagios, Zabbix, custom scripts |
| System Access | Administrative privileges | root or sudo access |
| Email Credentials | Valid email account for sending | monitoring@company.com |
| Network Access | Connectivity to SMTP server | Port 587 or 465 open |
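The network requirement is the one most often missed on locked-down servers. A quick reachability check against your chosen SMTP host (Gmail is used here only as an example) confirms the port is open before any mail configuration starts:

```bash
# Check that the submission port is reachable (substitute your own SMTP host)
nc -zv smtp.gmail.com 587

# Alternative if netcat is not installed (bash built-in /dev/tcp)
timeout 5 bash -c 'cat < /dev/null > /dev/tcp/smtp.gmail.com/587' && echo "Port 587 reachable"
```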
## Email Configuration Methods

### Method 1: Using Postfix (Local Mail Server)
Install and configure Postfix as a local mail relay:
```bash
# Install Postfix
sudo apt-get update
sudo apt-get install postfix mailutils

# Configure Postfix for Gmail relay
sudo nano /etc/postfix/main.cf
```

Add the following configuration to /etc/postfix/main.cf:
```
relayhost = [smtp.gmail.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt
smtp_use_tls = yes
```
Create SASL password file:
```bash
# Create password file
sudo nano /etc/postfix/sasl_passwd

# Add credentials (replace with actual values):
# [smtp.gmail.com]:587 username@gmail.com:app_password

# Secure and hash the file
sudo chmod 400 /etc/postfix/sasl_passwd
sudo postmap /etc/postfix/sasl_passwd

# Restart Postfix
sudo systemctl restart postfix
```

### Method 2: Using SSMTP (Lightweight Alternative)
```bash
# Install SSMTP
sudo apt-get install ssmtp

# Configure SSMTP
sudo nano /etc/ssmtp/ssmtp.conf
```

SSMTP configuration:
```
root=monitoring@yourdomain.com
mailhub=smtp.gmail.com:587
rewriteDomain=yourdomain.com
AuthUser=your-email@gmail.com
AuthPass=your-app-password
FromLineOverride=YES
UseSTARTTLS=YES
```
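Whichever relay method you choose, confirm that outbound mail actually leaves the host before wiring it into monitoring. A minimal check, assuming mailutils is installed and substituting a mailbox you can read:

```bash
# Send a test message through the local relay
echo "Relay test from $(hostname)" | mail -s "Mail relay test" you@yourdomain.com

# Watch the mail log for the delivery attempt (path may vary by distribution)
sudo tail -n 20 /var/log/mail.log
```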
### Method 3: Using Python SMTP Library
Create a Python script for sending emails:
```python
#!/usr/bin/env python3
import smtplib
import sys
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from datetime import datetime
def send_alert_email(subject, message, recipient):
    # Email configuration
    smtp_server = "smtp.gmail.com"
    smtp_port = 587
    sender_email = "monitoring@yourdomain.com"
    sender_password = "your-app-password"

    # Create message
    msg = MIMEMultipart()
    msg['From'] = sender_email
    msg['To'] = recipient
    msg['Subject'] = f"[ALERT] {subject}"

    # Email body
    body = f"""
System Alert Notification

Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
Alert: {subject}

Details:
{message}

Please investigate immediately.

--
System Monitoring
"""
    msg.attach(MIMEText(body, 'plain'))

    try:
        # Connect to server and send email
        server = smtplib.SMTP(smtp_server, smtp_port)
        server.starttls()
        server.login(sender_email, sender_password)
        text = msg.as_string()
        server.sendmail(sender_email, recipient, text)
        server.quit()
        print("Alert email sent successfully")
        return True
    except Exception as e:
        print(f"Failed to send email: {e}")
        return False
if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("Usage: python3 email_alert.py <subject> <message> <recipient>")
        sys.exit(1)
    send_alert_email(sys.argv[1], sys.argv[2], sys.argv[3])
```
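Saved as, for example, /usr/local/bin/email_alert.py (an assumed path; adjust to your layout), the script can then be called from cron jobs or other scripts. A quick manual test:

```bash
# Hypothetical install location; adjust to wherever you keep the script
sudo install -m 755 email_alert.py /usr/local/bin/email_alert.py

# Send a one-off test alert
python3 /usr/local/bin/email_alert.py "Test alert" "Manual test from $(hostname)" admin@yourdomain.com
```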
## Monitoring Tools Integration

### Nagios Integration
Nagios is a comprehensive monitoring solution that supports email notifications natively.
#### Command Configuration
Define notification commands in /etc/nagios3/conf.d/commands.cfg:
```
define command{
    command_name    notify-host-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}

define command{
    command_name    notify-service-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
```
#### Contact Configuration
Define contacts in /etc/nagios3/conf.d/contacts.cfg:
```
define contact{
contact_name admin
use generic-contact
alias System Administrator
email admin@yourdomain.com
host_notification_period 24x7
service_notification_period 24x7
host_notification_options d,u,r,f,s
service_notification_options w,u,c,r,f,s
host_notification_commands notify-host-by-email
service_notification_commands notify-service-by-email
}
define contactgroup{
contactgroup_name admins
alias System Administrators
members admin
}
```
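After editing the command and contact definitions, validate the configuration before reloading Nagios. On a Debian-style nagios3 installation (paths and service name may differ on your system) the check looks roughly like this:

```bash
# Validate the Nagios configuration files
sudo nagios3 -v /etc/nagios3/nagios.cfg

# Reload Nagios so the new notification commands and contacts take effect
sudo systemctl restart nagios3
```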
### Zabbix Integration
Zabbix provides robust email notification capabilities through media types and actions.
#### Media Type Configuration
Configure the email media type in the Zabbix web interface or via the API:
| Parameter | Value | Description |
|-----------|-------|-------------|
| Name | Email | Media type name |
| Type | Email | Communication method |
| SMTP server | smtp.gmail.com | Mail server address |
| SMTP server port | 587 | SMTP port |
| SMTP helo | yourdomain.com | HELO message |
| SMTP email | monitoring@yourdomain.com | Sender email |
| Connection security | STARTTLS | Security method |
| Authentication | Username and password | Auth method |
| Username | your-email@gmail.com | SMTP username |
| Password | your-app-password | SMTP password |
#### Action Configuration
Create actions to trigger email notifications:
```sql
-- Example Zabbix action configuration
INSERT INTO actions (actionid, name, eventsource, evaltype, status, esc_period, def_shortdata, def_longdata)
VALUES (1, 'Email Notifications', 0, 0, 0, 3600,
'Problem: {EVENT.NAME}',
'Problem started at {EVENT.TIME} on {EVENT.DATE}\nProblem name: {EVENT.NAME}\nHost: {HOST.NAME}\nSeverity: {EVENT.SEVERITY}\n\nOriginal problem ID: {EVENT.ID}\n{TRIGGER.URL}');
```
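In practice, media types and actions are normally created through the Zabbix frontend or its JSON-RPC API rather than by inserting rows directly. As a sketch (assuming a frontend reachable at zabbix.yourdomain.com; older releases authenticate with a `user` parameter, newer ones with `username`), logging in to the API looks like this:

```bash
# Obtain an API token; the returned "result" value is passed as "auth" in later calls
curl -s -X POST "https://zabbix.yourdomain.com/api_jsonrpc.php" \
  -H "Content-Type: application/json-rpc" \
  -d '{
        "jsonrpc": "2.0",
        "method": "user.login",
        "params": {"user": "Admin", "password": "zabbix"},
        "id": 1
      }'
```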
### Custom Script Integration
Create comprehensive monitoring scripts with email integration:
```bash
#!/bin/bash
# System monitoring script with email alerts
# File: /usr/local/bin/system_monitor.sh

# Configuration
EMAIL_RECIPIENT="admin@yourdomain.com"
HOSTNAME=$(hostname)
ALERT_SCRIPT="/usr/local/bin/send_alert.py"

# Thresholds
CPU_THRESHOLD=80
MEMORY_THRESHOLD=85
DISK_THRESHOLD=90
LOAD_THRESHOLD=5.0

# Function to send alert
send_alert() {
    local subject="$1"
    local message="$2"
    python3 "$ALERT_SCRIPT" "$subject" "$message" "$EMAIL_RECIPIENT"
}

# Check CPU usage
check_cpu() {
    cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
    cpu_usage_int=${cpu_usage%.*}
    if [ "$cpu_usage_int" -gt "$CPU_THRESHOLD" ]; then
        send_alert "High CPU Usage on $HOSTNAME" "CPU usage is ${cpu_usage}%, threshold is ${CPU_THRESHOLD}%"
    fi
}

# Check memory usage
check_memory() {
    memory_usage=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100.0}')
    if [ "$memory_usage" -gt "$MEMORY_THRESHOLD" ]; then
        send_alert "High Memory Usage on $HOSTNAME" "Memory usage is ${memory_usage}%, threshold is ${MEMORY_THRESHOLD}%"
    fi
}

# Check disk usage
check_disk() {
    while read output; do
        usage=$(echo $output | awk '{print $5}' | cut -d'%' -f1)
        partition=$(echo $output | awk '{print $6}')
        if [ $usage -ge $DISK_THRESHOLD ]; then
            send_alert "High Disk Usage on $HOSTNAME" "Disk usage on $partition is ${usage}%, threshold is ${DISK_THRESHOLD}%"
        fi
    done <<< "$(df -h | grep -vE '^Filesystem|tmpfs|cdrom')"
}

# Check system load
check_load() {
    load_avg=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
    if (( $(echo "$load_avg > $LOAD_THRESHOLD" | bc -l) )); then
        send_alert "High System Load on $HOSTNAME" "System load average is $load_avg, threshold is $LOAD_THRESHOLD"
    fi
}

# Check service status
check_services() {
    services=("ssh" "apache2" "mysql" "nginx")
    for service in "${services[@]}"; do
        if ! systemctl is-active --quiet "$service"; then
            send_alert "Service Down on $HOSTNAME" "Service $service is not running"
        fi
    done
}

# Main execution
main() {
    echo "$(date): Starting system monitoring check"
    check_cpu
    check_memory
    check_disk
    check_load
    check_services
    echo "$(date): System monitoring check completed"
}

main "$@"
```
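Before scheduling the script, exercise it once by hand. Note that it calls /usr/local/bin/send_alert.py, so point ALERT_SCRIPT at whichever alert script you actually installed:

```bash
# Install the monitor and run it once manually
sudo install -m 755 system_monitor.sh /usr/local/bin/system_monitor.sh
sudo /usr/local/bin/system_monitor.sh

# To confirm the email path end to end, temporarily lower a threshold in the
# script (e.g. CPU_THRESHOLD=1), run it again, and then revert the change.
```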
## Alert Configuration

### Alert Severity Levels
Define different alert levels with appropriate email formatting:
| Severity | Description | Email Subject Prefix | Response Time |
|----------|-------------|----------------------|---------------|
| Critical | System down, data loss risk | [CRITICAL] | Immediate |
| High | Service unavailable | [HIGH] | 15 minutes |
| Medium | Performance degraded | [MEDIUM] | 1 hour |
| Low | Minor issues | [LOW] | 4 hours |
| Info | Informational | [INFO] | No action required |
### Advanced Alert Script
```python
#!/usr/bin/env python3
# File: /usr/local/bin/advanced_alert.py

import smtplib
import json
import sys
import os
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from datetime import datetime
import logging


class AlertManager:
    def __init__(self, config_file="/etc/monitoring/alert_config.json"):
        self.config = self.load_config(config_file)
        self.setup_logging()

    def load_config(self, config_file):
        """Load configuration from JSON file"""
        try:
            with open(config_file, 'r') as f:
                return json.load(f)
        except Exception:
            # Default configuration
            return {
                "smtp": {
                    "server": "smtp.gmail.com",
                    "port": 587,
                    "username": "monitoring@yourdomain.com",
                    "password": "your-app-password",
                    "use_tls": True
                },
                "recipients": {
                    "critical": ["admin@yourdomain.com", "oncall@yourdomain.com"],
                    "high": ["admin@yourdomain.com"],
                    "medium": ["admin@yourdomain.com"],
                    "low": ["admin@yourdomain.com"],
                    "info": ["logs@yourdomain.com"]
                },
                "rate_limiting": {
                    "enabled": True,
                    "max_emails_per_hour": 10
                }
            }

    def setup_logging(self):
        """Setup logging configuration"""
        logging.basicConfig(
            level=logging.INFO,
            format='%(asctime)s - %(levelname)s - %(message)s',
            handlers=[
                logging.FileHandler('/var/log/monitoring_alerts.log'),
                logging.StreamHandler()
            ]
        )
        self.logger = logging.getLogger(__name__)

    def check_rate_limit(self):
        """Check if rate limiting allows sending email"""
        if not self.config.get("rate_limiting", {}).get("enabled", False):
            return True
        # Implementation would check recent email count
        # For brevity, always return True
        return True

    def format_email_body(self, severity, subject, message, additional_info=None):
        """Format email body based on severity"""
        timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        hostname = os.uname().nodename
        body = f"""
System Alert Notification

Severity: {severity.upper()}
Time: {timestamp}
Host: {hostname}
Alert: {subject}

Description:
{message}
"""
        if additional_info:
            body += f"\nAdditional Information:\n{additional_info}"
        body += f"""
Alert Details:
- Severity Level: {severity}
- Generated by: System Monitoring
- Host: {hostname}
- Timestamp: {timestamp}

Please investigate and take appropriate action.

--
Automated System Monitoring
"""
        return body

    def send_alert(self, severity, subject, message, additional_info=None):
        """Send alert email based on severity"""
        if not self.check_rate_limit():
            self.logger.warning("Rate limit exceeded, skipping email")
            return False

        # Get recipients for severity level
        recipients = self.config["recipients"].get(
            severity, self.config["recipients"].get("medium", []))
        if not recipients:
            self.logger.error(f"No recipients configured for severity: {severity}")
            return False

        # Format email
        email_subject = f"[{severity.upper()}] {subject}"
        email_body = self.format_email_body(severity, subject, message, additional_info)

        # Send email
        return self._send_email(email_subject, email_body, recipients)

    def _send_email(self, subject, body, recipients):
        """Send email using SMTP"""
        try:
            smtp_config = self.config["smtp"]

            # Create message
            msg = MIMEMultipart()
            msg['From'] = smtp_config["username"]
            msg['To'] = ", ".join(recipients)
            msg['Subject'] = subject
            msg.attach(MIMEText(body, 'plain'))

            # Connect and send
            server = smtplib.SMTP(smtp_config["server"], smtp_config["port"])
            if smtp_config.get("use_tls", True):
                server.starttls()
            server.login(smtp_config["username"], smtp_config["password"])
            server.sendmail(smtp_config["username"], recipients, msg.as_string())
            server.quit()

            self.logger.info(f"Alert sent successfully to {recipients}")
            return True
        except Exception as e:
            self.logger.error(f"Failed to send alert: {e}")
            return False
def main():
    if len(sys.argv) < 4:
        print("Usage: python3 advanced_alert.py <severity> <subject> <message> [additional_info]")
        sys.exit(1)

    severity = sys.argv[1]
    subject = sys.argv[2]
    message = sys.argv[3]
    additional_info = sys.argv[4] if len(sys.argv) > 4 else None

    alert_manager = AlertManager()
    alert_manager.send_alert(severity, subject, message, additional_info)


if __name__ == "__main__":
    main()
```
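Assuming the script is installed as /usr/local/bin/advanced_alert.py (the path the shell scripts later in this guide expect), a manual invocation mirrors how those scripts call it:

```bash
# Arguments: severity, subject, message, optional additional info
python3 /usr/local/bin/advanced_alert.py high "Disk space warning" \
  "Root filesystem is at 92% capacity" "Threshold is 90%"
```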
### Configuration File Example
Create /etc/monitoring/alert_config.json:
```json
{
"smtp": {
"server": "smtp.gmail.com",
"port": 587,
"username": "monitoring@yourdomain.com",
"password": "your-app-password",
"use_tls": true
},
"recipients": {
"critical": [
"admin@yourdomain.com",
"oncall@yourdomain.com",
"manager@yourdomain.com"
],
"high": [
"admin@yourdomain.com",
"oncall@yourdomain.com"
],
"medium": [
"admin@yourdomain.com"
],
"low": [
"admin@yourdomain.com"
],
"info": [
"logs@yourdomain.com"
]
},
"rate_limiting": {
"enabled": true,
"max_emails_per_hour": 20,
"cooldown_period": 300
},
"email_templates": {
"critical": {
"subject_prefix": "[CRITICAL ALERT]",
"priority": "high"
},
"high": {
"subject_prefix": "[HIGH ALERT]",
"priority": "high"
},
"medium": {
"subject_prefix": "[MEDIUM ALERT]",
"priority": "normal"
},
"low": {
"subject_prefix": "[LOW ALERT]",
"priority": "low"
},
"info": {
"subject_prefix": "[INFO]",
"priority": "low"
}
}
}
```
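Because a malformed configuration file silently drops AlertManager back to its built-in defaults, validate the JSON and tighten its permissions after every edit; the file holds SMTP credentials:

```bash
# Syntax check - prints the parsed JSON or an error pointing at the bad line
python3 -m json.tool /etc/monitoring/alert_config.json

# Restrict access, since the file contains credentials
sudo chown root:root /etc/monitoring/alert_config.json
sudo chmod 600 /etc/monitoring/alert_config.json
```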
## Script-Based Monitoring

### Comprehensive System Monitor
```bash
#!/bin/bash
# File: /usr/local/bin/comprehensive_monitor.sh

# Configuration
SCRIPT_DIR="/usr/local/bin"
CONFIG_DIR="/etc/monitoring"
LOG_DIR="/var/log/monitoring"
ALERT_SCRIPT="$SCRIPT_DIR/advanced_alert.py"

# Create directories if they don't exist
mkdir -p "$LOG_DIR"

# Logging function
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_DIR/system_monitor.log"
}

# Network connectivity check
check_network() {
    local hosts=("8.8.8.8" "google.com" "github.com")
    local failed_hosts=()

    for host in "${hosts[@]}"; do
        if ! ping -c 1 -W 5 "$host" > /dev/null 2>&1; then
            failed_hosts+=("$host")
        fi
    done

    if [ ${#failed_hosts[@]} -gt 0 ]; then
        python3 "$ALERT_SCRIPT" "high" "Network Connectivity Issues" \
            "Failed to reach: ${failed_hosts[*]}" \
            "Network connectivity check failed for multiple hosts"
    fi
}

# SSL certificate expiry check
check_ssl_certificates() {
    local domains=("yourdomain.com" "api.yourdomain.com")

    for domain in "${domains[@]}"; do
        expiry_date=$(echo | openssl s_client -servername "$domain" -connect "$domain:443" 2>/dev/null | \
            openssl x509 -noout -dates | grep notAfter | cut -d= -f2)

        if [ -n "$expiry_date" ]; then
            expiry_timestamp=$(date -d "$expiry_date" +%s)
            current_timestamp=$(date +%s)
            days_until_expiry=$(( (expiry_timestamp - current_timestamp) / 86400 ))

            if [ "$days_until_expiry" -lt 30 ]; then
                severity="high"
                [ "$days_until_expiry" -lt 7 ] && severity="critical"
                python3 "$ALERT_SCRIPT" "$severity" "SSL Certificate Expiring" \
                    "SSL certificate for $domain expires in $days_until_expiry days" \
                    "Certificate expiry date: $expiry_date"
            fi
        fi
    done
}

# Database connectivity check
check_database() {
    local databases=("mysql" "postgresql")

    for db in "${databases[@]}"; do
        case "$db" in
            "mysql")
                if command -v mysql > /dev/null; then
                    if ! mysql -u monitoring -p"$MYSQL_PASSWORD" -e "SELECT 1;" > /dev/null 2>&1; then
                        python3 "$ALERT_SCRIPT" "critical" "MySQL Database Connection Failed" \
                            "Unable to connect to MySQL database" \
                            "Database service may be down or credentials invalid"
                    fi
                fi
                ;;
            "postgresql")
                if command -v psql > /dev/null; then
                    if ! PGPASSWORD="$POSTGRES_PASSWORD" psql -U monitoring -d postgres -c "SELECT 1;" > /dev/null 2>&1; then
                        python3 "$ALERT_SCRIPT" "critical" "PostgreSQL Database Connection Failed" \
                            "Unable to connect to PostgreSQL database" \
                            "Database service may be down or credentials invalid"
                    fi
                fi
                ;;
        esac
    done
}

# Log file monitoring
monitor_log_files() {
    local log_files=("/var/log/auth.log" "/var/log/syslog" "/var/log/apache2/error.log")
    local error_patterns=("Failed password" "authentication failure" "Internal Server Error")

    for log_file in "${log_files[@]}"; do
        if [ -f "$log_file" ]; then
            for pattern in "${error_patterns[@]}"; do
                # Check for errors in the last 5 minutes
                recent_errors=$(grep "$pattern" "$log_file" | \
                    awk -v cutoff="$(date -d '5 minutes ago' '+%b %d %H:%M')" \
                    '$0 > cutoff' | wc -l)

                if [ "$recent_errors" -gt 10 ]; then
                    python3 "$ALERT_SCRIPT" "medium" "High Error Rate in Logs" \
                        "Found $recent_errors occurrences of '$pattern' in $log_file in the last 5 minutes" \
                        "Recent error pattern detected"
                fi
            done
        fi
    done
}

# Process monitoring
monitor_processes() {
    local critical_processes=("sshd" "systemd" "init")
    local important_processes=("apache2" "nginx" "mysql" "postgresql")

    for process in "${critical_processes[@]}"; do
        if ! pgrep "$process" > /dev/null; then
            python3 "$ALERT_SCRIPT" "critical" "Critical Process Not Running" \
                "Critical process $process is not running" \
                "System stability may be compromised"
        fi
    done

    for process in "${important_processes[@]}"; do
        if ! pgrep "$process" > /dev/null; then
            python3 "$ALERT_SCRIPT" "high" "Important Process Not Running" \
                "Important process $process is not running" \
                "Service may be unavailable"
        fi
    done
}

# Main execution
main() {
    log_message "Starting comprehensive system monitoring"

    # Load environment variables for database passwords
    if [ -f "$CONFIG_DIR/monitoring.env" ]; then
        source "$CONFIG_DIR/monitoring.env"
    fi

    # Run all checks
    check_network
    check_ssl_certificates
    check_database
    monitor_log_files
    monitor_processes

    log_message "Comprehensive system monitoring completed"
}

# Execute main function
main "$@"
```
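Run the script once interactively before scheduling it. The database checks read MYSQL_PASSWORD and POSTGRES_PASSWORD from /etc/monitoring/monitoring.env (sourced in main above), so create that file first if you use them; the values below are placeholders:

```bash
# Optional: credentials consumed by check_database via monitoring.env
sudo tee /etc/monitoring/monitoring.env > /dev/null <<'EOF'
MYSQL_PASSWORD=changeme
POSTGRES_PASSWORD=changeme
EOF
sudo chmod 600 /etc/monitoring/monitoring.env

# One manual run, then inspect the log it writes
sudo /usr/local/bin/comprehensive_monitor.sh
tail -n 5 /var/log/monitoring/system_monitor.log
```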
### Cron Job Setup

Set up automated monitoring with cron:
```bash
# Edit crontab
crontab -e

# Add monitoring jobs:

# Run comprehensive monitoring every 5 minutes
*/5 * * * * /usr/local/bin/comprehensive_monitor.sh

# Run basic system monitoring every minute
* * * * * /usr/local/bin/system_monitor.sh

# Run daily system health report
0 8 * * * /usr/local/bin/daily_health_report.sh

# Run weekly system summary
0 9 * * 1 /usr/local/bin/weekly_summary.sh
```
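To confirm the jobs were registered and are firing, list the crontab and follow the monitoring log for a few cycles:

```bash
# List the scheduled monitoring jobs
crontab -l | grep -E 'monitor|health|summary'

# Watch the log written by the monitoring scripts
tail -f /var/log/monitoring/system_monitor.log
```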
## Advanced Configuration

### Email Template System
Create customizable email templates:
```python
#!/usr/bin/env python3
# File: /usr/local/bin/template_manager.py

import json
import os
from string import Template
from datetime import datetime


class EmailTemplateManager:
    def __init__(self, template_dir="/etc/monitoring/templates"):
        self.template_dir = template_dir
        self.templates = self.load_templates()

    def load_templates(self):
        """Load email templates from files"""
        templates = {}
        if not os.path.exists(self.template_dir):
            os.makedirs(self.template_dir)
            self.create_default_templates()

        for filename in os.listdir(self.template_dir):
            if filename.endswith('.template'):
                template_name = filename[:-9]  # Remove .template extension
                with open(os.path.join(self.template_dir, filename), 'r') as f:
                    templates[template_name] = f.read()
        return templates

    def create_default_templates(self):
        """Create default email templates"""
        templates = {
            'critical_alert': '''Subject: [CRITICAL] $alert_type on $hostname

CRITICAL SYSTEM ALERT

Time: $timestamp
Host: $hostname
Alert Type: $alert_type
Severity: CRITICAL

Issue Description:
$description

Current Status:
$current_status

Immediate Action Required:
$recommended_action

System Details:
- Hostname: $hostname
- IP Address: $ip_address
- Operating System: $os_info
- Uptime: $uptime

This is a critical alert requiring immediate attention.

--
Automated Monitoring System''',

            'service_down': '''Subject: [HIGH] Service $service_name is DOWN on $hostname

SERVICE UNAVAILABLE ALERT

Time: $timestamp
Host: $hostname
Service: $service_name
Status: DOWN
Duration: $downtime_duration

Service Details:
- Service Name: $service_name
- Expected Status: Running
- Current Status: Stopped/Failed
- Last Known Good: $last_good_time

Impact Assessment:
$impact_description

Recommended Actions:
1. Check service logs: journalctl -u $service_name
2. Attempt service restart: systemctl restart $service_name
3. Verify service configuration
4. Check system resources

--
Service Monitoring System''',

            'performance_degraded': '''Subject: [MEDIUM] Performance Alert on $hostname

PERFORMANCE DEGRADATION DETECTED

Time: $timestamp
Host: $hostname
Metric: $metric_name
Current Value: $current_value
Threshold: $threshold_value

Performance Metrics:
$performance_details

Trend Analysis:
$trend_information

Suggested Actions:
$suggested_actions

--
Performance Monitoring System'''
        }

        for name, content in templates.items():
            with open(os.path.join(self.template_dir, f"{name}.template"), 'w') as f:
                f.write(content)
    def render_template(self, template_name, variables):
        """Render template with provided variables"""
        if template_name not in self.templates:
            raise ValueError(f"Template '{template_name}' not found")

        template = Template(self.templates[template_name])

        # Add common variables
        common_vars = {
            'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'hostname': os.uname().nodename,
            'ip_address': self.get_ip_address(),
            'os_info': self.get_os_info(),
            'uptime': self.get_uptime()
        }

        # Merge with provided variables (caller values override common ones)
        all_vars = {**common_vars, **variables}
        return template.safe_substitute(all_vars)
    def get_ip_address(self):
        """Get system IP address"""
        try:
            import socket
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            s.connect(("8.8.8.8", 80))
            ip = s.getsockname()[0]
            s.close()
            return ip
        except:
            return "Unknown"

    def get_os_info(self):
        """Get OS information"""
        try:
            with open('/etc/os-release', 'r') as f:
                for line in f:
                    if line.startswith('PRETTY_NAME='):
                        return line.split('=')[1].strip().strip('"')
        except:
            pass
        return os.uname().sysname

    def get_uptime(self):
        """Get system uptime"""
        try:
            with open('/proc/uptime', 'r') as f:
                uptime_seconds = float(f.readline().split()[0])
            days = int(uptime_seconds // 86400)
            hours = int((uptime_seconds % 86400) // 3600)
            minutes = int((uptime_seconds % 3600) // 60)
            return f"{days}d {hours}h {minutes}m"
        except:
            return "Unknown"
```
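A quick way to confirm the default templates render, assuming the module lives in /usr/local/bin as above (run from that directory so the import resolves, and with privileges to create /etc/monitoring/templates on first use); safe_substitute leaves any placeholder you do not supply untouched:

```bash
cd /usr/local/bin && sudo python3 -c "
from template_manager import EmailTemplateManager

mgr = EmailTemplateManager()
print(mgr.render_template('service_down', {
    'service_name': 'nginx',
    'downtime_duration': '5 minutes',
    'last_good_time': 'unknown',
    'impact_description': 'Web frontend unavailable',
}))
"
```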
### Dashboard Integration
Create a simple web dashboard for monitoring status:
```python
#!/usr/bin/env python3
# File: /usr/local/bin/monitoring_dashboard.py

from flask import Flask, render_template, jsonify
import json
import os
import subprocess
from datetime import datetime, timedelta

app = Flask(__name__)


class MonitoringDashboard:
    def __init__(self):
        self.status_file = "/var/log/monitoring/status.json"
        self.alert_log = "/var/log/monitoring_alerts.log"

    def get_system_status(self):
        """Get current system status"""
        try:
            with open(self.status_file, 'r') as f:
                return json.load(f)
        except:
            return self.collect_system_status()

    def collect_system_status(self):
        """Collect current system status"""
        status = {
            'timestamp': datetime.now().isoformat(),
            'hostname': os.uname().nodename,
            'cpu_usage': self.get_cpu_usage(),
            'memory_usage': self.get_memory_usage(),
            'disk_usage': self.get_disk_usage(),
            'load_average': self.get_load_average(),
            'services': self.get_service_status(),
            'alerts': self.get_recent_alerts()
        }

        # Save status
        os.makedirs(os.path.dirname(self.status_file), exist_ok=True)
        with open(self.status_file, 'w') as f:
            json.dump(status, f, indent=2)
        return status

    def get_cpu_usage(self):
        """Get CPU usage percentage"""
        try:
            result = subprocess.run(['top', '-bn1'], capture_output=True, text=True)
            for line in result.stdout.split('\n'):
                if 'Cpu(s)' in line:
                    return float(line.split()[1].rstrip('%us,'))
        except:
            return 0.0

    def get_memory_usage(self):
        """Get memory usage percentage"""
        try:
            result = subprocess.run(['free'], capture_output=True, text=True)
            lines = result.stdout.split('\n')
            mem_line = lines[1].split()
            total = int(mem_line[1])
            used = int(mem_line[2])
            return round((used / total) * 100, 2)
        except:
            return 0.0

    def get_disk_usage(self):
        """Get disk usage for all mounted filesystems"""
        try:
            result = subprocess.run(['df', '-h'], capture_output=True, text=True)
            disk_info = []
            for line in result.stdout.split('\n')[1:]:
                if line.strip() and not line.startswith('tmpfs'):
                    parts = line.split()
                    if len(parts) >= 6:
                        disk_info.append({
                            'filesystem': parts[0],
                            'size': parts[1],
                            'used': parts[2],
                            'available': parts[3],
                            'usage_percent': int(parts[4].rstrip('%')),
                            'mount_point': parts[5]
                        })
            return disk_info
        except:
            return []

    def get_load_average(self):
        """Get system load average"""
        try:
            with open('/proc/loadavg', 'r') as f:
                loads = f.read().split()[:3]
            return [float(load) for load in loads]
        except:
            return [0.0, 0.0, 0.0]

    def get_service_status(self):
        """Get status of important services"""
        services = ['ssh', 'apache2', 'nginx', 'mysql', 'postgresql']
        status = {}
        for service in services:
            try:
                result = subprocess.run(['systemctl', 'is-active', service],
                                        capture_output=True, text=True)
                status[service] = result.stdout.strip()
            except:
                status[service] = 'unknown'
        return status

    def get_recent_alerts(self):
        """Get recent alerts from log file"""
        alerts = []
        try:
            if os.path.exists(self.alert_log):
                with open(self.alert_log, 'r') as f:
                    lines = f.readlines()[-50:]  # Last 50 lines
                for line in lines:
                    if 'Alert sent successfully' in line:
                        alerts.append({
                            'timestamp': line.split(' - ')[0],
                            'message': line.strip()
                        })
        except:
            pass
        return alerts

dashboard = MonitoringDashboard()
@app.route('/')
def index():
    """Main dashboard page"""
    status = dashboard.get_system_status()
    return render_template('dashboard.html', status=status)


@app.route('/api/status')
def api_status():
    """API endpoint for system status"""
    return jsonify(dashboard.collect_system_status())


@app.route('/api/refresh')
def api_refresh():
    """Force refresh system status"""
    return jsonify(dashboard.collect_system_status())
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000, debug=False)
```
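The dashboard needs Flask installed and a dashboard.html template for the index page (not shown here); the JSON endpoints work without it, so a quick smoke test looks like this:

```bash
# Install Flask and start the dashboard
pip3 install flask
python3 /usr/local/bin/monitoring_dashboard.py &

# Query the status API from another shell
curl -s http://localhost:5000/api/status | python3 -m json.tool
```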
## Troubleshooting

### Common Issues and Solutions
| Issue | Symptoms | Solution |
|-------|----------|----------|
| Emails not sending | No alert emails received | Check SMTP configuration, credentials, network connectivity |
| Authentication failures | SMTP auth errors | Use app passwords for Gmail, verify credentials |
| Rate limiting | Some emails missing | Implement proper rate limiting, check provider limits |
| False positives | Too many alerts | Adjust thresholds, implement alert correlation |
| Template errors | Malformed emails | Validate template syntax, check variable substitution |
### Debugging Commands
Test email functionality:
```bash
# Test basic mail command
echo "Test message" | mail -s "Test Subject" user@domain.com

# Test SMTP connectivity
telnet smtp.gmail.com 587

# Check mail queue
mailq

# View mail logs
tail -f /var/log/mail.log

# Test Python SMTP
python3 -c "
import smtplib
server = smtplib.SMTP('smtp.gmail.com', 587)
server.starttls()
print('SMTP connection successful')
server.quit()
"
```
### Log Analysis

Create a script to monitor email alert logs:
```bash
#!/bin/bash
# File: /usr/local/bin/monitor_alert_logs.sh

ALERT_LOG="/var/log/monitoring_alerts.log"
MAIL_LOG="/var/log/mail.log"

echo "=== Recent Alert Attempts ==="
tail -20 "$ALERT_LOG"

echo -e "\n=== Mail Server Activity ==="
tail -20 "$MAIL_LOG" | grep -E "(sent|delivered|failed|error)"

echo -e "\n=== Alert Statistics (Last 24 hours) ==="
grep "$(date -d '1 day ago' '+%Y-%m-%d')" "$ALERT_LOG" | \
    grep -c "Alert sent successfully"
echo -e "\n=== Failed Alerts (Last 24 hours) ==="
grep "$(date -d '1 day ago' '+%Y-%m-%d')" "$ALERT_LOG" | \
grep "Failed to send"
```
## Best Practices

### Security Considerations

1. Credential Management: Store SMTP credentials securely and use app-specific passwords
2. Access Control: Restrict access to configuration files and scripts (see the example below)
3. Encryption: Use TLS/SSL for SMTP connections
4. Rate Limiting: Implement proper rate limiting to prevent spam
5. Log Security: Protect log files from unauthorized access
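As a concrete starting point for items 1 and 2, keep everything under /etc/monitoring owned by root with tight modes; adjust the paths to match your own layout:

```bash
# Configuration (SMTP and database credentials): readable by root only
sudo chown -R root:root /etc/monitoring
sudo chmod 700 /etc/monitoring
sudo chmod 600 /etc/monitoring/alert_config.json /etc/monitoring/monitoring.env

# Monitoring scripts: executable, not world-writable
sudo chmod 755 /usr/local/bin/system_monitor.sh /usr/local/bin/comprehensive_monitor.sh
```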
### Performance Optimization

1. Efficient Monitoring: Avoid excessive system calls in monitoring scripts
2. Batch Processing: Group related alerts to reduce email volume
3. Caching: Cache system status to reduce redundant checks
4. Asynchronous Processing: Use background processes for email sending
5. Resource Limits: Set appropriate limits on monitoring frequency
### Maintenance Procedures

1. Regular Testing: Test email functionality regularly
2. Log Rotation: Implement proper log rotation for monitoring logs (see the logrotate example below)
3. Configuration Backup: Back up monitoring configurations
4. Alert Review: Regularly review and tune alert thresholds
5. Documentation: Maintain up-to-date documentation for procedures
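For item 2, a small logrotate drop-in keeps the alert and monitoring logs from growing unbounded; this sketch assumes the log paths used earlier in this guide:

```bash
# Rotate the monitoring logs weekly, keeping four compressed generations
sudo tee /etc/logrotate.d/monitoring > /dev/null <<'EOF'
/var/log/monitoring_alerts.log /var/log/monitoring/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
EOF
```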
### Alert Fatigue Prevention

1. Intelligent Grouping: Group related alerts together
2. Escalation Policies: Implement proper escalation procedures
3. Acknowledgment System: Allow operators to acknowledge alerts
4. Severity Tuning: Regularly adjust alert severity levels
5. Noise Reduction: Filter out non-actionable alerts
This guide provides a framework for integrating email alerts with system monitoring, from basic SMTP setup through tool integration and custom scripts to troubleshooting and the practices that keep an alerting pipeline reliable.