Locale misconfiguration is a low-grade infection that surfaces as garbled accented characters, broken sort orders, mysterious crashes in Python and Ruby, and emails that arrive as =?ISO-8859-1?Q? mojibake. Setting locale correctly is a five-minute job; debugging a half-set locale six months later is a four-hour job. This guide explains the layers (system, shell, service, application), how to verify each, and the UTF-8 settings that should be the default everywhere in 2026.
Anatomy of a locale
A locale is a string like en_US.UTF-8 with three parts: language (en), country or territory (US), and character set (UTF-8). glibc exposes 12 categories that can be set independently: LC_CTYPE, LC_COLLATE, LC_TIME, LC_NUMERIC, LC_MESSAGES, LC_MONETARY, LC_PAPER, LC_NAME, LC_ADDRESS, LC_TELEPHONE, LC_MEASUREMENT, and LC_IDENTIFICATION. LC_ALL overrides every category; LANG is the fallback for any category that is not set explicitly.
locale # what the current shell uses
locale -a # list installed locales
locale charmap # current character set
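As a minimal sketch of the naming scheme described above, a locale name can be pulled apart with POSIX parameter expansion alone. The function name split_locale is made up for this illustration, and the optional @modifier suffix it handles (as in de_DE.ISO-8859-15@euro) is part of the glibc naming convention rather than something this guide covers.

```shell
#!/bin/sh
# Split a locale name of the form language[_territory][.codeset][@modifier]
# into its parts. split_locale is a hypothetical helper, not a system tool.
split_locale() {
  name=$1
  modifier=
  codeset=
  territory=
  # Peel the parts off from right to left.
  case $name in *@*) modifier=${name#*@}; name=${name%%@*} ;; esac
  case $name in *.*) codeset=${name#*.}; name=${name%%.*} ;; esac
  case $name in *_*) territory=${name#*_}; name=${name%%_*} ;; esac
  printf 'language=%s territory=%s codeset=%s modifier=%s\n' \
    "$name" "$territory" "$codeset" "$modifier"
}

split_locale en_US.UTF-8
split_locale de_DE.ISO-8859-15@euro
split_locale C
```

A name such as C or POSIX has no territory or codeset at all, which is one reason programs cannot assume UTF-8 when they see it.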
Generating and selecting locales
On Debian/Ubuntu:
sudo dpkg-reconfigure locales # interactive
# or non-interactive:
sudo sed -i 's/^# *\(en_US.UTF-8 UTF-8\)/\1/' /etc/locale.gen
sudo locale-gen
sudo update-locale LANG=en_US.UTF-8
On RHEL/Fedora/Rocky:
sudo dnf install glibc-langpack-en
sudo localectl set-locale LANG=en_US.UTF-8
Validate everywhere with localectl status; the System Locale block is the persistent value.
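Before pointing LANG at a locale, it can help to confirm that the locale was actually generated. The sketch below assumes the glibc behaviour where locale -a prints names like en_US.utf8 while configuration files use en_US.UTF-8, so both sides are normalised before comparing. The helper names normalize_locale and locale_installed are invented for this example.

```shell
#!/bin/sh
# Lower-case a locale name and drop hyphens so that "en_US.UTF-8"
# and "en_US.utf8" compare equal.
normalize_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeed if the named locale appears in the output of `locale -a`.
locale_installed() {
  want=$(normalize_locale "$1")
  locale -a 2>/dev/null \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/-//g' \
    | grep -qxF -- "$want"
}

for candidate in en_US.UTF-8 C.UTF-8; do
  if locale_installed "$candidate"; then
    echo "$candidate: installed"
  else
    echo "$candidate: missing (generate it or install the langpack)"
  fi
done
```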
Per-user and per-shell overrides
The system default is overridden by ~/.pam_environment (where PAM is still configured to read it), then by shell rc files (~/.bashrc, ~/.zshrc), then by LANG=… on the command line. Pin only the variables you actually need:
# ~/.bashrc
export LANG=en_US.UTF-8
# LC_ALL is deliberately not exported: it would override LC_TIME below
export LC_TIME=en_GB.UTF-8 # 24-hour clock, day-month-year
Avoid setting LC_ALL globally in production: it is too blunt, because it overrides LANG and every individual LC_* category. Set LANG and override individual LC_* categories only when needed.
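The precedence at work here (LC_ALL first, then the specific LC_* variable, then LANG, then the built-in C default) can be sketched as a small resolver. This is an illustration of the lookup order, not how libc implements it. effective_locale is a hypothetical helper name, and empty variables are treated as unset.

```shell
#!/bin/sh
# Print the value a program would use for one locale category.
# Order: LC_ALL, then the category variable, then LANG, then "C".
effective_locale() {
  category=$1
  # Accept only names like LC_TIME so the eval below stays safe.
  case $category in
    LC_ALL|LC_*[!A-Z]*) valid=no ;;
    LC_?*) valid=yes ;;
    *) valid=no ;;
  esac
  if [ "$valid" = no ]; then
    echo "not a locale category: $category" >&2
    return 2
  fi
  eval "specific=\${$category:-}"
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "$specific" ]; then
    printf '%s\n' "$specific"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    printf 'C\n'
  fi
}

# Demonstration in a subshell so the caller's settings are untouched.
# bash may warn on stderr if these locales are not generated.
(
  unset LC_ALL LC_COLLATE
  LANG=en_US.UTF-8
  LC_TIME=en_GB.UTF-8
  effective_locale LC_TIME      # prints en_GB.UTF-8
  effective_locale LC_COLLATE   # prints en_US.UTF-8
  LC_ALL=C
  effective_locale LC_TIME      # prints C
)
```

The last call shows why a stray LC_ALL defeats a carefully chosen LC_TIME.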
Per-service locale
SSH and cron inherit a different environment than your interactive shell. Two patterns to make services locale-aware:
# /etc/default/locale (Debian) or /etc/locale.conf (RHEL)
LANG=en_US.UTF-8
# In a systemd unit override:
[Service]
Environment=LANG=en_US.UTF-8
Environment=LC_ALL=en_US.UTF-8
For Apache, set SetEnv LANG en_US.UTF-8 in your virtual host. For nginx + FastCGI, pass fastcgi_param LANG en_US.UTF-8;.
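To confirm what a running service actually received, one option on Linux is to read its start-up environment from /proc. The sketch below assumes a readable /proc/<pid>/environ file. locale_vars_in is a hypothetical helper name.

```shell
#!/bin/sh
# Print the locale-related variables found in a NUL-separated
# environment dump such as /proc/<pid>/environ (Linux only).
locale_vars_in() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 1
  fi
  tr '\0' '\n' < "$1" \
    | grep -E '^(LANG|LANGUAGE|LC_[A-Z]+)=' \
    || echo "(no locale variables set)"
}

# Example: inspect the environment this shell was started with.
if [ -r "/proc/$$/environ" ]; then
  locale_vars_in "/proc/$$/environ"
fi
```

For a systemd service, the PID can usually be obtained with systemctl show -p MainPID --value followed by the unit name. Reading another user's environ file generally requires root.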
The Python and Ruby trap
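Python 3.6 and earlier fall back to ASCII when the locale lookup returns POSIX or C, so reading a file that contains accented characters raises UnicodeDecodeError and printing them raises UnicodeEncodeError. Python 3.7 and later coerce the C locale to UTF-8 (PEP 538, PEP 540), but an explicitly non-UTF-8 locale still selects a non-UTF-8 default encoding. PYTHONIOENCODING fixes the standard streams only, and only for one invocation: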
PYTHONIOENCODING=utf-8 python3 -c 'print("café")'
Permanent fix: ensure systemd services that exec Python set Environment=LANG=en_US.UTF-8, or use C.UTF-8, which needs no locale package: upstream glibc has built it in since 2.35, and Debian, Ubuntu, and Fedora carried it as a distribution patch before that.
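One way to catch a POSIX or half-configured locale before a Python process starts is a small guard in the launch script. This sketch relies on locale charmap reporting the codeset actually in effect, so it also catches a LANG that names a locale that was never generated. require_utf8 is a hypothetical helper, its optional argument exists only to make it easy to test, and the commented-out exec line uses a placeholder path.

```shell
#!/bin/sh
# Succeed only if the effective character map is UTF-8.
# With no argument, ask `locale charmap`; an explicit argument
# is checked as-is, which is handy for testing.
require_utf8() {
  charmap=${1:-$(locale charmap 2>/dev/null || true)}
  if [ "$charmap" = "UTF-8" ]; then
    return 0
  fi
  echo "charmap is '${charmap:-unknown}', expected UTF-8" >&2
  return 1
}

# Hypothetical wrapper: fall back to C.UTF-8 before launching Python.
if ! require_utf8; then
  unset LC_ALL LC_CTYPE
  LANG=C.UTF-8
  export LANG
fi
# exec python3 /path/to/app.py   # placeholder path, left commented out
```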
Console keyboard and font
Locale settings do not change the console keyboard layout. Configure separately:
sudo localectl set-keymap us
sudo localectl set-x11-keymap us
sudo dpkg-reconfigure keyboard-configuration # Debian/Ubuntu
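For characters beyond plain ASCII, load a console font that covers them. On Debian/Ubuntu, setfont /usr/share/consolefonts/Lat15-Fixed16.psf.gz covers Western European accents; non-Latin scripts need a font for a different codeset from the same directory.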
Diagnosing mojibake
When characters appear as Ã© instead of é, the data was encoded as UTF-8 and decoded as Latin-1. Inspect with file -i; if the file really is stored as Latin-1, convert it with iconv:
file -i suspect.txt
iconv -f ISO-8859-1 -t UTF-8 suspect.txt > fixed.txt
# Convert a table and all of its text columns in MySQL:
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
For PostgreSQL, the encoding is fixed when each database is created (initdb sets the cluster-wide default) and cannot be changed afterwards without a dump/restore.
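The "encoded as UTF-8, decoded as Latin-1" fault from this section can be reproduced and reversed with iconv alone. The sketch below assumes the damage happened exactly once and that the wrong decoding really was ISO-8859-1. The helper names is_valid_utf8 and fix_double_utf8 are invented for this example.

```shell
#!/bin/sh
# is_valid_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# fix_double_utf8: read text on stdin whose UTF-8 bytes were misread
# as Latin-1 and saved again as UTF-8, and undo that single step.
fix_double_utf8() {
  iconv -f UTF-8 -t ISO-8859-1
}

# Reproduce the fault; octal escapes keep the bytes unambiguous.
good=$(printf 'caf\303\251')                               # UTF-8 for "café"
bad=$(printf '%s' "$good" | iconv -f ISO-8859-1 -t UTF-8)  # displays as "cafÃ©"
fixed=$(printf '%s' "$bad" | fix_double_utf8)

if [ "$fixed" = "$good" ]; then
  echo "round trip restored the original bytes"
fi
```

If the wrong decoding was Windows-1252 rather than Latin-1, the target charset in fix_double_utf8 would need to change accordingly. Running is_valid_utf8 on a file first helps distinguish genuinely Latin-1 data from double-encoded UTF-8.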
UTF-8 everywhere checklist
- localectl status shows UTF-8 in System Locale.
- echo $LANG shows UTF-8 in an SSH session, in a cron job (* * * * * echo $LANG > /tmp/lang.log), and inside every service unit.
- Database collation uses utf8mb4 (MySQL) or UTF8 (Postgres).
- Web server emits Content-Type: text/html; charset=utf-8.
- Mail templates declare Content-Type: text/plain; charset="UTF-8" with proper Content-Transfer-Encoding.
Common pitfalls
- Installing locales on the host but never restarting the affected service; the locale environment is read at process start.
- Mixing en_US for messages with a non-UTF-8 charset; always pin .UTF-8.
- Forgetting that container images strip locales by default. Use ENV LANG=C.UTF-8 in your Dockerfile.
Locale is one of those topics where ten minutes of upfront discipline saves you from a dozen weird tickets later. Set en_US.UTF-8 (or C.UTF-8) once at the system level, propagate via systemd Environment=, and never accept a service that defaults to POSIX.