What is ETL (Extract, Transform, Load)?
A data pipeline process that extracts data from sources, transforms it into a suitable format, and loads it into a destination system.
ETL processes move data between systems. Extract pulls data from databases, APIs, files, or streams. Transform cleans, validates, enriches, and restructures the data. Load writes the processed data to a data warehouse or target system.
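The three stages above can be sketched with nothing but the Python standard library. This is a minimal illustration, not a production pipeline: the inline CSV text, the `orders` table, and the function names `extract`, `transform`, `load`, and `run_etl` are all hypothetical, and SQLite stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this could be a file, API, or database.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,carol,not_a_number
"""


def extract(source):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows):
    """Transform: normalize names, validate amounts, drop invalid rows."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        customer = row["customer"].strip().lower()
        cleaned.append((int(row["order_id"]), customer, amount))
    return cleaned


def load(rows, conn):
    """Load: write the processed rows to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the stages in order: extract, then transform, then load."""
    load(transform(extract(RAW_CSV)), conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # SQLite as a stand-in warehouse
    run_etl(connection)
    for record in connection.execute("SELECT * FROM orders ORDER BY order_id"):
        print(record)
```

Keeping each stage as its own function means a stage can be tested or replaced on its own, for example swapping the CSV reader for an API client without touching the transform or load logic.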
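Modern variations include ELT (load the raw data first, then transform it inside the warehouse) and real-time streaming pipelines, which process records continuously rather than in scheduled batches. Common tools include Apache Airflow (workflow orchestration), dbt (the transform layer, run inside the warehouse), Apache Spark (distributed processing), and managed cloud services such as AWS Glue. ETL is fundamental to data warehousing and analytics.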
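For contrast, ELT reverses the last two steps: raw data is loaded untouched and the cleanup happens in SQL inside the warehouse. The sketch below is illustrative only. SQLite again stands in for the warehouse, the tables `raw_orders` and `orders` and the functions `load_raw` and `transform_in_warehouse` are hypothetical names, and the `GLOB` filter is a deliberately crude numeric check rather than real validation.

```python
import sqlite3

# Hypothetical raw records, loaded exactly as received (everything is text).
RAW_ROWS = [
    ("1", " alice ", "19.99"),
    ("2", "BOB", "5.00"),
    ("3", "carol", "not_a_number"),
]


def load_raw(conn, rows):
    """Load: store the untransformed rows in a raw staging table."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def transform_in_warehouse(conn):
    """Transform: build a cleaned table from the raw one using SQL."""
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'  -- crude check: keep amounts starting with a digit
        """
    )
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_raw(connection, RAW_ROWS)
    transform_in_warehouse(connection)
    for record in connection.execute("SELECT * FROM orders ORDER BY order_id"):
        print(record)
```

Because the raw table is kept, the transformation can be rewritten and rerun later without extracting from the source again, which is one common reason for choosing ELT.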