Course Outline
Introduction to Apache Airflow
- Defining workflow orchestration.
- Key features and benefits of Apache Airflow.
- Overview of improvements and ecosystem in Airflow 2.x.
Architecture and Core Concepts
- Roles of the scheduler, web server, and worker processes.
- Understanding DAGs, tasks, and operators.
- Executors and backends (Local, Celery, Kubernetes).
Installation and Setup
- Installing Airflow in local and cloud environments.
- Configuring Airflow with various executors.
- Setting up metadata databases and connections.
Navigating the Airflow UI and CLI
- Exploring the Airflow web interface.
- Monitoring DAG runs, tasks, and logs.
- Utilizing the Airflow CLI for administrative tasks.
Authoring and Managing DAGs
- Creating DAGs using the TaskFlow API.
- Leveraging operators, sensors, and hooks.
- Managing dependencies and scheduling intervals.
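As a rough illustration of the dependency management topic above, the sketch below orders tasks so that each one runs only after its upstream tasks. It uses Python's standard-library `graphlib` and does not use Airflow's own API. The task names and the `execution_order` helper are hypothetical, chosen only for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its upstream tasks."""
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

In an actual DAG file, such dependencies would be declared with Airflow's operators or the TaskFlow API, and the scheduler would decide when each task runs.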
Integrating Airflow with Data and Cloud Services
- Connecting to databases, APIs, and message queues.
- Executing ETL pipelines with Airflow.
- Cloud integrations: Operators for AWS, GCP, and Azure.
Monitoring and Observability
- Accessing task logs and real-time monitoring tools.
- Tracking metrics with Prometheus and Grafana.
- Setting up alerting and notifications via email or Slack.
Securing Apache Airflow
- Implementing role-based access control (RBAC).
- Configuring authentication with LDAP, OAuth, and SSO.
- Managing secrets using Vault and cloud secret stores.
Scaling Apache Airflow
- Managing parallelism, concurrency, and task queues.
- Utilizing CeleryExecutor and KubernetesExecutor.
- Deploying Airflow on Kubernetes using Helm.
Best Practices for Production
- Implementing version control and CI/CD for DAGs.
- Testing and debugging DAGs effectively.
- Maintaining reliability and performance at scale.
Troubleshooting and Optimization
- Debugging failed DAGs and individual tasks.
- Optimizing DAG performance.
- Identifying common pitfalls and strategies to avoid them.
Summary and Next Steps
Requirements
- Prior experience with Python programming.
- Familiarity with data engineering or DevOps concepts.
- Understanding of ETL processes or workflow orchestration.
Target Audience
- Data scientists.
- Data engineers.
- DevOps and infrastructure engineers.
- Software developers.
Testimonials
The instructor adapted the training to the participants’ level and responded to all questions. He was very communicative, and it was easy to interact with him. I really appreciated the format of the training, which included many practical exercises. Overall, it was a very engaging and well-organized session.
Jacek Chlopik - ZAKLAD UBEZPIECZEN SPOLECZNYCH
Course - Apache Airflow: Building and Managing Data Pipelines
The training was spot on. Very useful theory and exercises.
Vladimir - PUBLIC COURSE
Course - Apache Airflow
The training was spot on in all aspects. Useful theoretical aspects and exercises.
Vladimir - PUBLIC COURSE
Course - Apache Airflow