Course Outline

Introduction to Apache Airflow

  • Defining workflow orchestration.
  • Key features and benefits of Apache Airflow.
  • Overview of the improvements in Airflow 2.x and its ecosystem.

Architecture and Core Concepts

  • Roles of the scheduler, web server, and worker processes.
  • Understanding DAGs, tasks, and operators.
  • Executors and backends (Local, Celery, Kubernetes).

Installation and Setup

  • Installing Airflow in local and cloud environments.
  • Configuring Airflow with various executors.
  • Setting up metadata databases and connections.

Navigating the Airflow UI and CLI

  • Exploring the Airflow web interface.
  • Monitoring DAG runs, tasks, and logs.
  • Utilizing the Airflow CLI for administrative tasks.

Authoring and Managing DAGs

  • Creating DAGs using the TaskFlow API.
  • Leveraging operators, sensors, and hooks.
  • Managing dependencies and scheduling intervals.

Integrating Airflow with Data and Cloud Services

  • Connecting to databases, APIs, and message queues.
  • Executing ETL pipelines with Airflow.
  • Cloud integrations: Operators for AWS, GCP, and Azure.

Monitoring and Observability

  • Accessing task logs and real-time monitoring tools.
  • Tracking metrics with Prometheus and Grafana.
  • Setting up alerting and notifications via email or Slack.

Securing Apache Airflow

  • Implementing role-based access control (RBAC).
  • Configuring authentication with LDAP, OAuth, and SSO.
  • Managing secrets using Vault and cloud secret stores.

Scaling Apache Airflow

  • Managing parallelism, concurrency, and task queues.
  • Utilizing CeleryExecutor and KubernetesExecutor.
  • Deploying Airflow on Kubernetes using Helm.

Best Practices for Production

  • Implementing version control and CI/CD for DAGs.
  • Testing and debugging DAGs effectively.
  • Maintaining reliability and performance at scale.

Troubleshooting and Optimization

  • Debugging failed DAGs and individual tasks.
  • Optimizing DAG performance.
  • Identifying common pitfalls and how to avoid them.

Summary and Next Steps

Requirements

  • Prior experience with Python programming.
  • Familiarity with data engineering or DevOps concepts.
  • Understanding of ETL processes or workflow orchestration.

Target Audience

  • Data scientists.
  • Data engineers.
  • DevOps and infrastructure engineers.
  • Software developers.

Duration

  • 21 hours.
