Course Outline
Introduction
- Introduction to Cloud Computing and Big Data solutions.
- Overview of Apache Hadoop Features and Architecture.
Setting up Hadoop
- Planning a Hadoop cluster (on-premise, cloud, etc.).
- Selecting the OS and Hadoop distribution.
- Provisioning resources (hardware, network, etc.).
- Downloading and installing the software.
- Sizing the cluster for flexibility.
Working with HDFS
- Understanding the Hadoop Distributed File System (HDFS).
- Overview of the HDFS command reference.
- Accessing HDFS.
- Performing Basic File Operations on HDFS.
- Using S3 as a complement to HDFS.
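
As a rough illustration of the basic file operations listed above, the sketch below builds the `hdfs dfs` commands for a create, upload, list, and read cycle. The helper names (`hdfs_command`, `basic_operations`, `run_all`) and the example paths are hypothetical, not part of the course material; only the `-mkdir`, `-put`, `-ls`, and `-cat` subcommands come from the Hadoop file system shell. Running the commands assumes the `hdfs` client is installed and configured.

```python
import subprocess


def hdfs_command(action, *args):
    """Build the argv list for one `hdfs dfs` subcommand (hypothetical helper)."""
    return ["hdfs", "dfs", f"-{action}", *args]


def basic_operations(local_file, hdfs_dir):
    """Return the commands for a create / upload / list / read cycle."""
    # The uploaded file keeps its base name inside the target directory.
    target = f"{hdfs_dir}/{local_file.rsplit('/', 1)[-1]}"
    return [
        hdfs_command("mkdir", "-p", hdfs_dir),      # create the directory tree
        hdfs_command("put", local_file, hdfs_dir),  # copy a local file in
        hdfs_command("ls", hdfs_dir),               # list the directory
        hdfs_command("cat", target),                # print the file contents
    ]


def run_all(commands):
    """Execute each command in order; needs a configured `hdfs` client on PATH."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `basic_operations("/tmp/salaries.csv", "/user/demo/input")` returns the four argument lists without touching a cluster, and `run_all` executes them. When the S3A connector is configured, the same subcommands are generally expected to accept an `s3a://` URI in place of an HDFS path, which is one way S3 can complement HDFS.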
Overview of MapReduce
- Understanding Data Flow in the MapReduce Framework.
- Map, Shuffle, Sort, and Reduce.
- Demo: Computing Top Salaries.
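
The map, shuffle, sort, and reduce stages above can be sketched in plain Python without a cluster. This is a minimal in-memory simulation, not the course's actual demo: the records, the choice of "top salary per department" as the task, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input records: (employee, department, salary).
RECORDS = [
    ("alice", "eng", 120),
    ("bob", "eng", 95),
    ("carol", "ops", 80),
    ("dave", "ops", 105),
    ("erin", "eng", 130),
]


def map_phase(records):
    """Map: emit one (department, salary) pair per input record."""
    for _name, dept, salary in records:
        yield dept, salary


def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's values to the maximum salary."""
    for key, values in grouped:
        yield key, max(values)


def top_salaries(records):
    """Chain the three stages and return {department: top salary}."""
    return dict(reduce_phase(shuffle_and_sort(map_phase(records))))
```

On a real cluster the framework performs the shuffle and sort between mapper and reducer tasks; here a dictionary and `sorted` stand in for that step so the data flow is visible in one place.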
Working with YARN
- Understanding resource management in Hadoop.
- Working with ResourceManager, NodeManager, and Application Master.
- Scheduling jobs under YARN.
- Scheduling for large numbers of nodes and clusters.
- Demo: Job scheduling.
Integrating Hadoop with Spark
- Setting up storage for Spark (HDFS, Amazon S3, NoSQL, etc.).
- Understanding Resilient Distributed Datasets (RDDs).
- Creating an RDD.
- Implementing RDD Transformations.
- Demo: Implementing a Text Search Program for Movie Titles.
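
A text search over movie titles maps naturally onto RDD creation plus a `filter` transformation. The sketch below is one possible shape for such a program, not the course's demo: the sample titles and the names `matches`, `search_titles`, and `run_demo` are hypothetical, and `run_demo` assumes a local PySpark installation.

```python
# Hypothetical sample data for illustration only.
SAMPLE_TITLES = ["Star Wars", "A Star Is Born", "The Matrix", "Stardust"]


def matches(term):
    """Return a predicate that tests a title for a case-insensitive substring."""
    needle = term.strip().lower()

    def predicate(title):
        return needle in title.lower()

    return predicate


def search_titles(sc, titles, term):
    """Search titles with an RDD; `sc` is an existing SparkContext."""
    rdd = sc.parallelize(titles)                 # create an RDD from a local list
    cleaned = rdd.map(lambda t: t.strip())       # transformation (lazy)
    hits = cleaned.filter(matches(term))         # transformation (lazy)
    return hits.collect()                        # action: triggers the computation


def run_demo():
    """Run the search locally; imports PySpark only when called."""
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    try:
        return search_titles(sc, SAMPLE_TITLES, "star")
    finally:
        sc.stop()
```

The predicate is kept separate from the Spark calls so it can be checked on its own. In practice the titles would more likely be loaded from storage such as HDFS or S3 with `sc.textFile` rather than from an in-memory list.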
Managing a Hadoop Cluster
- Monitoring Hadoop.
- Securing a Hadoop cluster.
- Adding and removing nodes.
- Running a performance benchmark.
- Tuning a Hadoop cluster to optimize performance.
- Backup, recovery, and business continuity planning.
- Ensuring high availability (HA).
Upgrading and Migrating a Hadoop Cluster
- Assessing workload requirements.
- Upgrading Hadoop.
- Moving from on-premise to cloud and vice versa.
- Recovering from failures.
Troubleshooting
Summary and Conclusion
Requirements
- Experience in system administration.
- Familiarity with the Linux command line.
- Understanding of big data concepts.
Audience
- System administrators.
- Database Administrators (DBAs).
Testimonials (3)
I liked that it was practical. Loved to apply the theoretical knowledge with practical examples.
Aurelia-Adriana - Allianz Services Romania
Course - Python and Spark for Big Data (PySpark)
The fact that we were able to take with us most of the information, presentations, and exercises from the course, so that we can look over them and perhaps redo what we didn't understand the first time or improve what we already did.
Raul Mihail Rat - Accenture Industrial SS
Course - Python, Spark, and Hadoop for Big Data
The combination of theory and practice with tools like Databricks
Graciela Saud - Servicio de Impuestos Internos
Course - Spark for Developers