
Data Engineering using AWS Analytics Services


Build Data Engineering Pipelines using AWS Analytics Services such as Glue, EMR, Athena, Kinesis, QuickSight, and more.



What you'll learn
  • Data Engineering leveraging AWS Analytics services
  • Managing tables using the Glue Catalog
  • Engineering batch data pipelines using Glue Jobs
  • Orchestrating batch data pipelines using Glue Workflows
  • Running queries using Athena, a serverless query engine service
  • Using AWS Elastic MapReduce (EMR) clusters for building data pipelines
  • Using AWS Elastic MapReduce (EMR) clusters for reports and dashboards
  • Data ingestion using Lambda Functions
  • Scheduling using EventBridge
  • Engineering streaming pipelines using Kinesis
  • Streaming web server logs using Kinesis Firehose
  • Overview of data processing using Athena
  • Running Athena queries or commands using the CLI
  • Running Athena queries using Python boto3
  • Creating a Redshift cluster, creating tables, and performing CRUD operations
  • Copying data from S3 to Redshift tables
  • Understanding distribution styles and creating tables using DISTKEY
  • Running queries on external RDBMS tables using Redshift Federated Queries
  • Running queries on Glue or Athena Catalog tables using Redshift Spectrum

Requirements
  • Programming experience using Python
  • Data Engineering experience using Spark
  • Ability to write and interpret SQL Queries
  • This course is ideal for experienced data engineers looking to add AWS Analytics Services as key skills to their profile
Description
Data Engineering is all about building data pipelines that get data from multiple sources into a Data Lake or Data Warehouse, and then from the Data Lake or Data Warehouse to downstream systems. As part of this course, I will walk you through building Data Engineering pipelines using the AWS Analytics stack, including services such as Glue, Elastic MapReduce (EMR), Lambda Functions, Athena, QuickSight, and many more.

Here are the high-level steps which you will follow as part of the course.

  • Setup Development Environment
  • Getting Started with AWS
  • Development Lifecycle of PySpark
  • Overview of Glue Components
  • Setup Spark History Server for Glue Jobs
  • Deep Dive into Glue Catalog
  • Exploring Glue Job APIs
  • Glue Job Bookmarks
  • Data Ingestion using Lambda Functions
  • Streaming Pipeline using Kinesis
  • Consuming Data from S3 using boto3
  • Populating GitHub Data to DynamoDB
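One of the steps above, consuming data from S3 using boto3, can be sketched as follows. The helper names and the example URI are illustrative assumptions, not taken from the course material:

```python
def parse_s3_uri(uri):
    """Split an s3:// URI into (bucket, key) -- small pure helper, assumed naming."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an s3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key

def read_s3_object(uri):
    """Fetch an object's payload from S3. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the pure helper above works without AWS
    bucket, key = parse_s3_uri(uri)
    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()
```

`get_object` returns the payload as a streaming body; `read()` pulls it fully into memory, which is fine for small files but worth streaming for large ones.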

Getting Started with AWS
  • Introduction - AWS Getting Started
  • Create S3 Bucket
  • Create IAM Group and User
  • Overview of Roles
  • Create and Attach Custom Policy
  • Configure and Validate AWS CLI

Development Lifecycle for PySpark
  • Setup Virtual Environment and Install PySpark
  • Getting Started with PyCharm
  • Passing Run Time Arguments
  • Accessing OS Environment Variables
  • Getting Started with Spark
  • Create Function for Spark Session
  • Setup Sample Data
  • Read Data from Files
  • Process Data using Spark APIs
  • Write Data to Files
  • Validating Writing Data to Files
  • Productionizing the Code
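The run-time argument and OS environment variable steps in this module boil down to patterns like the following; the variable and function names here are illustrative, not the course's:

```python
import os
import sys

def get_env(name, default=None):
    """Read a configuration value from the OS environment, with a fallback."""
    return os.environ.get(name, default)

def parse_args(argv):
    """Parse simple key=value run-time arguments, e.g. ['env=dev', 'bucket=x']."""
    args = {}
    for arg in argv:
        key, _, value = arg.partition("=")
        args[key] = value
    return args

if __name__ == "__main__":
    # e.g. python app.py env=dev bucket=my-bucket
    print(parse_args(sys.argv[1:]))
```

In a production PySpark job, values such as the input path or environment name typically arrive this way rather than being hard-coded, so the same script runs unchanged in dev and prod.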

Overview of Glue Components
  • Introduction - Overview of Glue Components
  • Create Crawler and Catalog Table
  • Analyze Data using Athena
  • Creating S3 Bucket and Role
  • Create and Run the Glue Job
  • Validate using Glue Catalog Table and Athena
  • Create and Run Glue Trigger
  • Create Glue Workflow
  • Run Glue Workflow and Validate
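Running a Glue job programmatically follows the `start_job_run` flow sketched below. The job name and parameters are placeholders; note that Glue expects run-time arguments with `--`-prefixed keys:

```python
def build_glue_arguments(params):
    """Glue job run-time arguments must be passed with '--'-prefixed keys."""
    return {f"--{key}": str(value) for key, value in params.items()}

def start_glue_job(job_name, params):
    """Kick off a Glue job run and return its id. Requires boto3 and AWS credentials."""
    import boto3  # lazy import: the helper above stays usable without AWS
    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=build_glue_arguments(params),
    )
    return response["JobRunId"]
```

The returned `JobRunId` can then be polled with `get_job_run` to check whether the run succeeded, which is also how a trigger or workflow tracks job state.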

Using Athena to Run Serverless Queries
  • Getting Started with Athena
  • Accessing Glue Catalog Tables using Athena
  • Create Athena Tables and Populate Data into Them
  • Create Athena Tables from Query Results using CTAS
  • Amazon Athena Architecture
  • Partitioned Tables in Athena
  • Running Athena Queries and Commands using AWS CLI
  • Running Athena Queries and Commands using Python boto3
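Running Athena queries from Python boto3 generally follows the `start_query_execution` flow sketched below; the database name and S3 output location are placeholders, and the helper split is an assumption for testability:

```python
def build_query_request(sql, database, output_location):
    """Assemble the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }

def run_athena_query(sql, database, output_location):
    """Submit a query and return its execution id. Requires boto3 and AWS credentials."""
    import boto3  # lazy import so build_query_request is testable offline
    athena = boto3.client("athena")
    request = build_query_request(sql, database, output_location)
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]
```

Athena queries are asynchronous: after submitting, you poll `get_query_execution` with the returned id until the status is SUCCEEDED, then fetch rows with `get_query_results` or read the CSV written to the output location.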

Cloud Data Warehouse using AWS Redshift
  • Create Redshift Cluster using the Free Tier
  • Setup Databases as Part of the Redshift Cluster and Perform CRUD Operations
  • Copy CSV or Delimited Data from S3 into Redshift Tables using Credentials as well as IAM Role
  • Copy JSON Data from S3 into Redshift Tables using IAM Role
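The S3-to-Redshift load steps use Redshift's COPY command. A minimal sketch of assembling such a statement from Python follows; the table name, S3 path, and role ARN are placeholders, and the helper itself is an assumption for illustration:

```python
def build_copy_statement(table, s3_path, iam_role, options="FORMAT AS CSV"):
    """Build a Redshift COPY statement that loads data from S3 using an IAM role."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"{options};"
    )

# CSV load; for the JSON variant you would swap the options, e.g. "FORMAT AS JSON 'auto'"
stmt = build_copy_statement(
    "public.orders",
    "s3://my-bucket/orders/",
    "arn:aws:iam::123456789012:role/redshift-load",
)
```

The resulting statement is then executed against the cluster through any Redshift connection (the query editor, psql, or a Python driver); using `IAM_ROLE` avoids embedding access keys in the command the way the credentials-based variant does.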

Who this course is for:
  • Beginner or Intermediate Data Engineers who want to learn AWS Analytics Services for Data Engineering
  • Intermediate Application Engineers who want to explore Data Engineering using AWS Analytics Services
  • Data and Analytics Engineers who want to learn Data Engineering using AWS Analytics Services
  • Testers who want to learn to test Data Engineering applications built using AWS Analytics Services
