Data Engineering using AWS Analytics Services
What you'll learn
- Data Engineering leveraging AWS Analytics features
- Managing Tables using Glue Catalog
- Engineering Batch Data Pipelines using Glue Jobs
- Orchestrating Batch Data Pipelines using Glue Workflows
- Running Queries using Athena - Serverless query engine service
- Using AWS Elastic MapReduce (EMR) Clusters for building Data Pipelines
- Using AWS Elastic MapReduce (EMR) Clusters for reports and dashboards
- Data Ingestion using Lambda Functions
- Scheduling using EventBridge
- Engineering Streaming Pipelines using Kinesis
- Streaming Web Server logs using Kinesis Firehose
- Overview of data processing using Athena
- Running Athena queries or commands using CLI
- Running Athena queries using Python boto3
- Creating a Redshift Cluster, creating tables, and performing CRUD Operations
- Copying data from S3 to Redshift Tables
- Understanding Distribution Styles and creating tables using Distkeys
- Running queries on external RDBMS Tables using Redshift Federated Queries
- Running queries on Glue or Athena Catalog tables using Redshift Spectrum
Requirements
- Programming experience using Python
- Data Engineering experience using Spark
- Ability to write and interpret SQL Queries
- This course is ideal for experienced data engineers to add AWS Analytics Services as key skills to their profile
Description
Data Engineering is all about building Data Pipelines to get data from multiple sources into a Data Lake or Data Warehouse, and then from the Data Lake or Data Warehouse to downstream systems. As part of this course, I will walk you through how to build Data Engineering Pipelines using the AWS Analytics Stack. It includes services such as Glue, Elastic MapReduce (EMR), Lambda Functions, Athena, QuickSight, and many more.
Here are the high-level steps which you will follow as part of the course.
Setup Development Environment
Getting Started with AWS
Development Life Cycle of PySpark
Overview of Glue Components
Setup Spark History Server for Glue Jobs
Deep Dive into Glue Catalog
Exploring Glue Job APIs
Glue Job Bookmarks
Data Ingestion using Lambda Functions
Streaming Pipeline using Kinesis
Consuming Data from S3 using boto3
Populating GitHub Data to DynamoDB
Getting Started with AWS
Introduction - AWS Getting Started
Create S3 Bucket
Create IAM Group and User
Overview of Roles
Create and Attach Custom Policy
Configure and Validate AWS CLI
Development Lifecycle for PySpark
Setup Virtual Environment and Install PySpark
Getting Started with PyCharm
Passing Run Time Arguments
Accessing OS Environment Variables
Getting Started with Spark
Create Function for Spark Session
Setup Sample Data
Read data from files
Process data using Spark APIs
Write data to files
Validating Writing Data to Files
Productionizing the Code
Overview of Glue Components
Introduction - Overview of Glue Components
Create Crawler and Catalog Table
Analyze Data using Athena
Creating S3 Bucket and Role
Create and Run the Glue Job
Validate using Glue Catalog Table and Athena
Create and Run Glue Trigger
Create Glue Workflow
Run Glue Workflow and Validate
Using Athena to run Serverless Queries
Getting Started with Athena
Accessing Glue Catalog Tables using Athena
Create Athena Tables and Populating data into Athena tables
Create Athena Tables using query results using CTAS
Amazon Athena Architecture
Partitioned Tables in Athena
Running Athena Queries and Commands using AWS CLI
Running Athena Queries and Commands using Python boto3
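To give a flavor of the boto3 topic above, here is a minimal sketch of submitting an Athena query and polling until it finishes. It is not taken from the course material. The helper name run_athena_query, the database name, and the S3 output location are assumptions for illustration; the client is passed in so the function works with boto3.client("athena") or with a stand-in object.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query to Athena and wait until it reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns a (query_execution_id, final_state) tuple.
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location (hypothetical here).
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Poll the execution status until Athena reports success or failure.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


# Example usage (requires boto3 and configured AWS credentials):
#   import boto3
#   athena = boto3.client("athena")
#   run_athena_query(athena, "SELECT 1", "default", "s3://my-bucket/athena-results/")
```

Passing the client as an argument keeps the polling logic easy to try out locally without an AWS account.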
Cloud Data Warehouse using AWS Redshift
Create Redshift Cluster using Free Tier
Setup Databases as part of Redshift Cluster and perform CRUD operations
Copy CSV or delimited data from S3 into Redshift Tables using credentials as well as iam_role
Copy JSON data from S3 into Redshift Tables using iam_role
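As a rough illustration of the COPY topics above, the sketch below builds a Redshift COPY statement that authenticates with an IAM role, for either CSV or JSON input. It is not taken from the course material. The helper name build_copy_statement, the table name, bucket path, and role ARN are all hypothetical, and the values are assumed to come from trusted configuration rather than user input, since they are interpolated directly into the SQL text.

```python
def build_copy_statement(table, s3_path, iam_role_arn, data_format="csv"):
    """Return a Redshift COPY statement that loads data from S3 via an IAM role.

    Inputs are interpolated as-is, so they must come from trusted configuration.
    """
    if data_format == "csv":
        # Skip one header row; adjust if the files have no header.
        format_clause = "FORMAT AS CSV IGNOREHEADER 1"
    elif data_format == "json":
        # 'auto' maps JSON keys to column names of the target table.
        format_clause = "FORMAT AS JSON 'auto'"
    else:
        raise ValueError(f"Unsupported data format: {data_format}")

    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"{format_clause};"
    )


# Hypothetical values for illustration only.
statement = build_copy_statement(
    "public.orders",
    "s3://my-bucket/retail/orders/",
    "arn:aws:iam::123456789012:role/MyRedshiftCopyRole",
    data_format="json",
)
print(statement)
```

The resulting string can be run from any SQL client connected to the cluster.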
Who this course is for:
- Beginner or Intermediate Data Engineers who want to learn AWS Analytics Services for Data Engineering
- Intermediate Application Engineers who want to explore Data Engineering using AWS Analytics Services
- Data and Analytics Engineers who want to learn Data Engineering using AWS Analytics Services
- Testers who want to learn Databricks to test Data Engineering applications built using AWS Analytics Services