Chapter 5: Cloud Providers (AWS, Azure, GCP)
Part 1: The "Why": What is Cloud Computing?
In the previous chapters, we've learned how to write scripts, manage a Linux server, and build CI/CD pipelines. But *where* do these servers live? Ten years ago, the answer was usually an "on-premises" (on-prem) data center: your company would buy physical server hardware, put it in a cold room in your office, and manage the electricity and the internet connection itself.
This was slow, expensive, and didn't scale. If your website suddenly got 10 million users, you couldn't just "buy 1,000 new servers" instantly. You'd have to order them, wait weeks for delivery, and manually install them.
The Cloud Revolution
**Cloud Computing** (like **Amazon Web Services (AWS)**) solves this. A cloud provider is simply a company that has *already* built massive, global data centers, and they rent you their servers by the second.
Instead of buying a physical server, you make an API call (or click a button) and a new virtual server is ready for you in 30 seconds. When you're done, you make another API call to delete it, and you stop paying.
This is the foundation of DevOps. The cloud provides **on-demand, self-service, pay-as-you-go infrastructure**, which is the perfect platform for automation.
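To make "pay-as-you-go" concrete, here is a small Python sketch of per-second billing. The hourly rate is purely illustrative, not a real AWS price:

```python
# A sketch of pay-as-you-go billing. The hourly rate below is
# purely illustrative, NOT a real AWS price.
ILLUSTRATIVE_HOURLY_RATE = 0.0116  # hypothetical USD per hour for a small instance

def cost_for_runtime(seconds: int, hourly_rate: float = ILLUSTRATIVE_HOURLY_RATE) -> float:
    """Per-second billing: you pay only for the seconds the server existed."""
    return round(seconds * hourly_rate / 3600, 6)

# A server deleted after 45 minutes costs 75% of the hourly price:
print(cost_for_runtime(45 * 60))  # → 0.0087
```

Contrast this with an on-prem server, where you pay the full hardware cost up front whether the machine is busy or idle.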
The 3 Main Service Models (IaaS, PaaS, SaaS)
"Cloud" is a general term. It's broken down into three main service models. The most famous analogy is the "Pizza as a Service" model.
1. IaaS (Infrastructure as a Service)
- The Analogy: You buy the ingredients (flour, sauce, cheese) and use the restaurant's oven.
- The Explanation: The cloud provider gives you the *raw infrastructure* (virtual servers, storage, networking). You are responsible for *everything* else: managing the OS, installing security patches, installing your database, and running your app.
- This is the primary domain of a DevOps Engineer.
- Example Services: **AWS EC2**, Azure Virtual Machines, Google Compute Engine.
2. PaaS (Platform as a Service)
- The Analogy: You just order the pizza. The restaurant handles the ingredients, oven, and cooking. You just set the table.
- The Explanation: The cloud provider manages the OS, the patching, and the runtime (e.g., the Node.js or Python environment). You *only* upload your code (e.g., your `app.js` file) and the platform runs it for you.
- Pros: Much easier than IaaS. No server management.
- Cons: Less flexible. You can't SSH into the server to install a custom tool.
- Example Services: **Heroku**, AWS Elastic Beanstalk, Google App Engine.
3. SaaS (Software as a Service)
- The Analogy: You go to a restaurant, sit down, and eat. You manage nothing.
- The Explanation: You are just a *user* of the software. The provider manages everything.
- Example Services: **Gmail**, **GitHub**, **Firebase**, Office 365, Dropbox.
The 3 Deployment Models (Public, Private, Hybrid)
- Public Cloud (AWS, GCP, Azure): The most common. You share the data center with other companies (this is called "multi-tenancy"). It's the cheapest and most scalable.
- Private Cloud: A private data center that a company builds *for itself*, but it runs on cloud principles (automation, self-service). Used by banks or governments with extreme security needs.
- Hybrid Cloud: A mix of both. A company might run its public website on AWS (Public Cloud) but keep its sensitive user database on-premise (Private Cloud) and connect the two.
Part 2: The "Big 3" Cloud Providers
While hundreds of cloud providers exist, the market is dominated by three giants. As a DevOps engineer, you must be familiar with at least one, and ideally all three.
- Amazon Web Services (AWS): The undisputed market leader (around 32% market share). It has the most services, the largest community, and is the most widely used by startups and enterprises. We will focus on AWS for our deep dive.
- Microsoft Azure: The clear #2 (around 22% market share). It is extremely popular in large corporations that already use a lot of Microsoft products (like Windows Server, .NET, and SQL Server).
- Google Cloud Platform (GCP): The #3 (around 11% market share). It is famous for its data engineering, machine learning, and, most importantly, **Kubernetes**. Google *invented* Kubernetes (GKE - Google Kubernetes Engine) and is considered the leader in that specific area.
Part 3: Deep Dive into AWS (The Market Leader)
AWS has over 200 services. You don't need to know them all. A DevOps engineer needs to master the "core" services related to compute, storage, networking, and identity.
Core Concepts: Regions, AZs, and IAM
- **Region:** A physical, geographic location in the world (e.g., `us-east-1` in N. Virginia, `ap-south-1` in Mumbai).
- **Availability Zone (AZ):** A single data center (or group of data centers) *within* a Region. Each Region contains multiple AZs (typically three or more).
Why? For **High Availability**. You should run your app in *at least two* AZs. If one data center (AZ-1) has a power failure, your app will keep running in AZ-2, and your users will never know.
AWS IAM: Identity & Access Management (CRITICAL)
This is the **most important** security service. IAM answers the question: "**Who** can do **what**?"
- **Users:** An end-user (a person, e.g., "msmaxpro"). You log in as a user. **Best Practice:** *Never* use your "root" (main) account for day-to-day work. Create a separate IAM user for yourself with limited permissions.
- **Groups:** A collection of users (e.g., "Developers," "Admins," "Testers"). You apply permissions to the *Group*, not the individual User.
- **Roles:** This is for *services*. You create a "Role" that an AWS service (like an EC2 server) can "assume" (wear like a hat) to get temporary permissions.
**Example:** You give your EC2 server an "S3_Access_Role" so it can write to S3, without *ever* having to hard-code an API key into your server. This is the secure way.
- **Policies:** The "rulebook." A JSON document that defines *what* is allowed or denied.
Example: An IAM Policy (JSON)
This policy allows a user to READ (Get, List) from the "my-app-backups" S3 bucket, but nothing else:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-app-backups",
        "arn:aws:s3:::my-app-backups/*"
      ]
    }
  ]
}
```
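To see how a policy like this is applied, here is a deliberately simplified Python sketch of IAM's default-deny evaluation. Real IAM also handles wildcards in actions, `Condition` blocks, and explicit `Deny` statements that override any `Allow`:

```python
# A highly simplified sketch of IAM policy evaluation: access is denied by
# default, and granted only if some "Allow" statement matches BOTH the
# action and the resource.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-app-backups",
                "arn:aws:s3:::my-app-backups/*",
            ],
        }
    ],
}

def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Default deny: only an explicit matching Allow grants access."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        if action not in stmt["Action"]:
            continue
        for pattern in stmt["Resource"]:
            # Treat a trailing "/*" as "anything inside this bucket".
            if pattern.endswith("/*") and resource.startswith(pattern[:-1]):
                return True
            if resource == pattern:
                return True
    return False
```

With this sketch, `s3:GetObject` on an object inside `my-app-backups` is allowed, while `s3:PutObject` (not listed) or any other bucket falls through to the default deny.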
AWS Compute Services
1. EC2 (Elastic Compute Cloud) - (IaaS)
This is the "virtual server" service. It's the #1 service you will use. It lets you rent virtual machines (called "instances") by the second.
- **Instance Types:** The "size" of your server (e.g., `t2.micro`, a small, cheap server for testing; `m5.large`, a powerful server for production).
- **AMI (Amazon Machine Image):** The "template" for your server. You choose an AMI to start with (e.g., "Ubuntu 22.04", "RHEL 9", "Windows Server 2022").
- **Security Groups:** This is the **firewall** for your server. It's a list of rules defining which **ports** are open.
  **Example:** A security group for a web server would have these rules:
  - Allow TCP on Port 22 (SSH) from *my IP only*.
  - Allow TCP on Port 80 (HTTP) from *Anywhere* (0.0.0.0/0).
  - Allow TCP on Port 443 (HTTPS) from *Anywhere* (0.0.0.0/0).
- **SSH Key Pair:** How you log in. When you create an EC2 instance, you select your SSH public key. This is the *only* way to log in (password login is disabled by default, which is good practice).
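The security-group rules above can be sketched as a default-deny check in Python (the "my IP" below is a made-up documentation address, and real security groups are stateful, which this sketch ignores):

```python
import ipaddress

# A conceptual sketch of how a security group evaluates inbound traffic:
# rules only ever *allow*; anything not matched by a rule is dropped.
# The admin IP below is a made-up example address.
RULES = [
    {"port": 22,  "cidr": "203.0.113.7/32"},  # SSH from "my IP only" (example IP)
    {"port": 80,  "cidr": "0.0.0.0/0"},       # HTTP from anywhere
    {"port": 443, "cidr": "0.0.0.0/0"},       # HTTPS from anywhere
]

def is_inbound_allowed(source_ip: str, port: int, rules=RULES) -> bool:
    """Default deny: a packet gets in only if some rule matches port AND source."""
    src = ipaddress.ip_address(source_ip)
    return any(
        rule["port"] == port and src in ipaddress.ip_network(rule["cidr"])
        for rule in rules
    )
```

Note the asymmetry: anyone can reach port 80, but port 22 only answers to the single `/32` address in the rule.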
2. Lambda (Serverless) - (FaaS)
**Serverless** (or FaaS - Function as a Service) is the next evolution. What if you just need a *function* to run, not a whole server?
Lambda lets you upload a single function (e.g., a Python script), and AWS runs it for you on demand. You are billed *only* for the milliseconds your function actually runs. You manage *nothing*: no OS, no patches, no servers.
**Use Case:** An API endpoint that runs when a user clicks a button, or a function that runs automatically every time a new image is uploaded to S3 (e.g., to create a thumbnail).
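As a sketch of the S3-upload use case, here is a minimal Python handler. The event shape follows the general layout of S3 notification events (`Records` → `s3` → bucket/object), and the thumbnail work itself is stubbed out:

```python
# A minimal sketch of a Lambda handler for the "new image uploaded to S3"
# use case. Lambda calls handler(event, context) for you on every event --
# you never start or manage a server.
def handler(event, context):
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Real code would download the image and generate a thumbnail here.
        results.append(f"thumbnail requested for s3://{bucket}/{key}")
    return {"statusCode": 200, "processed": results}

# A simplified fake event, shaped like an S3 upload notification:
fake_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-app-uploads"},
                "object": {"key": "photos/cat.png"}}}
    ]
}
```

Because the handler is just a function, you can invoke it locally with a fake event to test your logic before deploying.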
AWS Storage Services
1. S3 (Simple Storage Service) - Object Storage
S3 is an infinitely scalable storage service for "objects" (files). It is *not* a hard drive. You can't install an OS on it. You use it to store and serve files via HTTP.
This is where you store your website's static files (images, CSS), your log files, your database backups, and your Docker images.
- **Bucket:** A "folder" with a globally unique name (e.g., `codewithmsmaxpro-backups`).
- **Object:** The file you upload (e.g., `my-backup.zip`).
2. EBS (Elastic Block Store) - Block Storage
This is the **virtual hard drive** for your EC2 instance. When you create an EC2 server, you attach an EBS volume to it. This is where your Linux OS is installed.
AWS Networking Services
1. VPC (Virtual Private Cloud)
This is your **private, isolated network** inside AWS. It's a virtual version of your office network. You define your own IP range (e.g., 10.0.0.0/16) and create "subnets" inside it.
2. Public vs. Private Subnets
This is the foundation of cloud security.
- **Public Subnet:** A subnet that has a "route" to the internet (via an "Internet Gateway"). Your **web servers** and **load balancers** live here.
- **Private Subnet:** A subnet that is **100% private** and cannot be accessed from the internet. Your **databases** and other secure services live here.
This is how you protect your database. A user can access your web server, but your web server is the *only* thing that can access your database.
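The subnet carving described above is just CIDR arithmetic; Python's `ipaddress` module can sketch it (which subnet ends up "public" or "private" is decided purely by routing, not by the addresses themselves):

```python
import ipaddress

# Carving the VPC range from the text (10.0.0.0/16) into /24 subnets.
# The split itself is pure CIDR arithmetic; "public" vs "private" is
# determined by route tables, not by the addresses.
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))  # all possible /24 subnets

public_subnet = subnets[0]   # 10.0.0.0/24 -> web servers, load balancers
private_subnet = subnets[1]  # 10.0.1.0/24 -> databases

print(public_subnet, private_subnet, len(subnets))
```

A `/16` holds 256 possible `/24` subnets, so there is plenty of room to spread public and private subnets across multiple AZs.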
3. ELB (Elastic Load Balancer)
What if your website gets 10 million users? One EC2 server can't handle it. So you run 10 identical EC2 servers. But how do users find them?
An **ELB** (most commonly an **Application Load Balancer**, or ALB) sits in front of your 10 servers. It has *one* public DNS name. All users go to the load balancer, and it **distributes (balances) the traffic** across all your healthy servers.
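Conceptually, the balancing step works like the round-robin sketch below (a real ALB also performs health checks and supports other algorithms; the IPs are made up):

```python
import itertools

# A conceptual sketch of load balancing: one entry point fans requests
# out across healthy servers, round-robin. A real ALB also runs health
# checks and supports other routing algorithms.
servers = [f"10.0.0.{i}" for i in range(1, 11)]   # 10 identical web servers
healthy = [s for s in servers if s != "10.0.0.3"]  # pretend one failed its health check

rotation = itertools.cycle(healthy)

def route_request() -> str:
    """Every incoming request goes to the next healthy server."""
    return next(rotation)

first_four = [route_request() for _ in range(4)]
```

The key property: users only ever see the load balancer's single DNS name, and the failed server at `10.0.0.3` silently stops receiving traffic.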
4. Route 53
This is AWS's global DNS service. This is where you would host your `codewithmsmaxpro.me` domain and create your `A`, `CNAME`, and `MX` records (as we learned in Chapter 2.2).
Part 4: Azure & GCP (The Other Giants)
Once you understand the *concepts* from AWS, learning Azure and GCP is just learning new *names* for the same services.
Service Name Translation
| Concept | AWS (Amazon) | Azure (Microsoft) | GCP (Google) |
|------------------------|-----------------------------|---------------------------------|-----------------------------|
| Virtual Server (IaaS) | EC2 (Elastic Compute Cloud) | Azure Virtual Machines | GCE (Google Compute Engine) |
| Object Storage | S3 (Simple Storage Service) | Blob Storage | Google Cloud Storage (GCS) |
| Virtual Network | VPC (Virtual Private Cloud) | VNet (Virtual Network) | VPC (Virtual Private Cloud) |
| Serverless Function | Lambda | Azure Functions | Google Cloud Functions |
| Relational Database | RDS (Relational DB Service) | Azure SQL Database | Cloud SQL |
| NoSQL Database | DynamoDB | Cosmos DB | Firestore / Bigtable |
| DNS Service | Route 53 | Azure DNS | Cloud DNS |
| Kubernetes | EKS (Elastic K8s Service) | AKS (Azure K8s Service) | GKE (Google K8s Engine) |
| Docker Registry | ECR (Elastic Container Registry) | ACR (Azure Container Registry) | GCR (Google Container Registry) |
| Identity / Auth | IAM | Entra ID (was Azure AD) | Cloud IAM |
Part 5: Infrastructure as Code (IaC) - The DevOps Way
You *can* log in to the AWS Console (the website) and click buttons to create all your servers, VPCs, and databases. This is called **"ClickOps"**. It's fine for learning, but it's **terrible** for production.
Why? It's not repeatable, it's not documented, and it's error-prone. What if you forget to set a firewall rule?
The DevOps solution is **Infrastructure as Code (IaC)**. You write *code* that defines your infrastructure. This code is your new blueprint. You run this code, and it *automatically* builds your entire AWS environment.
The #1 tool for this is **Terraform**.
Terraform (by HashiCorp)
Terraform is a tool that lets you define your cloud infrastructure in a simple, declarative language called **HCL (HashiCorp Configuration Language)**. It is "cloud-agnostic," meaning you can use it to manage AWS, Azure, GCP, and more, all with the same tool.
Example: Creating an AWS EC2 Server with Terraform
Create a file named `main.tf`:
```hcl
# 1. Configure the AWS Provider
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# 2. Define a variable
variable "instance_type" {
  description = "The type of EC2 instance"
  type        = string
  default     = "t2.micro"
}

# 3. Define a resource (the server)
resource "aws_instance" "my_web_server" {
  ami           = "ami-0c7217cdde317cfec" # Ubuntu 22.04 AMI in us-east-1
  instance_type = var.instance_type

  tags = {
    Name = "My-Web-Server-01"
  }
}
```
Now, you just run three commands in your terminal:

```bash
# 1. Initialize the project (downloads the AWS provider)
$ terraform init

# 2. See what Terraform *plans* to do
$ terraform plan

Terraform will perform the following actions:

  # aws_instance.my_web_server will be created
  + resource "aws_instance" "my_web_server" { ... }

Plan: 1 to add, 0 to change, 0 to destroy.

# 3. Apply the changes
$ terraform apply

Do you want to perform these actions?
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_instance.my_web_server: Creating...
aws_instance.my_web_server: Creation complete after 30s [id=i-1234567890abcdef]
```
That's it! Your server is now running on AWS. When you are done, you run `terraform destroy` to delete it. This is the power of IaC.