Chapter 4: CI/CD Pipelines
The "Why": What is CI/CD?
In the previous chapters, we learned the core components of DevOps: scripting, the OS, networking, and Git. Now, we assemble them. **CI/CD** is the *process* that uses all those skills to automate the delivery of software.
CI/CD stands for **Continuous Integration** and **Continuous Delivery** (or **Continuous Deployment**). It is the heart of the DevOps culture.
What is Continuous Integration (CI)?
CI is the practice of developers "integrating" (merging) their code changes into the main branch as often as possible (multiple times a day).
But how do you do this safely? You automate it. **Every time a developer pushes code to a branch (or opens a pull request), a CI server automatically does the following:**
- Builds the code (e.g., runs `npm run build` or `./gradlew build`).
- Tests the code (e.g., runs `npm test` or `./gradlew test`).
- Scans the code (e.g., checks for security vulnerabilities).
If *any* of these steps fail, the developer gets an instant email. The code is blocked from merging. This provides a **fast feedback loop** and stops bugs *before* they get into the main codebase.
What is Continuous Delivery vs. Deployment (CD)?
This is the next step. After CI (Build + Test) is successful, what happens?
- Continuous Delivery: The pipeline automatically prepares a "release artifact" (e.g., a Docker image) and deploys it to a **staging** (test) environment. It is now 100% ready to go to production. A human (like a QA manager) then gives the *final manual approval* to "promote" this build to production.
- Continuous Deployment: This is the most advanced step. It's the same as above, but there is **no human approval**. If the CI tests pass, the code is *automatically* deployed to *all* users in production, immediately. This is what companies like Netflix and Amazon do.
Your goal is to build a **CI/CD Pipeline**: an automated series of steps (a "workflow") that takes code from a developer's git push all the way to a production server.
Core Pipeline Concepts
- Pipeline/Workflow: The entire automated process from start to finish. Defined in a file (e.g., `.github/workflows/main.yml`).
- Stage: A logical section of the pipeline (e.g., "Build Stage", "Test Stage", "Deploy Stage").
- Job: A set of steps that run together (e.g., "build_docker_image", "run_unit_tests").
- Step/Task: A single command (e.g., `run: npm install`).
- Runner/Agent: The actual server (virtual machine) that runs your pipeline. This can be hosted by GitHub, or it can be your own self-hosted server.
- Artifact: The *output* of a stage (e.g., the compiled `.jar` file, the `.aab`, or the Docker image). This artifact is passed to the next stage.
We will learn the two most popular tools for building CI/CD pipelines: **GitHub Actions** (the modern standard) and **Jenkins** (the classic standard).
Part 1: GitHub Actions (The Modern Standard)
**GitHub Actions** is a CI/CD platform built directly into GitHub. It's incredibly powerful, easy to learn, and has a massive community marketplace. It is now the default choice for new projects, especially open-source ones.
How it Works
You create a **YAML** (.yml) configuration file in your repository inside a special folder: .github/workflows/.
When a specific **event** happens (like a push to the main branch), GitHub reads your YAML file and automatically provisions a **Runner** (a fresh virtual machine, usually Ubuntu) to execute the **jobs** and **steps** you defined.
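In practice, the only "setup" is a folder convention in your repository (the repository name below is a placeholder):

```
your-repo/
└── .github/
    └── workflows/
        └── main.yml   <-- your workflow definition lives here
```

The file name itself doesn't matter (main.yml, ci.yml, and deploy.yml all work); GitHub runs every workflow file it finds inside .github/workflows/.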
Core Components of a GitHub Actions Workflow
- Workflow: The entire `.yml` file.
- Event (`on:`): The *trigger* that starts the workflow (e.g., `on: push`, `on: pull_request`, `on: schedule`).
- Job (`jobs:`): A set of steps that run on a fresh runner. You can have multiple jobs that run in parallel (like `test` and `build`).
- Runner (`runs-on:`): The OS to use for the virtual machine (e.g., `ubuntu-latest`, `windows-latest`, `macos-latest`).
- Step (`steps:`): A single task. A step can either `run:` a shell command (like `npm install`) or `uses:` a pre-built "Action" from the marketplace.
- Action (`uses:`): A reusable, pre-packaged script (e.g., `actions/checkout@v3` to check out your code, or `docker/build-push-action@v5` to build a Docker image).
Workflow Example 1: Basic Node.js Build & Test
Let's build a CI pipeline for a simple Node.js project. This workflow will:
- Trigger on every `push` to the `main` branch.
- Spin up an Ubuntu VM.
- Check out our Git repository.
- Set up the correct version of Node.js (e.g., 18).
- Install our dependencies (`npm install`).
- Run our tests (`npm test`).
```yaml
# Name of the workflow (shows up in GitHub UI)
name: Node.js CI

# 1. 'on' (The Trigger)
# This workflow runs on every push to the 'main' branch
# and every pull request targeting the 'main' branch
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

# 2. 'jobs' (What to do)
jobs:
  # We define a single job called 'build-and-test'
  build-and-test:
    # 3. 'runs-on' (The Environment)
    # Use the latest stable Ubuntu VM from GitHub
    runs-on: ubuntu-latest

    # 4. 'steps' (The commands)
    steps:
      # Step 1: Check out the code from our repo
      # 'uses:' runs a pre-built Action
      - name: Checkout code
        uses: actions/checkout@v4

      # Step 2: Set up the correct Node.js version
      - name: Set up Node.js 18
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: 'npm' # Cache node_modules for speed

      # Step 3: Install dependencies
      # 'run:' runs a shell command
      - name: Install dependencies
        run: npm install

      # Step 4: Run the tests
      - name: Run tests
        run: npm test
```
That's it! By adding this one file to your repository, you have a complete CI system. If `npm test` fails, the `push` will be marked with a red "X" and you will be notified.
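Two optional steps you could bolt onto the end of this job are sketched below; they are illustrations, not part of the example above. The first covers the "Scans the code" item from the CI definition earlier, using `npm audit`; the second turns the "Artifact" concept into practice by uploading the build output (this assumes you also add an `npm run build` step that writes to `dist/` -- adjust the path to your project):

```yaml
      # Optional: fail the job if any high-severity vulnerability is found
      - name: Security scan
        run: npm audit --audit-level=high

      # Optional: save the build output as a downloadable artifact
      # (assumes a build step that writes to dist/)
      - name: Upload build artifact
        uses: actions/upload-artifact@v4
        with:
          name: web-build
          path: dist/
```

A later job (or a teammate, via the GitHub UI) can then retrieve that output with `actions/download-artifact@v4` using the same artifact name.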
Workflow Example 2: Build & Push a Docker Image
This is a true DevOps task. This pipeline will build a Docker image from a Dockerfile in our repo and push it to the **GitHub Container Registry (GHCR)**.
```yaml
name: Publish Docker Image

on:
  push:
    branches: [ "main" ] # Only run when we push to main

jobs:
  build-and-push-docker:
    runs-on: ubuntu-latest

    # We need special permissions to push to GHCR
    permissions:
      contents: read
      packages: write # This is the key permission

    steps:
      # 1. Check out the code
      - name: Checkout code
        uses: actions/checkout@v4

      # 2. Log in to the GitHub Container Registry (GHCR)
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          # This is a special, temporary token. No need to store secrets!
          password: ${{ secrets.GITHUB_TOKEN }}

      # 3. Build the Docker image and push it
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: . # Use the Dockerfile from the root
          push: true # We want to push
          # Example tag: ghcr.io/msmaxpro/my-app:latest
          tags: ghcr.io/${{ github.repository_owner }}/my-app:latest
```
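A small refinement worth knowing about (a sketch, not part of the example above): the `tags` input accepts a newline-separated list, so you can also tag every image with the commit SHA and make each build uniquely traceable:

```yaml
          tags: |
            ghcr.io/${{ github.repository_owner }}/my-app:latest
            ghcr.io/${{ github.repository_owner }}/my-app:${{ github.sha }}
```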
Secrets & Environments
What if you need to deploy to an AWS server? You need AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. You **must never** write these in your YAML file.
Instead, you go to your **GitHub Repo > Settings > Secrets and variables > Actions**.
Here you add a "Repository secret" (e.g., AWS_SECRET_KEY). Now, you can securely access it in your workflow using the secrets context:
```yaml
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_KEY }}
          aws-region: us-east-1
```
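Environments are the other half of this picture, and they are how the "manual approval" of Continuous Delivery (from the start of this chapter) is usually modeled in GitHub Actions. Under **GitHub Repo > Settings > Environments** you can create an environment (e.g., production), attach *required reviewers* to it, and scope secrets to it. Any job that references that environment will pause until a reviewer approves it. A minimal sketch, assuming such a production environment exists, that this job sits in the same workflow as the build-and-test job from Example 1, and that the deploy script is your own (hypothetical) script:

```yaml
  deploy-production:
    runs-on: ubuntu-latest
    needs: build-and-test       # only start after CI has passed
    environment: production     # pauses here until a required reviewer approves
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Deploy
        run: ./scripts/deploy.sh  # hypothetical deployment script
```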
For the full reference, read the official GitHub Actions docs: https://docs.github.com/en/actions
Part 2: Jenkins (The Classic Standard)
If GitHub Actions is the modern, cloud-based solution, **Jenkins** is the classic, self-hosted, and infinitely customizable standard. It's a free, open-source server that you run *yourself* (e.g., on one of your own Linux servers).
Why is Jenkins still used so much?
- **Self-Hosted:** Companies can run it on their own private servers (on-premise), which is critical for finance or healthcare industries that can't send their code to the cloud.
- **Plugins:** Jenkins has thousands of plugins for *everything*. If a tool exists, there is a Jenkins plugin for it.
- **Control:** You have 100% control over the environment, security, and hardware.
The `Jenkinsfile` (Declarative Pipeline)
Like GitHub Actions, Jenkins is also "Pipeline-as-Code." Instead of clicking through the web UI to configure jobs (the old "Freestyle" approach), you add a file named `Jenkinsfile` to your repository's root.
Jenkins pipelines are written in a **Groovy-based DSL (Domain Specific Language)**. There are two styles: "Scripted" (older, more flexible, but harder to read) and "Declarative" (newer, structured, and recommended).
Declarative Pipeline Syntax
The syntax is very logical and easy to read.
```groovy
pipeline {
    // 1. 'agent' - Where should this run?
    // 'any' means run on any available Jenkins agent (server)
    agent any

    // 2. 'stages' - The main container for all our work
    stages {
        // 3. 'stage' - A logical step (Build, Test, Deploy)
        stage('Build') {
            // 4. 'steps' - The commands to run
            steps {
                echo 'Building the application...'
                sh './gradlew build'
            }
        }

        stage('Test') {
            steps {
                echo 'Running unit tests...'
                sh './gradlew test'
            }
            // 5. 'post' - Runs after the stage completes
            post {
                always {
                    // Archive the test results, even if they failed
                    junit 'app/build/test-results/**/*.xml'
                }
            }
        }

        stage('Deploy to Staging') {
            steps {
                echo 'Deploying to staging server...'
                sh './scripts/deploy-staging.sh'
            }
        }
    }
}
```
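Two declarative features you will want almost immediately are shown below as a hedged sketch (an illustration, not a change to the example above): a `when` condition so the deploy stage only runs for the main branch, and the `credentials()` helper, Jenkins' equivalent of GitHub's secrets context. The credential ID staging-api-token is hypothetical; you would create it yourself under **Manage Jenkins > Credentials**:

```groovy
stage('Deploy to Staging') {
    // Only deploy when this pipeline run is for the 'main' branch
    when { branch 'main' }

    // Pull a secret out of the Jenkins credential store and expose it
    // to the steps below as an environment variable
    environment {
        API_TOKEN = credentials('staging-api-token') // hypothetical credential ID
    }

    steps {
        echo 'Deploying to staging server...'
        sh './scripts/deploy-staging.sh'
    }
}
```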
GitHub Actions vs. Jenkins
- **GitHub Actions:** Easier to start, fully cloud-based, great community actions. Best for new/open-source projects.
- **Jenkins:** More powerful, more complex, 100% customizable, self-hosted. Better for large enterprises with complex, private infrastructure needs.
Part 3: Advanced Deployment Strategies
Your CD pipeline is built. You're ready to deploy. How do you do it without causing downtime? You **never** just stop the old server and start the new one. Users would see an error page. Instead, you use a strategy.
1. Rolling Deployment
This is the most common default strategy. You have a pool of 5 servers (V1).
- Take Server 1 offline.
- Upgrade it to V2.
- Put it back online.
- Take Server 2 offline.
- Upgrade it to V2... and so on.
- Pros: Simple and safe, with zero downtime.
- Cons: The rollout is slow. For a brief time, you have *both* V1 and V2 running, which can cause database or compatibility issues.
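GitHub Actions has no built-in "rolling deployment" button, but you can approximate the server-by-server loop above with a job matrix and `max-parallel: 1`, which forces the per-server jobs to run one at a time. This is only a sketch: the server names and deploy script are placeholders, and a real rollout would also drain each server from the load balancer before upgrading it:

```yaml
  rolling-deploy:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 1      # upgrade one server at a time
      fail-fast: true      # stop the rollout as soon as one server fails
      matrix:
        server: [web-1, web-2, web-3, web-4, web-5]   # hypothetical hosts
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Upgrade ${{ matrix.server }} to V2
        run: ./scripts/deploy.sh ${{ matrix.server }}  # hypothetical script
```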
2. Blue/Green Deployment
This is a very safe and powerful strategy. You have two *identical* production environments, "Blue" and "Green."
- Your users are currently on the **Blue** environment (V1).
- You deploy the new version (V2) to the **Green** environment. The Green environment is "offline" (no users).
- You run all your tests on the Green environment.
- When you are 100% confident it works, you flip a switch at the **Load Balancer** (the network router).
- All new traffic *instantly* goes to the Green (V2) environment. Blue (V1) is now offline.
- Pros: Instant rollout. Instant *rollback* (if V2 has a bug, you just flip the switch back to Blue). No version mismatch.
- Cons: **Expensive.** You have to pay for *double* the server infrastructure at all times.
3. Canary Deployment
This is the most advanced and modern strategy, used by giants like Google and Netflix.
- You have 100 servers running V1.
- You deploy V2 to *one single server* (the "canary").
- You configure your load balancer to send **1%** of your *real user traffic* to this one canary server.
- You wait. You watch your monitoring tools (like Prometheus, which we'll see later). Is the error rate going up? Is CPU usage normal?
- If all is well, you deploy V2 to 10% of servers. Wait. Then 50%. Then 100%.
- Pros: The *safest* way to release. You test new features on real users with minimal risk. You can "roll back" by just shutting down the canary servers.
- Cons: Extremely complex to set up. Requires a very mature monitoring system (Chapter 7) and a smart load balancer.