Chapter 8: Configuration Management (Ansible)
Part 1: The "Why" - What is Configuration Management?
In Chapter 5 (Cloud Providers), we learned how to use **Terraform (IaC)** to *provision* our infrastructure. We can create 100 AWS EC2 servers with one command. This is great, but what now? All we have is 100 blank, empty Ubuntu servers.
How do we:
- Install `nginx` on all 100 servers?
- Update the `/etc/nginx/nginx.conf` file with our settings?
- Add a user `www-data`?
- Ensure the `nginx` service is running and enabled on boot?
You *could* write a giant Bash script (Chapter 1) to do this. But this is the wrong tool for the job. Why?
Why Bash Scripts Fail (Procedural vs. Declarative)
A Bash script is **Procedural**. It's a list of steps to follow.
```bash
$ sudo apt install nginx
```
What happens if you run this script *twice*? The first time, it works. The second time, it *might* fail, or it might just say "nginx is already installed." What if you run it on a server that has an *older* version of Nginx? Will it upgrade? You don't know.
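To make the point concrete, here is a minimal sketch (assuming a Debian/Ubuntu target) of what handling just *two* of those states looks like when you script it by hand:

```bash
#!/bin/bash
# Procedural: every state is YOUR problem to detect and handle.
if ! dpkg -s nginx >/dev/null 2>&1; then
    # Case 1: not installed yet
    sudo apt-get update
    sudo apt-get install -y nginx
else
    # Case 2: installed, but possibly outdated
    sudo apt-get install -y --only-upgrade nginx
fi
# ...and we still haven't touched config files, users, or services.
```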
A Configuration Management tool is **Declarative**. You don't write *steps*. You write a *description of the desired state*.
You write: `nginx: state=present, version=latest`.
The tool (Ansible) is smart. It logs into the server and checks: "Is nginx installed?"
- If "no," it runs
apt install nginx. - If "yes," and it's the latest version, it does **nothing**.
- If "yes," but it's an old version, it runs
apt upgrade nginx.
This is **Idempotency**. You can run the same command 1,000 times, and the result will always be the same: one server with the latest Nginx installed. This is the core principle of Configuration Management.
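Here is what that desired state looks like in Ansible's YAML, as a one-task preview of the playbooks we build in Part 4:

```yaml
# Declarative: describe the end state, not the steps to get there.
- name: Ensure the latest Nginx is installed
  ansible.builtin.apt:
    name: nginx
    state: latest
```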
Ansible vs. Puppet vs. Chef
These are the three main tools for this job.
- Puppet & Chef (The "Pull" Model): These are "agent-based." You must install a "Puppet Agent" on all 100 of your servers. Each agent then "pulls" its configuration from a central Puppet Master server. This is very powerful but also very complex to set up.
- Ansible (The "Push" Model): This is **agentless**. You *do not* install anything on your 100 servers (except Python, which is usually there). Ansible works by simply **SSHing** (Chapter 2.1) into each server from a central "Control Node" (your laptop) and running the commands for it.
Because it is agentless, simple, and uses easy-to-read YAML, **Ansible** has become the dominant and most popular tool for this job, especially in cloud environments.
Part 2: Ansible Setup & Core Concepts
Ansible is incredibly easy to set up. You only install it on **one machine**: your "Control Node" (your laptop or a dedicated admin server).
Installation (on the Control Node)
Ansible is a Python tool. The best way to install it is with `pip` inside a virtual environment (as we learned in Chapter 1).
```bash
# 1. Create a Python virtual environment
$ python3 -m venv ansible_env
$ source ansible_env/bin/activate

# 2. Install Ansible
(ansible_env) $ pip install ansible

# 3. Verify the installation
(ansible_env) $ ansible --version
ansible [core 2.15.5] ...
```
The Core Components
- **Control Node:** The machine Ansible is installed on (your laptop).
- **Managed Nodes:** The servers you want to manage (e.g., your 100 web servers). They only need an SSH server and Python installed.
- **Inventory:** A "phone book." A simple file (e.g., `hosts.ini`) that lists the IP addresses or domains of all your Managed Nodes.
- **Playbook:** The "recipe." A YAML file (e.g., `setup_nginx.yml`) that lists all the tasks you want to perform.
- **Module:** A "tool" that Ansible uses to perform a task. Ansible has thousands of modules (e.g., the `apt` module, the `docker_container` module, the `copy` module).
Step 1: SSH Key Authentication (MANDATORY)
Ansible works by SSHing into your servers. It relies on key-based authentication by default (a password prompt is possible with `--ask-pass`, but it's impractical across 100 servers). You **must** have passwordless SSH login set up from your Control Node to all your Managed Nodes.
As we learned in Chapter 2.1:
```bash
# 1. On your laptop (Control Node), generate a key if you don't have one
$ ssh-keygen -t rsa -b 4096

# 2. Copy your *public* key to EACH of your servers
$ ssh-copy-id user@server1.example.com
$ ssh-copy-id user@server2.example.com
```
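A quick sanity check before involving Ansible at all: this should print the remote hostname *without* asking for a password (hostnames here are the example ones above; your output will vary):

```bash
$ ssh user@server1.example.com hostname
server1
```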
Step 2: The Inventory File
Create a file named `inventory.ini`. This is where you list your servers. You can group them together.
```ini
# A group for all web servers
[webservers]
web1.example.com
web2.example.com
192.168.1.50

# A group for all database servers
[dbservers]
db1.example.com
```
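You can ask Ansible to read the inventory back to you, which is a cheap way to catch typos before you go further:

```bash
$ ansible all --list-hosts -i inventory.ini
  hosts (4):
    web1.example.com
    web2.example.com
    192.168.1.50
    db1.example.com
```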
Step 3: The `ansible.cfg` File (Optional but Recommended)
Create a file named `ansible.cfg` in the same directory. This tells Ansible where to find your inventory and what user to log in as.
```ini
[defaults]
# Use our inventory file
inventory = inventory.ini

# The default user to SSH as
remote_user = msmaxpro

# Don't bother checking host keys (less secure, but easier for testing)
host_key_checking = False
```
Part 3: Ad-Hoc Commands (Your First Contact)
Before you write a complex Playbook, you can run simple, one-off commands to test your connection. This is done with the `ansible` command.
Checking the Connection (`ping` module)
The `-m` flag specifies which **module** to use. The `ping` module doesn't ping the server; it logs in via SSH, checks if Python is installed, and returns "pong" if successful.
```bash
# Run the 'ping' module on ALL hosts in the inventory
$ ansible all -m ping

web1.example.com | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
web2.example.com | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
db1.example.com | FAILED! => {
    "msg": "Failed to connect to the host via ssh: ssh: connect to host db1.example.com port 22: Connection timed out"
}
```
This is amazing! In one command, you just diagnosed your entire infrastructure. You know your `webservers` are reachable, but `db1.example.com` is offline or its firewall is blocking port 22.
Running Raw Shell Commands (`shell` module)
You can use the `-a` flag to pass arguments to a module. The `shell` module lets you run any Bash command.
```bash
# Get the uptime for all servers in the 'webservers' group
$ ansible webservers -m shell -a "uptime"

web1.example.com | SUCCESS | rc=0 >>
 10:30:01 up 5 days, 20 min,  1 user,  load average: 0.00, 0.01, 0.05
web2.example.com | SUCCESS | rc=0 >>
 10:30:02 up 10 days,  4 hr,  1 user,  load average: 0.00, 0.00, 0.00

# Get free memory on ALL servers
$ ansible all -m shell -a "free -h"
```
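Ad-hoc commands are not limited to `shell`; you can call any module this way. One sketch (note `--become`, the ad-hoc equivalent of the `become: true` you will meet in Part 4):

```bash
# Ensure nginx is present on all webservers, as root, without writing a playbook
$ ansible webservers -m apt -a "name=nginx state=present" --become
```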
Part 4: Playbooks (The "Recipe")
Ad-hoc commands are good for checking status, but not for real automation. For that, you use a **Playbook**. A Playbook is a YAML file that lists a series of **Tasks** to be run on a group of hosts.
This is where you write your *declarative state*.
Example: Our First Playbook (Installing Nginx)
Create a file named `install_nginx.yml`.
```yaml
# A 'play' is a set of tasks for a group of hosts
---
- name: Install and configure Nginx
  hosts: webservers   # Run this play on the [webservers] group
  become: true        # This means "run tasks as root" (using sudo)

  # 'tasks' is the list of actions to perform
  tasks:
    - name: 1. Ensure Nginx is installed at the latest version
      apt:
        name: nginx
        state: latest
        update_cache: yes   # Run 'apt update' first

    - name: 2. Ensure Nginx service is running and enabled on boot
      service:
        name: nginx
        state: started
        enabled: true
```
Running the Playbook
You run playbooks with the `ansible-playbook` command.
```bash
$ ansible-playbook install_nginx.yml

PLAY [Install and configure Nginx] *************************************

TASK [Gathering Facts] *************************************************
ok: [web1.example.com]
ok: [web2.example.com]

TASK [1. Ensure Nginx is installed...] *********************************
changed: [web1.example.com]   # This server was changed
ok: [web2.example.com]        # This server was already OK

TASK [2. Ensure Nginx service is running...] ***************************
changed: [web1.example.com]
ok: [web2.example.com]

PLAY RECAP *************************************************************
web1.example.com : ok=3  changed=2  unreachable=0  failed=0
web2.example.com : ok=3  changed=0  unreachable=0  failed=0
```
This is **idempotency** in action. On `web1`, it installed Nginx (`changed=2`). On `web2` (which already had Nginx), it did *nothing* (`changed=0`). This is safe to run 1,000 times.
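Two flags are worth knowing before you trust a new playbook on real servers:

```bash
# Validate the YAML without connecting to anything
$ ansible-playbook install_nginx.yml --syntax-check

# "Check mode": report what WOULD change, but change nothing
$ ansible-playbook install_nginx.yml --check
```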
Part 5: Essential Modules Deep Dive
Ansible's power comes from its modules. You should *always* use a module (like `apt`) instead of the `shell` module (`shell: apt install ...`). The module is idempotent; the shell command is not.
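A side-by-side sketch of the same intent, to make the difference visible:

```yaml
# NOT idempotent: reports "changed" on every single run
- name: Install nginx (the wrong way)
  ansible.builtin.shell: apt-get install -y nginx

# Idempotent: reports "changed" only when it actually changed something
- name: Install nginx (the right way)
  ansible.builtin.apt:
    name: nginx
    state: present
```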
`apt` / `yum` / `dnf`
Manages system packages. Ansible is smart enough to detect the OS and use the right one.
```yaml
- name: Install a list of packages
  ansible.builtin.apt:
    name:
      - nginx
      - git
      - unzip
    state: present
```
`service` / `systemd`
Manages services (as we saw with `systemctl`).
```yaml
- name: Restart the Docker service
  ansible.builtin.service:
    name: docker
    state: restarted
```
`copy`
Copies a file from your Control Node (laptop) to the Managed Nodes.
```yaml
- name: Copy my custom config file
  ansible.builtin.copy:
    src: files/my-nginx.conf      # (Path on your laptop)
    dest: /etc/nginx/nginx.conf   # (Path on the server)
    owner: root
    group: root
    mode: '0644'                  # (rw-r--r--)
```
`template` (and Jinja2)
This is the *most powerful* file module. It's like `copy`, but it processes the file as a **Jinja2 template**. This lets you use variables inside your config files.
`templates/nginx.conf.j2` (a Jinja2 template):

```jinja
worker_processes {{ ansible_processor_vcpus }};

http {
    server {
        listen {{ my_nginx_port }};
        server_name {{ my_domain_name }};
    }
}
```
`install_nginx.yml` (Playbook):
```yaml
- name: Configure Nginx from template
  hosts: webservers
  become: true

  # We can define variables for this play
  vars:
    my_nginx_port: 80
    my_domain_name: "codewithmsmaxpro.me"

  tasks:
    - name: Push the templated config file
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
```
Ansible will read `nginx.conf.j2`, replace the `{{ ... }}` variables, and create a custom `nginx.conf` file on each server. `{{ ansible_processor_vcpus }}` is a built-in "fact" (data Ansible gathers) about the server.
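For example, on a hypothetical 4-vCPU server, the rendered `/etc/nginx/nginx.conf` would come out as:

```nginx
worker_processes 4;

http {
    server {
        listen 80;
        server_name codewithmsmaxpro.me;
    }
}
```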
`lineinfile`
Used to ensure a specific line exists (or doesn't) in a file. This is perfect for changing a single setting without replacing the whole file.
```yaml
- name: Disable password authentication in SSH
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?PasswordAuthentication'
    line: 'PasswordAuthentication no'
    state: present
```
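A typo in `sshd_config` can lock you out. `lineinfile` supports a `validate` option for exactly this case: the edit is only saved if the check command (given the temp file as `%s`) exits successfully. A sketch:

```yaml
- name: Disable password authentication in SSH (with a safety net)
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?PasswordAuthentication'
    line: 'PasswordAuthentication no'
    validate: /usr/sbin/sshd -t -f %s   # refuse the edit if sshd rejects the file
```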
Part 6: Advanced Playbook Logic
Handlers (Restarting Services)
In our example, we changed `nginx.conf`, but Nginx won't see the change until it's restarted. You *could* add a `service: state=restarted` task, but that's inefficient. It would restart Nginx *every* time you run the playbook, even if the config didn't change.
A **Handler** is a task that *only* runs if it is "notified" by another task that *actually made a change*.
`install_nginx.yml` (Updated):

```yaml
---
- name: Install and configure Nginx
  hosts: webservers
  become: true

  tasks:
    - name: Ensure Nginx is installed
      ansible.builtin.apt:
        name: nginx
        state: latest

    - name: Push the templated config file
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: Restart Nginx   # Notify the handler

    - name: Ensure Nginx is running
      ansible.builtin.service:
        name: nginx
        state: started

  # This block runs at the *end* of the play,
  # *only if* it was notified by a change.
  handlers:
    - name: Restart Nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```
Conditionals (`when:`)
The `when:` clause lets you skip tasks based on a condition. Ansible automatically gathers "facts" about your servers (like OS, IP, etc.).
```yaml
- name: Install Apache (for CentOS)
  ansible.builtin.dnf:
    name: httpd
    state: present
  when: ansible_facts['os_family'] == "RedHat"

- name: Install Apache (for Ubuntu)
  ansible.builtin.apt:
    name: apache2
    state: present
  when: ansible_facts['os_family'] == "Debian"
```
Loops (`loop:`)
You can easily loop over a list of items.
```yaml
- name: Install a list of common packages
  ansible.builtin.apt:
    name: "{{ item }}"   # 'item' is the variable
    state: present
  loop:
    - git
    - curl
    - unzip
    - htop
```
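(One caveat: for package modules specifically, passing the whole list to `name:`, as in the `apt` example earlier, is better than `loop:`, because it installs everything in a single transaction instead of one `apt` run per package. Loops shine for modules that only accept one item at a time, such as `user` or `file`.)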
Part 7: Roles & Vault (Professional Structure)
Ansible Roles (Reusable Playbooks)
Your `install_nginx.yml` file will get very long. What if you want to re-use your "install nginx" logic in 5 different projects? You can't just copy-paste it into each one.
A **Role** is a way to package all the tasks, templates, variables, and handlers for *one specific job* (like "nginx" or "mysql") into a standard, reusable folder structure.
This is the standard directory structure for an Ansible project:
```text
ansible-project/
├── inventory.ini        # Your list of servers
├── ansible.cfg          # Your config
├── site.yml             # The MAIN playbook
└── roles/
    └── webserver/       # The "webserver" role
        ├── tasks/
        │   └── main.yml           # All tasks go here
        ├── templates/
        │   └── nginx.conf.j2      # All templates go here
        ├── handlers/
        │   └── main.yml           # All handlers go here
        └── vars/
            └── main.yml           # Default variables for this role
```
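You don't have to build this tree by hand; `ansible-galaxy init` scaffolds a role skeleton for you (it creates a few extra directories beyond the ones shown above):

```bash
$ cd roles/
$ ansible-galaxy init webserver
- Role webserver was created successfully
```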
Then, your main `site.yml` playbook becomes incredibly simple:
```yaml
---
- name: Configure all web servers
  hosts: webservers
  become: true
  roles:
    - webserver   # Just apply the 'webserver' role
    # - database
    # - monitoring
```
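Where did the tasks go? Into the role. A sketch of `roles/webserver/tasks/main.yml`, carrying over the tasks from Part 6 (note that inside a role, `src:` paths resolve against the role's own `templates/` directory):

```yaml
---
- name: Ensure Nginx is installed
  ansible.builtin.apt:
    name: nginx
    state: latest

- name: Push the templated config file
  ansible.builtin.template:
    src: nginx.conf.j2            # found in roles/webserver/templates/
    dest: /etc/nginx/nginx.conf
  notify: Restart Nginx           # handler lives in roles/webserver/handlers/main.yml
```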
Ansible Vault (Managing Secrets)
How do you store your `DATABASE_PASSWORD`? You can't write it in plain text in `vars/main.yml` and push it to GitHub.
**Ansible Vault** is a built-in tool that lets you encrypt specific files or variables.
```bash
# 1. Create a new, encrypted file for your secrets
$ ansible-vault create roles/webserver/vars/secrets.yml
New Vault password: *********  (type your password)

# A text editor opens. You add your secrets:
db_password: "my-super-secret-password-123"
```
The file `secrets.yml` will just contain encrypted nonsense. It's safe to commit to Git.
```bash
# 2. Run your playbook, asking for the password
$ ansible-playbook site.yml --ask-vault-pass
Vault password: *********  (type your password)
```
Ansible will automatically decrypt the file *in memory* during the run and inject `db_password` as a variable. Your secret is never exposed.
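Day to day, you'll also want these (all built into `ansible-vault`); the password-file variant is handy for automation, as long as that file itself stays out of Git:

```bash
# Read the decrypted contents (to stdout), or edit the file in place
$ ansible-vault view roles/webserver/vars/secrets.yml
$ ansible-vault edit roles/webserver/vars/secrets.yml

# Non-interactive runs: read the vault password from a file
$ ansible-playbook site.yml --vault-password-file ~/.vault_pass.txt
```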