Chapter 7.2: Centralized Logging (ELK Stack)
The "Why": Metrics vs. Logs
In the last chapter, we mastered **Metrics** with Prometheus. Metrics tell you **"WHAT"** is happening. They are numbers, aggregated over time. For example:
http_requests_total{status="500"} = 150
cpu_usage_percent = 98%
Metrics are great for alerts ("My CPU is at 98%!") and dashboards. But they can't tell you *why* the CPU is at 98%. For that, you need **Logs**.
**Logs** tell you **"WHY"** something happened. They are detailed, timestamped, text-based records of specific events.
[2025-11-10T10:30:01] ERROR: User '123' failed to log in: password incorrect.
[2025-11-10T10:30:05] ERROR: (PID 4567) Unhandled Exception: 'NoneType' object has no attribute 'user' in app/auth.py:52
The Problem: The "500 Container" Problem
In a modern Kubernetes cluster, you might have 500 containers running your app. A user reports an error. Which container has the log? You can't run `kubectl logs` against 500 different pods one by one. And what if the pod crashed? Its logs are *gone forever*.
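To make this concrete, here is what the "manual" approach looks like: loop over every pod and grep its logs. This is only a sketch; the `app=payment-service` label is an invented example.
# The painful, pre-centralized-logging way: grep each pod's logs one at a time.
# Anything from pods that already crashed and were replaced is simply gone.
for pod in $(kubectl get pods -l app=payment-service -o name); do
  echo "--- $pod ---"
  kubectl logs "$pod" | grep ERROR
done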
The solution is **Centralized Logging**. You need a system that automatically:
- **Ships (Collects):** All logs from all 500 containers.
- **Parses:** Converts unstructured text logs into structured JSON (e.g., separating the timestamp, log level, and message).
- **Stores (Indexes):** Saves all these logs in one central, searchable database.
- **Visualizes (Queries):** Gives you a beautiful UI (like Kibana) to search and filter all your logs (e.g., "Show me all `ERROR` logs from `payment-service` in the last 15 minutes").
The Solution: The ELK Stack
The most popular open-source solution for this is the **ELK Stack** (or "Elastic Stack").
- **E - Elasticsearch:** The "Store." This is the highly scalable, distributed search engine that *stores* and *indexes* all your logs.
- **L - Logstash:** The "Pipeline." This is the server-side tool that *collects*, *parses* (using Grok), and *transforms* your logs before sending them to Elasticsearch.
- **K - Kibana:** The "Dashboard." This is the web UI (like Grafana) that lets you *visualize*, *search*, and *analyze* the logs stored in Elasticsearch.
(Note: A fourth component, "Beats," is often used. "Filebeat" is a lightweight *agent* you install on your servers to "ship" logs *to* Logstash. For our purposes, we'll group it with Logstash).
Part 1: Elasticsearch (The "Database")
Elasticsearch is the heart of the stack. It's built on top of **Apache Lucene** and is, at its core, a **full-text search engine**, not a traditional database. This makes it incredibly fast at searching through terabytes of unstructured text data (like logs).
Core Concepts (SQL vs. Elasticsearch)
To understand Elasticsearch, it's helpful to compare it to a SQL database:
- SQL Database -> **Elasticsearch Index** (e.g., `logs-2025-11-10`)
- SQL Table -> **(No direct equivalent)**
- SQL Row -> **Elasticsearch Document** (a JSON object)
- SQL Column -> **Elasticsearch Field** (a key in the JSON object)
- SQL Schema -> **Elasticsearch Mapping**
How it Works: The Inverted Index
Why is Elasticsearch so fast? It doesn't scan every log line. It uses an **Inverted Index**, just like the index at the back of a textbook.
When you save a log "user 'admin' failed", Elasticsearch *doesn't* just store the string. It breaks it down and updates its index:
user   -> (in documents 1, 5, 7)
admin  -> (in documents 1, 44)
failed -> (in documents 1, 2, 99)
When you *search* for "admin" AND "failed", it just finds the intersection of those two lists (document 1). This is lightning fast.
Running Elasticsearch with Docker
Elasticsearch is complex to install, but trivial with Docker.
# Run a single-node, development-mode cluster
# (security is disabled so the plain-HTTP curl examples below work -- dev only!)
docker run -d \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  --name elasticsearch \
  docker.elastic.co/elasticsearch/elasticsearch:8.10.4
Querying Elasticsearch (with `curl`)
Elasticsearch has a powerful REST API that you can query with `curl`. You don't use SQL; you use **Query DSL** (a JSON-based query language).
# Check the cluster health (wait for status to be 'green' or 'yellow')
$ curl -X GET "localhost:9200/_cluster/health?pretty"
# Search for *all* logs in *all* indexes
$ curl -X GET "localhost:9200/_search?pretty"
# Use Query DSL: Find all logs where 'level' is 'ERROR'
$ curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"level": "ERROR"
}
}
}
'
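To see the inverted index in action, you can index a couple of tiny test documents and then search for the terms they share. This is just a sketch; the `demo-logs` index name is made up.
# Index two small documents (refresh=true makes them searchable immediately)
curl -X POST "localhost:9200/demo-logs/_doc?refresh=true" \
  -H 'Content-Type: application/json' \
  -d '{"message": "user admin failed to log in"}'

curl -X POST "localhost:9200/demo-logs/_doc?refresh=true" \
  -H 'Content-Type: application/json' \
  -d '{"message": "user guest logged in"}'

# Find documents containing BOTH "admin" and "failed"
# ("operator": "and" requires every term to match, i.e. the intersection of the term lists)
curl -X GET "localhost:9200/demo-logs/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"message": {"query": "admin failed", "operator": "and"}}}}'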
You will almost never do this by hand. You will use **Kibana** to do this for you.
Part 2: Logstash (The "Pipeline")
Logstash is the "L" in ELK. It's a server-side data processing pipeline. Its job is to pull data from many sources, transform it, and send it to a "stash" (like Elasticsearch).
The 3 Stages of a Logstash Pipeline
You configure Logstash with a simple config file (e.g., logstash.conf) that has three sections:
- **input:** Where are the logs coming from? (e.g., a file, a port, another service).
- **filter:** How do we **parse** and **transform** the logs? (This is the most important part).
- **output:** Where do the processed logs go? (e.g., Elasticsearch, or just the console for debugging).
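If you have Logstash installed locally, you can see all three stages with the smallest possible pipeline by passing the config inline via the `-e` flag: type a line at the prompt, and Logstash prints it back as a structured event. (A sketch; run it from the Logstash install directory.)
# No filter section: just read from stdin and print structured events to stdout
bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'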
Example: Parsing a Simple Nginx Log
Let's say our Nginx log file (/var/log/nginx.log) looks like this:
127.0.0.1 - - [10/Nov/2025:13:55:01 +0000] "GET /api/users HTTP/1.1" 200 456
This is "unstructured" text. We want to parse it into "structured" JSON, like this:
{
"client_ip": "127.0.0.1",
"http_verb": "GET",
"http_path": "/api/users",
"http_status": 200,
"bytes_sent": 456
}
Logstash Configuration (`logstash.conf`)
We use the **grok** filter to do this. `grok` is a powerful plugin that uses pre-built regex patterns to parse unstructured data.
input {
# We will read from a file
file {
path => "/var/log/nginx.log"
start_position => "beginning"
}
}
filter {
# This is the 'grok' filter, the heart of Logstash
grok {
# 'match' defines the pattern to use
# '%{...}' is a pre-defined grok pattern
match => { "message" => "%{IPORHOST:client_ip} - - \[%{HTTPDATE:timestamp}\] \"%{WORD:http_verb} %{URIPATHPARAM:http_path} HTTP/%{NUMBER:http_version}\" %{INT:http_status} %{INT:bytes_sent}" }
}
# After grok, http_status is a "String" ("200").
# We need to convert it to a number.
mutate {
convert => {
"http_status" => "integer"
"bytes_sent" => "integer"
}
}
# Parse the timestamp so Elasticsearch knows it's a date
date {
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
# 1. Send the data to our Elasticsearch container
elasticsearch {
hosts => ["http://elasticsearch:9200"]
index => "nginx-logs-%{+YYYY.MM.dd}" # Create a new index every day
}
# 2. Also print it to the console for debugging
stdout {
codec => rubydebug
}
}
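One way to try this pipeline locally is to run the official Logstash image with the config file and the Nginx log mounted in. This is a sketch: it assumes the files exist at these paths, and that the container shares a user-defined Docker network (e.g. `elastic`) with the Elasticsearch container so the hostname `elasticsearch` in the output block resolves.
# Mount our config into the image's default pipeline directory and start Logstash
docker run -d \
  --name logstash \
  --network elastic \
  -v "$(pwd)/logstash.conf:/usr/share/logstash/pipeline/logstash.conf:ro" \
  -v "/var/log/nginx.log:/var/log/nginx.log:ro" \
  docker.elastic.co/logstash/logstash:8.10.4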
Part 3: Kibana (The "Dashboard")
Kibana is the final piece. It's the web UI that lets you search, visualize, and build dashboards on the data in Elasticsearch.
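To follow along locally, Kibana is just one more container that needs to know where Elasticsearch lives. A minimal sketch, assuming the `elasticsearch` container from Part 1 is attached to a user-defined Docker network named `elastic`:
# Create a shared network and attach the existing Elasticsearch container to it
docker network create elastic
docker network connect elastic elasticsearch

# Start Kibana, pointing it at Elasticsearch by container name
docker run -d \
  --name kibana \
  --network elastic \
  -p 5601:5601 \
  -e ELASTICSEARCH_HOSTS=http://elasticsearch:9200 \
  docker.elastic.co/kibana/kibana:8.10.4

# The UI is now at http://localhost:5601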
Key Features of Kibana
- **Discover:** The main "search" page. This is where you see the raw logs and can filter them. You can use **KQL (Kibana Query Language)** to search:
  - `http_status: 404` (show me all "Not Found" errors)
  - `level: "ERROR" AND client_ip: "123.45.67.89"` (show all errors from this specific client)
- **Visualize:** The tool for creating charts. You can create a pie chart of `http_status` codes, or a line chart of errors over time (see the query sketch after this list).
- **Dashboard:** A page where you can arrange multiple visualizations to create a single, high-level overview of your application's health.
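Under the hood, a Kibana visualization is just an Elasticsearch aggregation query. A pie chart of HTTP status codes, for example, boils down to something like the request below (a sketch against the `nginx-logs-*` indices our Logstash config creates):
# Count documents per http_status; "size": 0 means "return only the aggregation, no raw hits"
curl -X GET "localhost:9200/nginx-logs-*/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
    "size": 0,
    "aggs": {
      "status_codes": {
        "terms": { "field": "http_status" }
      }
    }
  }'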
Part 4: The Modern Stack (Fluentd / Fluent Bit)
Logstash is powerful, but it's a "heavyweight." It runs on the JVM and can use a lot of memory. This can be too much for a simple server or a small container.
The modern, cloud-native alternative is the **EFK Stack**:
**E**lasticsearch + **F**luentd/Fluent Bit + **K**ibana
What is Fluentd / Fluent Bit?
**Fluentd** (written in Ruby) and **Fluent Bit** (written in C) are lightweight, high-performance log *collectors* and *shippers*. They are designed to run *everywhere*.
Why use them?
- **Lightweight:** Fluent Bit uses *very* little memory and CPU, making it perfect for running on every node of a Kubernetes cluster (or even as a per-Pod "sidecar").
- **Cloud-Native:** It was built to understand containers and Kubernetes. It can automatically grab all logs from all Pods on a server and enrich them with metadata (like the Pod name, namespace, and labels).
- **Pluggable:** It has hundreds of plugins to send logs *to* any destination (not just Elasticsearch, but also AWS S3, Kafka, etc.).
How it works in Kubernetes (The `DaemonSet`)
In a production Kubernetes cluster, you don't run Logstash. Instead, you run **Fluent Bit** as a **DaemonSet**.
A DaemonSet is a Kubernetes object (like a Deployment) that ensures **one and only one** copy of a Pod runs on *every single Node* in the cluster.
This Fluent Bit Pod (running as a DaemonSet) mounts the host's log directory (`/var/log/containers`) and "tails" all the log files from all the other containers on that node, shipping them off to your central Elasticsearch cluster. A minimal Fluent Bit configuration for this setup looks something like this:
[SERVICE]
    Log_Level    info
    Daemon       Off

[INPUT]
    Name    tail
    Path    /var/log/containers/*.log
    # Tag records with the file name so the kubernetes filter can find the Pod
    Tag     kube.*
    # Use the built-in docker log parser
    Parser  docker

[FILTER]
    # Enrich each record with Kubernetes metadata (Pod name, namespace, labels)
    Name       kubernetes
    Match      kube.*
    Kube_URL   https://kubernetes.default.svc:443
    # Merge the JSON log string into the main object
    Merge_Log  On

[OUTPUT]
    # Ship to Elasticsearch
    Name                es
    Match               *
    Host                elasticsearch-master
    Port                9200
    # Needed when the cluster runs Elasticsearch 8.x
    Suppress_Type_Name  On
    # Create one index per day: k8s-logs-YYYY.MM.DD
    Logstash_Format     On
    Logstash_Prefix     k8s-logs
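In practice you rarely write the DaemonSet manifest or this config by hand; the official Fluent Bit Helm chart ships defaults very close to the snippet above. A rough sketch of installing and verifying it (release name, namespace, and chart values are assumptions to adapt):
# Install Fluent Bit as a DaemonSet via the official Helm chart
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluent-bit fluent/fluent-bit --namespace logging --create-namespace

# Verify: exactly one fluent-bit Pod should be running on every node
kubectl get daemonset -n logging
kubectl get pods -n logging -o wide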