EFK Stack deployment on Kubernetes
Full setup, including YAML manifest files, for a single-node test system, collecting logs from nginx.
Intro
Logs produced by running containers and written to stdout/stderr are, by default, stored on the host machine under /var/log/containers.
Log rotation is fairly frequent, so either you collect and store the logs somewhere, or they will soon be gone for good.
At some point, you might (read: will) need to analyse logs, so it’s a good idea to set up a framework for that right from the beginning.
Moreover, when it comes to log analysis, having the logs available as structured data on a web UI helps a lot more than scrolling through text files.
An EFK (Elasticsearch – Fluentd – Kibana) stack allows you to do exactly that. A common alternative is the ELK stack (Elasticsearch – Logstash – Kibana), which uses Logstash instead of Fluentd.
This post will guide you through all the necessary steps. As a sample case, we will collect (JSON) logs from an nginx container and make them available in Kibana.
Components
- Fluentd – the log aggregator, used to collect container stdout/stderr logs and (optionally) process them before sending them to Elasticsearch
- Elasticsearch – a scalable, RESTful search and analytics engine that stores the Kubernetes logs
- Kibana – the visualization layer, providing a user interface to query and visualize logs
Prerequisites
- A Kubernetes cluster running on a Linux host VM
- kubectl utility, configured to interact with the cluster above
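A quick way to verify both prerequisites (the node name reported here will also be needed later for the PersistentVolume):
[root@mr-k8s-demo1 ~]# kubectl cluster-info
[root@mr-k8s-demo1 ~]# kubectl get nodes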
Step 1: Elasticsearch deployment
- Create a Service using the following YAML manifest
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: default
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node
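Assuming you saved the manifest above as elasticsearch-svc.yaml (any filename works), apply it and check the result:
[root@mr-k8s-demo1 ~]# kubectl apply -f elasticsearch-svc.yaml
[root@mr-k8s-demo1 ~]# kubectl get svc elasticsearch
Note that clusterIP: None makes this a headless Service: it does not load-balance, but it gives each StatefulSet pod a stable DNS name (e.g. es-cluster-0.elasticsearch, used below in discovery.seed_hosts).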
- Create a PersistentVolume to be assigned to the Elasticsearch pods
- Make sure that .spec.local.path points to a folder that exists on the host VM’s local filesystem
- Make sure that the value under .spec.nodeAffinity.required.nodeSelectorTerms.matchExpressions.values matches your Kubernetes cluster’s node name (as reported by kubectl get nodes; see the commands after the manifest)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 5Gi
  persistentVolumeReclaimPolicy: Retain
  local:
    path: /u01/elastic
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - your_node_name
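Before applying it, create the folder on the host VM and look up your node name (the filename elasticsearch-pv.yaml is, again, just an example):
[root@mr-k8s-demo1 ~]# mkdir -p /u01/elastic
[root@mr-k8s-demo1 ~]# kubectl get nodes
[root@mr-k8s-demo1 ~]# kubectl apply -f elasticsearch-pv.yaml
[root@mr-k8s-demo1 ~]# kubectl get pv data
The volume will show as Available until the StatefulSet’s PersistentVolumeClaim (created in the next step) binds it.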
- Create a StatefulSet (the sample below runs on a single-node configuration)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: default
spec:
  serviceName: elasticsearch
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
          resources:
            limits:
              cpu: 1000m
            requests:
              cpu: 10m
          ports:
            - containerPort: 9200
              name: rest
              protocol: TCP
            - containerPort: 9300
              name: inter-node
              protocol: TCP
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          env:
            - name: discovery.type
              value: single-node
            - name: cluster.name
              value: k8s-logs
            - name: node.name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: discovery.seed_hosts
              value: "es-cluster-0.elasticsearch"
            - name: ES_JAVA_OPTS
              value: "-Xms512m -Xmx512m"
      initContainers:
        - name: fix-permissions
          image: busybox
          command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
        - name: increase-vm-max-map
          image: busybox
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
        - name: increase-fd-ulimit
          image: busybox
          command: ["sh", "-c", "ulimit -n 65536"]
          securityContext:
            privileged: true
  volumeClaimTemplates:
    - metadata:
        name: data
        labels:
          app: elasticsearch
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 5Gi
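Apply it (again, the filename is an example) and wait for the rollout; the first start can take a while since the image has to be pulled:
[root@mr-k8s-demo1 ~]# kubectl apply -f elasticsearch-statefulset.yaml
[root@mr-k8s-demo1 ~]# kubectl rollout status statefulset/es-cluster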
Now, make sure your Elasticsearch pod is up and running:
[root@mr-k8s-demo1 ~]# kubectl get pods -l app=elasticsearch
NAME READY STATUS RESTARTS AGE
es-cluster-0 1/1 Running 2 (2d16h ago) 5d13h
Time to run a test call via HTTP. Here we have two alternatives.
Option #1:
Forward traffic to port 9200 and test via curl from the Linux host VM:
[root@mr-k8s-demo1 ~]# kubectl port-forward $(kubectl get pods -o=name --selector=app=elasticsearch) 9200:9200
Open a new shell (the port-forward command above keeps that shell busy):
[root@mr-k8s-demo1 ~]# curl http://localhost:9200/_cluster/state?pretty
Option #2:
Permanently expose port 9200 with an additional Service, so that it becomes accessible from outside the cluster as well (using the Linux host VM’s real IP address):
[root@mr-k8s-demo1 ~]# kubectl expose service elasticsearch --port=9200 --target-port=9200 --external-ip=external_ip_of_your_Linux_host_VM --name=elasticsearch-external
Open a browser and go to http://external_ip_of_your_Linux_host_VM:9200/_cluster/state?pretty
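Whichever option you choose, a quick sanity check is the cluster health endpoint; on this single-node setup the status should be green or yellow:
[root@mr-k8s-demo1 ~]# curl http://localhost:9200/_cluster/health?pretty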
That’s it for Elasticsearch.
Step 2: Kibana deployment
Deploy the Service plus the Deployment using the following manifest:
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: default
  labels:
    app: kibana
spec:
  ports:
    - port: 5601
  selector:
    app: kibana
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: default
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
        - name: kibana
          image: docker.elastic.co/kibana/kibana:7.2.0
          resources:
            limits:
              cpu: 1000m
            requests:
              cpu: 10m
          env:
            # Kibana 7.x images document ELASTICSEARCH_HOSTS as the variable name;
            # switch to it if Kibana cannot reach Elasticsearch with this setting
            - name: ELASTICSEARCH_URL
              value: http://elasticsearch:9200
          ports:
            - containerPort: 5601
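As before, apply the manifest (sample filename kibana.yaml) and wait for the rollout to finish:
[root@mr-k8s-demo1 ~]# kubectl apply -f kibana.yaml
[root@mr-k8s-demo1 ~]# kubectl rollout status deployment/kibana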
Testing Kibana availability
Now, similarly to what we have just done with Elasticsearch, we must expose the Service. This time, since you will be accessing the web UI frequently, the suggested solution is to directly create a Service exposing port 5601:
[root@mr-k8s-demo1 ~]# kubectl expose service kibana --port=5601 --target-port=5601 --external-ip=external_ip_of_your_Linux_host_VM --name=kibana-external
And point your browser to http://external_ip_of_your_Linux_host_VM:5601
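If you prefer a command-line check, Kibana also exposes a status endpoint returning JSON:
[root@mr-k8s-demo1 ~]# curl http://external_ip_of_your_Linux_host_VM:5601/api/status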
Step 3: Fluentd deployment
Next, we will set up Fluentd as a DaemonSet. Because it is a DaemonSet, one Fluentd logging agent Pod will run on every node in our cluster.
Use the following YAML to create the Fluentd DaemonSet. It will do the following:
- Create a ConfigMap named fluentd-config, holding the Fluentd configuration file (explained in detail further below)
- Create a ServiceAccount called fluentd. Fluentd processes will use this service account to access the Kubernetes API.
- Create a ClusterRole which allows get/list/watch access on pods and namespaces
- Create a ClusterRoleBinding, binding the ServiceAccount above to the ClusterRole and thus granting those permissions to the ServiceAccount
- Create the DaemonSet itself
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <source>
      @type tail
      read_from_head true
      path /var/log/containers/nginx*.log
      pos_file /var/log/containers/nginx.log.pos
      tag nginx.access
      <parse>
        @type regexp
        expression /(?<docker_ts>[^ ]*) (?<docker_flag>[^ ]*) (?<docker_stdout>[^ ]*) (?<data>.*).*$/
      </parse>
    </source>
    <filter nginx.**>
      @type record_transformer
      <record>
        data ${record["data"]}
      </record>
      remove_keys docker_ts,docker_flag,docker_stdout
    </filter>
    <filter nginx.**>
      @type parser
      key_name data
      format json
      reserve_data false
    </filter>
    <match nginx.**>
      @type elasticsearch
      host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
      port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
      user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
      password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
      index_name fluentd
      type_name fluentd
    </match>
    <match **>
      @type null
    </match>
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: default
  labels:
    app: fluentd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentd
  labels:
    app: fluentd
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - namespaces
    verbs:
      - get
      - list
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: fluentd
    namespace: default
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: default
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      initContainers:
        - name: config-fluentd
          image: busybox
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh","-c"]
          args:
            - cp /fluentd/etc2/fluent.conf /fluentd/etc/fluent.conf;
          volumeMounts:
            - name: config-path
              mountPath: /fluentd/etc
            - name: config-source
              mountPath: /fluentd/etc2
      containers:
        - name: fluentd
          #image: fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch"
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
            - name: FLUENT_ELASTICSEARCH_SCHEME
              value: "http"
            - name: FLUENTD_SYSTEMD_CONF
              value: disable
          resources:
            limits:
              memory: 512Mi
            requests:
              cpu: 25m
              memory: 200Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: config-path
              mountPath: /fluentd/etc
      terminationGracePeriodSeconds: 30
      volumes:
        - name: config-source
          configMap:
            name: fluentd-config
            items:
              - key: fluent.conf
                path: fluent.conf
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: config-path
          emptyDir: {}
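Apply the manifest (sample filename fluentd.yaml), verify that one Fluentd pod is running per node, and check its startup log for connection errors towards Elasticsearch:
[root@mr-k8s-demo1 ~]# kubectl apply -f fluentd.yaml
[root@mr-k8s-demo1 ~]# kubectl get pods -l app=fluentd
[root@mr-k8s-demo1 ~]# kubectl logs -l app=fluentd --tail=20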
Fluentd configuration
Fluentd mounts the host folder in which the log files containing stdout from all containers are stored (/var/lib/docker/containers; the files under /var/log/containers are symlinks into it, which is why both folders are mounted).
Depending on the configuration defined in the file /fluentd/etc/fluent.conf, this content can then be forwarded to Elasticsearch.
The configuration file is defined as a ConfigMap object, which is mounted (and copied into place by the init container) at container startup.
References to Elasticsearch must be passed as environment variables:
env:
  - name: FLUENT_ELASTICSEARCH_HOST
    value: "elasticsearch" # Make sure this name can be resolved within the cluster
  - name: FLUENT_ELASTICSEARCH_PORT
    value: "9200"
  - name: FLUENT_ELASTICSEARCH_SCHEME
    value: "http"
Fluentd can read different sources, then parse, filter, and change/add/remove content before forwarding the logs to a destination (e.g. Elasticsearch).
Documentation (adapt to your desired version) is available here: https://docs.fluentd.org/v/0.12/
Sample configuration file (reads nginx logs as JSON), this time with the directives annotated:
<source>
  @type tail                                  # reads the file, polling for new entries
  read_from_head true                         # starts reading from the beginning of the file
  path /var/log/containers/nginx*.log         # pathname (wildcards allowed) of the files to read
  pos_file /var/log/containers/nginx.log.pos  # fluentd stores the last read position in this file
  tag nginx.access                            # tags the entries, useful for further processing steps
  <parse>
    @type regexp                              # splits each retrieved line according to the regexp below
    expression /(?<docker_ts>[^ ]*) (?<docker_flag>[^ ]*) (?<docker_stdout>[^ ]*) (?<data>.*).*$/
  </parse>
</source>
<filter nginx.**>
  @type record_transformer                    # transforms the content of entries tagged nginx.*
  <record>
    data ${record["data"]}                    # keeps the payload in a field named "data"
  </record>
  remove_keys docker_ts,docker_flag,docker_stdout # suppresses these fields from the output
</filter>
<filter nginx.**>
  @type parser
  key_name data                               # parses the value of field "data" ...
  format json                                 # ... as JSON
  reserve_data false                          # outputs only the parsed content (the "data" root is removed)
</filter>
<match nginx.**>
  @type elasticsearch                         # sends processed entries to elasticsearch
  host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
  port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
  user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
  password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
  index_name fluentd                          # name of the index that will be created on elasticsearch
  type_name fluentd
</match>
<match **>
  @type null                                  # everything else, not tagged nginx.*, is discarded
</match>
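To make the pipeline concrete, here is a walk-through with an invented sample line (the values are made up; the layout assumes the CRI runtime log format, which is what the regexp above expects):
# Raw line appended to /var/log/containers/nginx-<pod-id>.log:
2019-07-10T09:15:04.000000000Z stdout F {"time_local":"2019-07-10T09:15:04+00:00", "remote_addr":"10.244.0.1", "status":"200"}
# After the regexp in <source>:
#   docker_ts     = 2019-07-10T09:15:04.000000000Z
#   docker_flag   = stdout
#   docker_stdout = F
#   data          = the trailing JSON string
# After the two filters (docker_* keys dropped, "data" parsed as JSON), the record indexed in Elasticsearch is:
{"time_local":"2019-07-10T09:15:04+00:00", "remote_addr":"10.244.0.1", "status":"200"}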
Tips
Output can also be redirected to a file, which is useful for troubleshooting: you can inspect the outcome of the log processing, based on your filters/transformers.
Sample:
<match **>
  @type file
  path /var/log/fluent/myapp
  utc
  append true
</match>
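If records still do not show up, Fluentd’s own stdout and the effective configuration file are the first places to look:
[root@mr-k8s-demo1 ~]# kubectl logs -l app=fluentd --tail=50
[root@mr-k8s-demo1 ~]# kubectl exec $(kubectl get pods -o=name --selector=app=fluentd) -- cat /fluentd/etc/fluent.conf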
Sample container: Nginx
The following YAML manifest deploys an nginx instance with all default settings except for the log format: we will be using JSON.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-conf
data:
  nginx.conf: |
    user nginx;
    worker_processes 1;
    events {
      worker_connections 10240;
    }
    http {
      log_format logger-json escape=json '{"time_local":"$time_iso8601", "remote_addr":"$remote_addr", "remote_user":"$remote_user", "request":"$request", "status":"$status", "body_bytes_sent":"$body_bytes_sent", "request_time":"$request_time", "http_referrer":"$http_referer", "http_user_agent":"$http_user_agent", "request_length":"$request_length" }';
      server {
        listen 80;
        server_name localhost;
        location / {
          root /usr/share/nginx/html;
          index index.html index.htm;
        }
        access_log /var/log/nginx/access.log logger-json;
      }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: nginx-conf
              mountPath: /etc/nginx/nginx.conf
              subPath: nginx.conf
              readOnly: true
      volumes:
        - name: nginx-conf
          configMap:
            name: nginx-conf
            items:
              - key: nginx.conf
                path: nginx.conf
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
      nodePort: 30008
  selector:
    app: nginx
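Apply the manifest (sample filename nginx.yaml) and check that the pod is running; once you access nginx in the next step, every request should show up as a JSON line in its container log:
[root@mr-k8s-demo1 ~]# kubectl apply -f nginx.yaml
[root@mr-k8s-demo1 ~]# kubectl get pods -l app=nginx
[root@mr-k8s-demo1 ~]# kubectl logs -l app=nginx --tail=5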
Call the nginx instance you just deployed by pointing your browser to: http://external_ip_of_your_Linux_host_VM:30008
You should see nginx’s welcome page.
Testing the complete stack
Based on the configuration provided above, nginx logs to stdout in JSON format.
Fluentd tails that log file: each new line is parsed according to the configuration file, and all entries matching the provided filters are forwarded to Elasticsearch.
To make them visible, there is one last step to complete in Kibana.
Log in to the web UI and go to Management -> Index Management.
Based on the configuration provided, you should see at least one index named “fluentd”. Note that the name is set in the Fluentd configuration file (index_name).
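The same check can also be done from the command line, querying Elasticsearch directly (via the port-forward or the external Service from Step 1):
[root@mr-k8s-demo1 ~]# curl "http://localhost:9200/_cat/indices?v"
[root@mr-k8s-demo1 ~]# curl "http://localhost:9200/fluentd/_search?pretty&size=1"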
To make the index visible in Kibana, you need to define an Index Pattern.
Click on Management -> Index Patterns -> Create Index Pattern.
Start typing the name until it matches at least one of the existing indexes (in the sample above, typing fluent is enough). Then click Next and complete the index pattern creation.
Now click on Discover, make sure that the index pattern created above is selected, and choose a time range that includes the moment you accessed the nginx home page. The collected log data will be displayed and, in the left column, every single log attribute (obtained by parsing the JSON entries) will be available.