Kubernetes Monitoring and Logging: Tools and Best Practices

Last Updated : 19 Jan, 2024

Kubernetes (K8s) is an open-source project hosted by the Cloud Native Computing Foundation (CNCF) that orchestrates containers, simplifying the deployment and management of containerized applications. It is widely used in the DevOps and cloud-native space, and many DevOps workflows depend on it. As these applications grow in scale and complexity, their containers become harder and harder to monitor. A proper monitoring and logging setup is therefore essential to make sure things don’t break unexpectedly. In one line, monitoring (or observability) is the process of watching the state of an application and being notified about problems through alerts. Logs are records of each event that happens inside the containers (e.g. ‘namespace created’ -> ‘pod is yet to start’ -> ‘pod is running’ -> ‘pod is restarting’).

[Figure: Kubernetes workflow]

What is Kubernetes?

Kubernetes is an open-source container orchestrator that helps in managing microservices. It provides many features for deploying, scaling, and operating them, including some basic monitoring features. Kubernetes is a large and complex project hosted by the CNCF (Cloud Native Computing Foundation).

What is Kubernetes Monitoring And Why Should You Care About It?

Kubernetes monitoring, or simply monitoring, is a set of practices used to make sure that a Kubernetes cluster is working properly. When something unusual happens in the cluster (for example, some pods keep crashing, some pods fail to start, or authentication errors appear), these practices help us identify the cause of the issue and troubleshoot it. To do this we track ‘metrics’: measurable values, such as CPU usage or restart counts, that describe the state of the cluster. In the cloud-native world, monitoring is often discussed under the broader term ‘observability’.

Which Metrics Should You Monitor?

You could track any number of metrics, but monitoring all of them is not feasible. Listed below are some of the most important metrics; together they cover most of what you need to know about your application. You can add further metrics as your use case requires.

Node resource usage
Number of pods running on each node
Deployments and DaemonSets
Pods (which are failing, restarting, or in CrashLoopBackOff)
Memory utilisation by pods and the cluster
Application health and performance
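Several of the metrics above (pods per node, failing or restarting pods, CrashLoopBackOff) can be derived from the pod list that the API server returns. The Python sketch below is only an illustration: the sample data is hypothetical and trimmed to a few fields in the shape of `kubectl get pods -o json`, and the function name `summarize_pods` and the restart threshold of 3 are assumptions, not part of any Kubernetes tool.

```python
import json
from collections import Counter

# Hypothetical sample, trimmed to the fields used below; real output of
# `kubectl get pods -o json` contains many more fields per pod.
SAMPLE_POD_LIST = json.loads("""
{"items": [
  {"metadata": {"name": "web-1"},
   "spec": {"nodeName": "node-a"},
   "status": {"phase": "Running", "containerStatuses": [
     {"restartCount": 0, "state": {"running": {}}}]}},
  {"metadata": {"name": "web-2"},
   "spec": {"nodeName": "node-a"},
   "status": {"phase": "Running", "containerStatuses": [
     {"restartCount": 7, "state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
  {"metadata": {"name": "job-1"},
   "spec": {"nodeName": "node-b"},
   "status": {"phase": "Failed"}}
]}
""")


def summarize_pods(pod_list, restart_threshold=3):
    """Return (pods per node, list of (pod, problem, restarts)) for a pod list."""
    pods_per_node = Counter()
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
        pods_per_node[node] += 1

        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        restarts = sum(c.get("restartCount", 0) for c in containers)
        # Collect the reasons of all containers that are currently waiting.
        waiting_reasons = {
            c["state"]["waiting"].get("reason")
            for c in containers
            if "waiting" in c.get("state", {})
        }

        if "CrashLoopBackOff" in waiting_reasons:
            problems.append((name, "CrashLoopBackOff", restarts))
        elif status.get("phase") == "Failed":
            problems.append((name, "Failed", restarts))
        elif restarts >= restart_threshold:
            problems.append((name, "Restarting", restarts))
    return dict(pods_per_node), problems


per_node, problems = summarize_pods(SAMPLE_POD_LIST)
print(per_node)
print(problems)
```

In a real setup the same summary would normally come from a monitoring tool rather than a script; the sketch only shows which fields of a pod carry this information.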

What Options Are Available For Monitoring a Kubernetes Cluster?

Kubernetes Dashboard: The Kubernetes project provides a web-based dashboard that you can open in a browser. It covers the main metrics and gives a quick overview of what is happening in the cluster.
Prometheus: One of the most widely used open-source monitoring tools. It collects and stores metrics as time series, offers a query language (PromQL), and integrates well with Kubernetes.
Grafana: Grafana is popular for its visual dashboards, which make it easy to track metrics across different clusters. It is often used together with Prometheus to build a powerful monitoring setup.
EFK Stack: It provides a centralized way to collect logs and present them in a dashboard. EFK stands for ‘Elasticsearch’, ‘Fluentd’, and ‘Kibana’: Fluentd collects and forwards the logs, Elasticsearch stores and indexes them, and Kibana is the dashboard used to search and visualize them.
Cloud-based monitoring: Many cloud providers offer their own monitoring services, so users of that cloud get a complete, integrated monitoring setup without having to assemble separate tools. These services are specific to the provider’s cloud.
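To make the Prometheus entry above a little more concrete: Prometheus gathers metrics by scraping a plain-text format that applications and exporters expose over HTTP. The Python sketch below parses a few lines in that format. It is a simplified illustration under stated assumptions: the sample values are made up, the helper name `parse_metrics` is hypothetical, and only the basic `name{labels} value` form is handled (escape sequences in label values are left as-is). It is not the official client library.

```python
import re

# Matches: metric_name{optional="labels"} value [optional timestamp]
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Made-up sample in the Prometheus text exposition format.
SAMPLE_TEXT = """\
# HELP kube_pod_container_status_restarts_total Restart count per container.
# TYPE kube_pod_container_status_restarts_total counter
kube_pod_container_status_restarts_total{namespace="default",pod="web-2"} 7
node_memory_MemAvailable_bytes 2.147483648e+09
"""


def parse_metrics(text):
    """Return a list of (metric name, labels dict, float value) samples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # skip anything this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_metrics(SAMPLE_TEXT)
for name, labels, value in samples:
    print(name, labels, value)
```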

Features of monitoring and logging

A good logging and monitoring setup keeps the application reliable and helps keep the cluster and application secure. Some of the key benefits are listed below:

Logging

Debugging: Logs help in troubleshooting issues, as they provide a detailed, step-by-step record of every change.
Auditing and Compliance: Logs keep a track record of all activities.
Analysis: Logs help in analyzing and optimizing resource utilization.
Security: Logs help in identifying threats and security bugs.

Monitoring

Bug identification: It helps in identifying abnormalities in the cluster.
Efficient resource allocation: It helps allocate resources according to what pods actually need, which saves resources and eases scalability issues.
Trust factor: It increases user trust and improves customer support, as issues are identified in real time.
Automated alerting: A human might miss an incident, but an automated system keeps watching, so monitoring tools provide features such as automated alerting.
Reliability: A good logging and monitoring setup increases the reliability of the overall application.
Insights: It helps forecast the performance and health of components and of the cluster.

Monitoring & Logging hands-on

Real time cluster monitoring using K8s dashboard & native logs

Step 1: Deploy the Kubernetes Dashboard to your cluster:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

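Step 2: Enable access to the dashboard using the ‘proxy’ command. The proxy listens on port 8001 of your local machine by default, so the dashboard deployed in Step 1 becomes reachable at http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/. Alternatively, you can use port forwarding: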

kubectl proxy

Step 3: To log in to the dashboard securely, use a service account together with a role binding. A service account is an identity that Kubernetes can grant permissions to; Kubernetes uses RBAC (role-based access control) to define which roles and permissions each account has, so that only authenticated users with the right permissions can view or change resources. Every namespace already has a ‘default’ service account, so listing the service accounts gives output similar to the one below (on Kubernetes 1.24 and later the SECRETS column typically shows 0, because token Secrets are no longer created automatically). To create your own service account, write a manifest file for it using the ‘touch’ and ‘vi’ commands.

kubectl get serviceaccounts

NAME      SECRETS   AGE
default   1         4d

touch geeksforgeeks.yaml

vi geeksforgeeks.yaml

# Press "i" to enter insert mode, write a manifest like the one below,
# then save and exit with Esc followed by :x

apiVersion: v1
kind: ServiceAccount
metadata:
  name: geeksforgeeks
  namespace: default

kubectl apply -f geeksforgeeks.yaml

# Check that your service account was created correctly using the command below:

kubectl get serviceaccounts/geeksforgeeks -o yaml

Step 4: Grant permissions to the service account using a ClusterRole and a RoleBinding (a RoleBinding assigns the permissions defined in a role to specific users, groups, or service accounts). As a best practice, create a dedicated service account for each application; you can additionally apply the same binding in other namespaces. In the example below, we use the “default” namespace with “geeksforgeeks” as our service account.

# This grants read-only permission to our service account in the "default" namespace, which means
# we will only be able to view resources in the dashboard. You can bind different roles to other service accounts.

kubectl create rolebinding geeksforgeeks-view \
  --clusterrole=view \
  --serviceaccount=default:geeksforgeeks \
  --namespace=default
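For reference, the command above creates a RoleBinding object. The Python sketch below builds an equivalent manifest as a dictionary and prints it as JSON, which makes the pieces of the binding (the referenced role and the subject it applies to) easier to see. The helper name `view_rolebinding` is hypothetical; the field names follow the `rbac.authorization.k8s.io/v1` RoleBinding schema.

```python
import json


def view_rolebinding(service_account, namespace="default"):
    """Build a RoleBinding that grants the built-in `view` ClusterRole
    to one service account inside a single namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{service_account}-view", "namespace": namespace},
        # roleRef names the role whose permissions are being granted.
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "view",
        },
        # subjects lists who receives those permissions.
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
    }


manifest = view_rolebinding("geeksforgeeks")
print(json.dumps(manifest, indent=2))
```

If you prefer keeping the binding in version control, the printed JSON can be saved to a file and applied with `kubectl apply -f`, which accepts JSON as well as YAML.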

Step 5: The dashboard requires a token to log in. Create one with the command below (available in Kubernetes 1.24 and later). The command prints a long random string; copy it and paste it into the token field on the dashboard login page, and you will be logged in.

kubectl create token geeksforgeeks

Step 6: Create a Deployment manifest file for your application. For example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: geeksforgeeks
spec:
  replicas: 1
  selector:
    matchLabels:
      app: geeksforgeeks
  template:
    metadata:
      labels:
        app: geeksforgeeks
    spec:
      containers:
      - name: geeksforgeeks
        image: your-app-image
        command: ["your-app-command"]

Step 7: Access container logs using the ‘logs’ command, either for a single pod or for a pod of the Deployment:

kubectl logs <pod-name>
or
kubectl logs deployment/geeksforgeeks
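The ‘logs’ command also takes options that are useful when troubleshooting, such as limiting the output to recent lines or reading the logs of a previously crashed container. The Python sketch below assembles such a command line from the standard kubectl flags `--namespace`, `--container`, `--tail`, `--since`, `--follow`, and `--previous`. The helper name `build_logs_command` and the pod name `geeksforgeeks-pod` are hypothetical; the sketch only builds and prints the command, it does not run it.

```python
import shlex


def build_logs_command(pod, namespace="default", container=None,
                       tail=None, since=None, follow=False, previous=False):
    """Assemble a `kubectl logs` command as an argument list."""
    cmd = ["kubectl", "logs", pod, "--namespace", namespace]
    if container:
        cmd += ["--container", container]   # pick one container in a multi-container pod
    if tail is not None:
        cmd.append(f"--tail={tail}")        # only the last N lines
    if since:
        cmd.append(f"--since={since}")      # only logs newer than e.g. "1h"
    if follow:
        cmd.append("--follow")              # stream new log lines as they arrive
    if previous:
        cmd.append("--previous")            # logs of the previous (crashed) container instance
    return cmd


cmd = build_logs_command("geeksforgeeks-pod", tail=100, since="1h", previous=True)
print(shlex.join(cmd))
# To actually run it, pass the list to subprocess.run(cmd, check=True).
```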

Step 8: After opening the dashboard, you will see a UI similar to this one, showing your pods and cluster information. You can explore the monitoring features under different tabs such as Overview, Nodes, Workloads, Storage, Configuration, CRDs, Metrics, and Events. The home view shows your replicas, pods, deployments, and so on as pie charts, and you can switch between namespaces to monitor each of them.

[Figure: Kubernetes Dashboard]

Step 9: You can create and update pods, replica sets, deployments, and other resources from the dashboard with a few clicks, without using kubectl commands. To create a resource, click the “+” icon at the top-right corner; a screen appears where you can enter the details of the resource or its manifest, and the resource is created in the live cluster. To update a resource, select it and use the option to view and edit its manifest.

Step 10: To view the live logs of a pod, click on the pod, then click the three-dots icon at the top right and choose the logs option to view or download the logs.

Best Practices

Use an advanced monitoring solution such as Prometheus + Grafana or Telegraf + InfluxDB + Grafana for richer metrics and monitoring capabilities.
Use a centralized logging system such as the EFK stack (Elasticsearch + Fluentd + Kibana) for log collection and management.
Use cloud-based or cloud-native tooling for monitoring and logging.
Enable audit logging in K8s to keep a record of all requests made to the API server, for enhanced security.
Monitor and reduce cluster costs by keeping a check on the cloud resources your cluster uses and provisioning only what is required.
Monitor only relevant metrics; don’t complicate the setup by tracking dozens of metrics for each cluster.
Set up an alerting system integrated with Prometheus to get notified about cluster health in a centralized way via Slack, e-mail, and so on.
Write automation scripts and integrate them with your monitoring setup to troubleshoot common issues automatically.
Always have a disaster recovery setup ready for your application in case something goes wrong with the application or the system.
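The alerting practice above comes down to comparing a metric against a threshold and notifying someone when it is crossed. The Python sketch below illustrates that idea for memory utilisation. It is not how Prometheus or its Alertmanager is configured; the function names, the sample pods, and the 90% threshold are all assumptions made for the illustration, and only a subset of Kubernetes quantity suffixes is handled.

```python
# Subset of Kubernetes quantity suffixes (binary and decimal) used for memory.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(quantity):
    """Convert a memory quantity such as '512Mi' or '1Gi' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(float(quantity))  # no suffix: plain number of bytes


def memory_alerts(usage, limits, threshold=0.9):
    """Return (pod, utilisation) pairs whose usage is at or above threshold * limit."""
    alerts = []
    for pod, used in usage.items():
        limit = limits.get(pod)
        if limit is None:
            continue  # no limit set, nothing to compare against
        limit_bytes = parse_quantity(limit)
        if limit_bytes == 0:
            continue  # avoid dividing by zero on a malformed limit
        ratio = parse_quantity(used) / limit_bytes
        if ratio >= threshold:
            alerts.append((pod, round(ratio, 2)))
    return alerts


# Hypothetical usage and limits for three pods.
usage = {"web-1": "200Mi", "web-2": "480Mi", "cache": "1Gi"}
limits = {"web-1": "512Mi", "web-2": "512Mi"}
print(memory_alerts(usage, limits))
```

In practice the notification step (Slack, e-mail) would be handled by the alerting tool itself; the sketch only shows the comparison that decides whether an alert should fire.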

Conclusion

Monitoring and logging are crucial for troubleshooting a cluster. Monitoring (or observability) is the process of watching the current and changing state of containers and components in the cluster and making us aware of the state of the application through alerts. Logs are records of each event happening inside the containers (e.g. ‘namespace created’ -> ‘pod is yet to start’ -> ‘pod is running’ -> ‘pod is restarting’). Different monitoring and logging solutions suit different requirements and use cases, so the right setup depends on your use case and the functionality you need. If you need further help, refer to the official documentation of these tools and their Slack communities.

Kubernetes Monitoring and Logging – FAQs

Should I use (K8s native monitoring setup) or (Prometheus & Grafana) or (Cloud services)?

It depends on the use case. If you are a beginner who is new to DevOps, or you don’t need many features, the native K8s dashboard is a simple tool that gets the job done. If you are working on a complex project, need advanced functionality, or want a proper setup that makes troubleshooting easier, Prometheus and Grafana are likely the best option. If you are already familiar and comfortable with a cloud provider, weigh the cost of its managed monitoring services against running Prometheus and Grafana yourself, and choose whichever fits better.

Which resources should I follow for further learning?

Refer to the official documentation of Kubernetes, Prometheus, and other DevOps tooling to stay up to date with common issues, official blogs, and new functionality.

What to do if I get stuck with monitoring?

The Slack communities of these DevOps tools are very active and welcoming. If you get stuck and cannot make progress, reach out to them there, and make sure you ask your questions clearly.

What is Observability/Monitoring?

Monitoring (or observability) is the process of watching the current and changing state of containers and components in the cluster and making us aware of the state of the application through alerts.

What are Logs?

Logs are records of each event happening inside the containers (e.g. ‘namespace created’ -> ‘pod is yet to start’ -> ‘pod is running’ -> ‘pod is restarting’).

Should I learn cloud specific tooling or cloud-native based tooling?

Both cloud-specific and cloud-native tooling are being adopted by more and more companies. Learning cloud-native DevOps tooling is usually the better choice, as it is vendor-neutral and reliable, so companies rarely have a problem using it instead of cloud-specific tooling, unless most of their code runs on one specific cloud or they have their own cloud platform.

What are some of the most recommended tools to learn in 2024 for Monitoring?

Some of the most recommended tools to learn are listed below:

Prometheus and Grafana (Highly recommended)

EFK Stack (Optional)

Cloud-specific tools of at least one Tier 1 cloud {AWS, Azure, GCP} (Highly recommended)

Cloud-specific tools of at least one Tier 2 cloud {DigitalOcean, Heroku, Civo} (Highly recommended)

Datadog/Sysdig for monitoring and logging (Optional)

Jaeger, Loki, Thanos, or Cortex with Prometheus and Grafana (Optional but recommended)
