
DevOps Classroomnotes

02/May/2023
Hospital Management System – Needs w.r.t. Application Stability
 The architecture of the Hospital Management System is as follows
 Your organization qtinfosystems is maintaining this system for 100 hospitals
 Now let's try to figure out some possible failures
 Network failures
 Hardware failures
 Application failures
 You are assigned to detect failures. To solve these issues we will take a proactive approach
 For every 1 minute
 check if every server is responding or not
 Check if application is responding or not
 Alert if the servers/applications are not responding
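The per-minute checks above can be sketched as a small script. This is a minimal sketch, not the actual monitoring setup; the host/port pairs are placeholders, and a real deployment would use a tool such as Heartbeat:

```python
import socket

def is_alive(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_servers(servers):
    """Return the subset of (host, port) pairs that are NOT responding."""
    return [(h, p) for h, p in servers if not is_alive(h, p)]

if __name__ == "__main__":
    # Placeholder list; replace with the hospital system's real servers
    servers = [("127.0.0.1", 9), ("127.0.0.1", 10)]
    for host, port in check_servers(servers):
        print(f"ALERT: {host}:{port} is not responding")
```

Run from cron every minute (or a loop with sleep 60) and wire the print into an email/pager call to get the alerting described above.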
 A log is a record of some activity performed
 Operating systems have logs; we might need to fine-tune them
 Windows => Event Viewer
 Linux => Syslog
 Applications also write logs; try to understand failures from them
 Tracing is an approach to figure out the flow in your system
 Every system has resource utilization information
 cpu
 memory
 disk
 network
 Metrics are numeric values that represent some information about a
system/application, with a time dimension
 QT Info System needs a Monitoring Solution.
 Observability is what QT Info System needs, i.e. they need to collect
 logs
 metrics
 traces
 MTTR (Mean Time To Recover): this refers to the average time
taken by your organization to recover from failures.
 MTTF (Mean Time To Fail): this refers to the average time
your system runs before a failure occurs.
 SLA (Service Level Agreement): this is an agreement
between the service provider and the customer with respect to availability
and other important metrics
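MTTF and MTTR together determine availability, which is what an SLA typically promises. A quick sketch of the arithmetic, with purely illustrative numbers:

```python
# Availability can be estimated as the fraction of time the system is up:
#   availability = MTTF / (MTTF + MTTR)
def availability(mttf_hours, mttr_hours):
    return mttf_hours / (mttf_hours + mttr_hours)

if __name__ == "__main__":
    # Illustrative: one failure every 720 hours, 2 hours to recover
    a = availability(720, 2)
    print(f"availability: {a:.4%}")
```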

DevOps Classroomnotes
03/May/2023
What has to be monitored?
 There are organizations and individuals who have published
the best practices on implementing a monitoring solution
 Google Four Golden signals Refer Here
 USE method Refer Here
 RED method Refer Here

Terms in Monitoring
 Latency
 Traffic
 Errors
 Saturation

Some basic Stuff


 Impact of CPU, Memory and Disk on your applications

 Webserver: When requests are sent, threads are created, each with
its own cpu and memory share. So as the number of
requests increases, the load on cpu and memory increases.

 Generally, to figure out the saturation points, organizations
stress/load test the systems with the help of performance test
engineers

DevOps Classroomnotes
04/May/2023
Metrics, Logs and Traces
 Refer Here for detailed info on logs vs metrics vs traces
 Metrics: Metrics are numeric time-series data.
 Logs:
 Logs are textual information with no standard format.
 Logs come from different applications/servers
 Apache: 192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395
 Refer Here for some other applications
 For logs we deal with text (unstructured data)
 Using logs requires a solution to
 convert unstructured text into semi-structured data
 understand logs in various formats
 Log analysis solution
 Traces:
 APM (Application Performance Monitoring) Agents can
help
 We are trying to make our applications observable
 Monitoring tells you when something is wrong, while
observability enables you to understand why.
 Tools

DevOps Classroomnotes
05/May/2023
Pull vs Push Monitoring
 Pull Monitoring: The monitoring system pulls metrics
from the various servers/applications/network devices.
 Push Monitoring: The servers/applications push their metrics/logs
to the monitoring system.
 Examples
 Pull:
 Prometheus
 Nagios
 Push:
 Logstash
 Splunk

Elastic Stack
 This was earlier called the ELK Stack
 ELK
 E = Elasticsearch
 L = Logstash
 K = Kibana
 Architecture

 Elasticsearch: This is the memory/storage system in the Elastic Stack
 Logstash: Responsible for making logs queryable
 Beats: Export metrics, logs, and traces to Elasticsearch
 Kibana: Creates dashboards and visualizations
Google for the following
 What are popular metrics for
 web server (apache)
 database (mysql)
 Web Servers
 Requests per second
 Errors
 Thread count
 Response Time (Average)
 Server:
 CPU Utilization
 Free Memory/Used Memory
 Disk Space
 Disk I/O
 Network
 Incoming
 Outgoing
 Databases:
 Number of Connections
 Size of Data Processed per second
 Database Size

DevOps Classroomnotes
06/May/2023
Applications we will be observing
 Traditional Applications: These are the applications which run
on physical or virtual machines hosted on-premises or cloud
 Containerized Applications: These will be the applications
running on kubernetes cluster.
 Technology:
 Python
 .net
 C#
 nodejs
 Approach:
 We will be getting info in the following order
 metrics
 logs
 traces
 What is Site Reliability Engineering?

Traditional Applications

 I will be sharing some scripts when necessary
 Let's choose the same applications for both traditional and k8s
Lab Setup

 Cloud Account (AWS/Azure)
 Elastic Cloud (14-day free trial)
 How to create VMs?
Options

 Ecommerce
 shopizer (java)
 nopCommerce (.net)
 Saleor (python)
 Spurtcommerce (nodejs)
 Medical Record System/Hospital management system
 OpenMRS (java)
 Bahmni (java)
 HospitalRun (nodejs)

NOP Commerce
 To install this application we need at least two servers
 database server: (Linux/Windows)
 mysql
 microsoft sql server
 postgres
 application/web server: (Linux/Windows)
 dotnet core
 nginx
 Our setup:
 2 ubuntu Linux servers
 Metrics:
 Server Metrics
 cpu
 memory
 disk
 network
 Application metrics
 Requests
 Errors
 Response time
 Installation
 Manual
 Automated

DevOps Classroomnotes
07/May/2023
nopCommerce Architecture
 This application has two servers involved
 Application:
 This application runs on .net core 7
 install the application
 If the application is horizontally scaled, then we
will be using a loadbalancer/reverse proxy
 Database
 we will be using mysql database
 This can be a managed database

Realizing this application in AWS

 Let me create a free-tier RDS-based MySQL instance
 Install dotnet 7 on the ubuntu vm Refer Here
 Refer Here for installing nopCommerce on linux
 Refer to the classroom video for installation
Next Steps
 Let's create a basic check to verify if
 the server is alive
 the application is alive
 Email Alerts: Refer Here. Create an inbox in Mailtrap

DevOps Classroomnotes
09/May/2023
Monitoring and Observability Setup
Lab setup
 We will be using two elastic cloud accounts
 one account for dev/experimentation
 the other account for making nopCommerce observable
 We need a Mailtrap setup for alerts, where we will have two inboxes.
Working with Elastic Stack

 Understanding of YAML Refer Here

Elastic Cloud Account Setup
 Create a free trial account Refer Here
 Let's set up connectors for communications (email/teams/slack)
 For email, create an account in Mailtrap and use the Mailtrap credentials here
 Enter the Mailtrap details in the email connector and run the test (refer classroom video)
Workflow
 Overview
 We will have a system with Heartbeat installed which checks
if the application/server is up or not and reports the status to elastic
cloud (Elasticsearch)
 The Uptime view in the Observability section of elastic cloud will show
the status of each service/server, from which you can configure
alerts based on connectors

DevOps Classroomnotes
10/May/2023
Uptime Monitoring
 For the overview of the setup
 Let's create a linux machine and
 install Heartbeat
 configure Heartbeat to send metrics to elastic cloud
 We will be configuring Heartbeat to check if the
 apache server is alive
bash
sudo apt update
sudo apt install apache2 -y
 Heartbeat installation:
 Refer Here for the overview
 Refer Here for the official docs on installing Heartbeat
and Refer Here for apt-based installation
 Configuration:
 All the elastic stack components are generally installed in,
and store their configuration files in, similar directories
 config location: /etc/<prod-name>
 install location: /usr/share/<prod-name>
 Edit /etc/heartbeat/heartbeat.yml to add the cloud id and auth
 What has to be monitored
 apache server
 Monitor types: Refer Here
 Configuration
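A minimal heartbeat.yml along these lines might look as follows. This is a sketch: the cloud id, auth, and target URL are placeholders, not real values.

```yaml
# /etc/heartbeat/heartbeat.yml (sketch; all values are placeholders)
heartbeat.monitors:
  - type: http
    id: apache-http
    name: Apache HTTP check
    hosts: ["http://localhost:80"]
    schedule: '@every 60s'

cloud.id: "<your-cloud-id>"
cloud.auth: "<elastic-user>:<password>"
```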

 Start Heartbeat: Refer Here
 Now open Kibana & navigate to Uptime
 To view the down status, stop the service and wait for the page to reload
 Let's create an alert to send an email about the status of the server
 Exercise: Create an alert to check if the nop-app and nop-db are up or not.
DevOps Classroomnotes
11/May/2023
Exercise
 Create a linux vm and install apache/nginx.
 Also create uptime dashboard in elastic stack
 based on icmp (ping)
 based on http

Basic Check List
 Create a linux vm in any cloud and ssh into it
 Create an ssh key from the cloud, and import an ssh key into the cloud from your system
 Concept of Service/Daemon
 Package management – apt
 json and yaml files
 concept of sudo
 using vi or nano editor
 Fixing:
 Post on Slack

Novice Check List
 Knowing the problem: find the log files and read the logs to figure out errors
 Installation and configuration steps for any application
 Concept of environment variables and setting them
 User
 System

Expert Check List
 Understanding system architectures
 System Design fundamentals
DevOps Classroomnotes
13/May/2023
Troubleshooting Beats
 All the elastic components' logs can be viewed using
journalctl: journalctl -u heartbeat-elastic.service
 Look into the yaml for syntax issues, and the cloud id and auth for
configuration issues
 Ensure metrics are enabled
Metricbeat
 This beat collects metrics about the system and some predefined
applications
 Install Metricbeat Refer Here
 We will get system and nginx metrics into elastic cloud in the next
session

DevOps Classroomnotes
14/May/2023
Metricbeat to Capture Metrics
 Enable nginx metrics
 Navigate to /etc/metricbeat/modules.d and rename
nginx.yml.disabled to nginx.yml
 Copy the dashboards into bin: sudo cp -r
/usr/share/metricbeat/kibana/ /usr/share/metricbeat/bin
 Now start Metricbeat after setting the following in metricbeat.yml
 cloud.id
 cloud.auth
 kibana url
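In metricbeat.yml those settings might look like this (a sketch with placeholder values, not real credentials):

```yaml
# /etc/metricbeat/metricbeat.yml (sketch; all values are placeholders)
cloud.id: "<your-cloud-id>"
cloud.auth: "<elastic-user>:<password>"

setup.kibana:
  host: "<your-kibana-url>"
```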
 We have configured all dashboards
 System Overview:

 Apache Dashboard

DevOps Classroomnotes
16/May/2023
Log Analysis
 Every logging mechanism will have levels; the most widely
adopted levels are
 INFO: informational log
 DEBUG: detailed diagnostic log
 ERROR: represents errors
 CRITICAL/FATAL: represents serious system failures
 Logs are time-based information.
 In the Elastic Stack we have Logstash, which can extract the logs,
transform them, and load them into Elasticsearch for
querying/visualizations
 Logstash does the transformations with the help of plugins
 input plugins: to read from different sources. Refer
Here for input plugins supported by Logstash
 filter plugins: to transform the log. Refer Here for filter
plugins
 output plugins: to store the output to different
destinations. Refer Here for output plugins
 Installing Logstash:
 Refer Here
 Ideal use case for us
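The log levels above can be seen in action with, for example, Python's logging module; with the level set to INFO, DEBUG messages are suppressed:

```python
import logging

# Print the level name and message; DEBUG is filtered out
# because the configured level is INFO.
logging.basicConfig(format="%(levelname)s %(message)s", level=logging.INFO)
log = logging.getLogger("hms")

log.debug("connection pool stats: 5 idle")   # suppressed at INFO level
log.info("application started")
log.error("database connection failed")
log.critical("all database replicas are down")
```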

DevOps Classroomnotes
17/May/2023
Logstash
 Let's create a linux vm and explore Logstash
Logstash pipeline:
 Logstash pipeline syntax
input {}
filter {}
output {}
 In the input section we define the data sources from which we process inputs (Extract)
 In the filter section we define the transformations (Transform)
 In the output section we define the destination (Load)
 The list of inputs is all the installed Logstash input plugins, and the
same goes for the other sections
Let's create a very basic pipeline which reads input from stdin and displays output to stdout
 Stdin input plugin Refer Here
 Stdout output plugin Refer Here
 Pipeline
input {
  stdin {
  }
}
output {
  stdout {
  }
}
 Create a file with the above content at /tmp/first.conf
 cd into /usr/share/logstash and execute the following
command: sudo ./bin/logstash -f /tmp/first.conf

 Now let's change the codec from rubydebug to json
 Edit first.conf with the following content and start logstash: sudo
./bin/logstash -f /tmp/first.conf
input {
  stdin {
  }
}
output {
  stdout {
    codec => json
  }
}
 Let's add one more output to a file, keeping the stdout output (rubydebug codec)
 Refer Here for the file output plugin
input {
  stdin {
  }
}
output {
  stdout {
  }
  file {
    path => "/tmp/output%{+YYYY-MM-dd}.txt"
  }
}

 Open the file to see the contents

Activity 2: Let's create a pipeline to read the file /tmp/test and display the contents on stdout
 input = file
 output = stdout
input {
  file {
    path => ["/tmp/test"]
  }
}
output {
  stdout {
  }
}

 Install apache and redirect /var/log/apache2/access.log to stdout
input {
  file {
    path => ["/var/log/apache2/access.log"]
  }
}
output {
  stdout {
  }
}

 Let's try to understand filters.
 The grok filter can parse unstructured data into fields Refer Here

DevOps Classroomnotes
18/May/2023
Grok Patterns
 Open the DevTools in Kibana and then the Grok Debugger
 Refer Here for the grok filter in Logstash
 For writing your own patterns, use regex Refer Here
 Let's try to build a simple pattern as shown below
 Refer Here for the grok debugger
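As an illustration, a pattern for the apache access-log line shown earlier could be assembled from the standard grok patterns; the field names on the right are my own choice, not fixed names:

```
%{IPORHOST:client_ip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status} %{NUMBER:bytes}
```

Paste this together with a sample log line into the Grok Debugger to see the extracted fields.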

DevOps Classroomnotes
19/May/2023
Sending logs to elastic cloud
 Overview
 Install apache and Filebeat on one linux instance Refer Here
sudo apt update
sudo apt install apache2 -y
 Install Logstash on the other linux instance Refer Here
Configuring Filebeat to send apache access logs to Logstash
 Refer Here for basic configuration information
 Sending data from Logstash to elastic cloud Refer Here
 Logstash pipeline
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {
    cloud_id => "learningenv:dXMtY2VudHJhbDEuZ2NwLmNsb3VkLmVzLmlvOjQ0MyQxMDg1YTVjOWQyOWY0N2FjODkyNTBmNjY3NjJkYWU3MyRlNDM5MGRmYmJmMzM0MGViODZiMGNhNTg3ODA1MmZkOQ=="
    cloud_auth => "elastic:h22oWprNjqqbEGTKPSvHHpqS"
  }
  file {
    path => "/tmp/test.log"
  }
}

 Create a file called apache.conf in /etc/logstash/conf.d
 Enable and start the logstash service
 Now configure Filebeat to send logs
from /var/log/apache2/access.log to Logstash
 To generate artificial traffic we executed the following script
#!/bin/bash
while true; do
  curl 'http://34.219.90.251'
  sleep 2
done

 As of now we are getting an issue with indexing (storing) in
Elasticsearch
[WARN ] 2023-05-19 03:52:25.065 [[main]>worker0] elasticsearch - Could not index event to
Elasticsearch. status: 400, action: ["index", {:_id=>nil, :_index=>"apachelog-2023.05.19",
:routing=>nil, :pipeline=>"apachelogs"}, {"log"=>{"offset"=>29714,
"file"=>{"path"=>"/var/log/apache2/access.log"}}, "message"=>"157.48.143.223 - -
[19/May/2023:03:52:15 +0000] \"-\" 408 0 \"-\" \"-\"", "@version"=>"1",
"cloud"=>{"machine"=>{"type"=>"t2.medium"}, "account"=>{"id"=>"678879106782"},
"provider"=>"aws", "availability_zone"=>"us-west-2c", "image"=>{"id"=>"ami-
0fcf52bcf5db7b003"}, "region"=>"us-west-2", "service"=>{"name"=>"EC2"},
"instance"=>{"id"=>"i-0b27f5e82d459e378"}}, "source"=>{"address"=>"157.48.143.223"},
"input"=>{"type"=>"filestream"}, "timestamp"=>"19/May/2023:03:52:15 +0000",
"ecs"=>{"version"=>"8.0.0"}, "http"=>{"response"=>{"status_code"=>408,
"body"=>{"bytes"=>0}}}, "@timestamp"=>2023-05-19T03:52:23.879Z,
"event"=>{"original"=>"157.48.143.223 - - [19/May/2023:03:52:15 +0000] \"-\" 408
0 \"-\" \"-\""}, "host"=>{"id"=>"b9e46fc917bf4bc080ee389c0cef33ad", "name"=>"ip-172-31-10-
238", "containerized"=>false, "hostname"=>"ip-172-31-10-238", "os"=>{"name"=>"Ubuntu",
"codename"=>"jammy", "version"=>"22.04.2 LTS (Jammy Jellyfish)", "platform"=>"ubuntu",
"kernel"=>"5.15.0-1031-aws", "type"=>"linux", "family"=>"debian"},
"architecture"=>"x86_64", "ip"=>["172.31.10.238", "fe80::8ef:a7ff:fe5a:5c85"],
"mac"=>["0A-EF-A7-5A-5C-85"]}, "tags"=>["beats_input_codec_plain_applied"],
"agent"=>{"id"=>"130803ea-47c3-46d3-aad8-8ba6449baff2", "name"=>"ip-172-31-10-238",
"version"=>"8.7.1", "ephemeral_id"=>"1b68db3e-0975-4e11-a939-83d1318ed448",
"type"=>"filebeat"}}], response: {"index"=>{"_index"=>"apachelog-2023.05.19", "_id"=>nil,
"status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"pipeline with id
[apachelogs] does not exist"}}}
[INFO ] 2023-05-19 03:52:25.066 [[main]>worker0] file - Opening file
{:path=>"/tmp/test.log"}

DevOps Classroomnotes
20/May/2023
Fixing the logstash issue with elastic cloud
 We had an issue with the pipeline id; removed the pipeline field
 Restarted all the services and executed requests from a script
 So Filebeat reads the logs and sends them to Logstash; Logstash
breaks the message into multiple fields and stores it in Elasticsearch
with index name apachelog-*
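With the pipeline option removed, the elasticsearch output would look roughly like this. The cloud id/auth are placeholders, and the index option is an assumption based on the apachelog-* index name, not the exact config used in class:

```
output {
  elasticsearch {
    cloud_id   => "<your-cloud-id>"
    cloud_auth => "elastic:<password>"
    index      => "apachelog-%{+YYYY.MM.dd}"
  }
}
```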
 Now create a data view in Kibana
 Watch the classroom video for visualizations
 We were able to search logs by writing simple queries and
create pie charts, line charts, metrics etc.

Let's trace some java application
 Installing the sample application
 openjdk 11
sudo apt update
sudo apt install openjdk-11-jdk -y
 Download the jar file Refer Here. Open APM
 Java app tracing
 Download the apm-agent jar: wget https://oss.sonatype.org/service/local/repositories/releases/content/co/elastic/apm/elastic-apm-agent/1.38.0/elastic-apm-agent-1.38.0.jar
 We ran the app with the following args
java -javaagent:elastic-apm-agent-1.38.0.jar \
-Delastic.apm.service_name=pet-clinic \
-Delastic.apm.secret_token=uu0Dl9Q09RFfMdq86p \
-Delastic.apm.server_url=https://eff6e04ad9d6425fa3492b1a56f794a3.apm.us-
central1.gcp.cloud.es.io:443 \
-Delastic.apm.environment=dev \
-Delastic.apm.application_packages=org.spring \
-jar spring-petclinic-2.4.2.jar

 Use the application and launch APM


DevOps Classroomnotes
21/May/2023
Site Reliability Engineering (SRE)
 These are the processes followed by Google to run its production
systems Refer Here
 Refer Here for the article on SRE

Exercises
1. Make nopCommerce observable
2. Post k8s metrics to elastic cloud
