Site Reliability Engineer Nanodegree Program Syllabus
Prerequisites
A well-prepared learner is already able to:
• Write basic functions in an object-oriented language (Python or Java) using for loops, conditionals, control
flow, and methods.
• Write basic shell scripts in Bash or PowerShell, including for loops and conditionals.
• Work at the Linux command line (Bash) and in the UNIX shell.
• Create simple SQL queries using SELECT, JOIN, and GROUP BY.
• Exercise networking skills including knowledge of virtual networks, DNS, subnets, and basic network
troubleshooting techniques.
• Perform DevOps tasks, such as setting up monitoring, rolling out features, and troubleshooting production
issues, ideally for large systems.
• Work with Kubernetes and basic kubectl, such as kubectl apply, kubectl create, kubectl config.
Educational objectives
A graduate of this program will be able to:
• Use proactive and reactive SRE strategies (monitoring, postmortem, team building, etc.) to identify reliability
risks through evaluating systems and processes.
• Develop customer-centric SLOs (such as percentile targets for availability, latency, and correctness) and set up
corresponding monitoring and risk mitigation measures to ensure customer happiness.
• Create and deploy automated self-healing architectures and other technologies to make the environment
more maintainable.
• Design and implement organizational processes and culture that enhance product reliability, including
outage/postmortem review, quarterly state of production presentation, and production readiness review.
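As a rough illustration of the SLO objective above, the sketch below computes an availability ratio, a nearest-rank latency percentile, and the share of error budget still unspent. The function names, the 99.9% target, and the sample figures are illustrative assumptions and are not taken from the program materials.

```python
import math


def availability(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded (an availability SLI)."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic is treated as fully available
    return good_requests / total_requests


def latency_percentile(latencies_ms: list, percentile: float) -> float:
    """Nearest-rank percentile of the observed latencies."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def error_budget_remaining(good_requests: int, total_requests: int,
                           slo_target: float) -> float:
    """Share of the error budget still unspent; negative means overspent."""
    allowed_failures = (1 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    if allowed_failures == 0:
        return 0.0 if actual_failures == 0 else float("-inf")
    return 1 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 400 failures, 99.9% target.
    print(availability(999_600, 1_000_000))
    print(error_budget_remaining(999_600, 1_000_000, 0.999))
```

With a 99.9% target over 1,000,000 requests, roughly 1,000 failures are allowed, so 400 failures leave about 60% of the budget. A team might use that figure to decide whether to keep shipping features or pause for reliability work.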
Flexible learning:
The program is flexible and self-paced with suggested project deadlines.
Software/hardware and version requirements:
There are no software and version requirements to complete this Nanodegree program. All coursework and
projects can be completed in the Udacity online classroom. Udacity’s basic tech requirements can be found
at udacity.com/tech/requirements.
*The length of this program is an estimate of the total hours the average student may take to complete all
required coursework, including lecture and project time. If you spend about 5-10 hours per week working
through the program, you should finish within the time provided. Actual hours may vary.
In this course, we focus on what observability requires in terms of people and tools. We begin by introducing
SRE, its roles and responsibilities, and how these differ from those of other teams (DevOps, SysAdmin,
Development). With that established, we look at how SRE helps an enterprise improve and discuss the costs
associated with SRE. We then cover the types of members on an SRE team and end with the tool set that an
SRE team may use to be successful.
Course 1 Project: Observing Cloud Resources
In this project, students will apply the skills they have acquired in the Establish a Foundation in
Observability course to configure a monitoring software stack to collect and display a variety of metrics
for commonly used cloud resources including VM Scale Sets, Kubernetes service, and VMs. Additionally,
students will establish and configure rules for alerting and set parameters to be notified prior to the
occurrence of failures within the aforementioned cloud resources. Students will also have the opportunity
to test and observe their own implementation of the monitoring software stack to apply and showcase SRE
methodologies and practices which can be transferred to real-world scenarios.
You’ll begin by learning some self-healing system design fundamentals such as single points of failure and
three-tier architecture. Then we will show you some self-healing deployment strategies, implementation
steps, and use cases. Finally, we’ll cover some cloud automation that you can use to increase the resiliency
of systems, such as auto-scaling automation.
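To make the auto-scaling idea concrete, here is a minimal sketch of a proportional scaling rule. It follows the formula documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric)), clamped to a minimum and maximum. The function name, parameters, and sample values are hypothetical and are not drawn from the course.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Scale the replica count in proportion to metric pressure.

    current_metric and target_metric use the same unit, for example
    average CPU utilization in percent.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp so the service never scales to zero or grows without bound.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical: 4 replicas at 90% CPU with a 50% target.
    print(desired_replicas(4, 90, 50, min_replicas=1, max_replicas=10))
```

The clamp is the part that matters for resiliency: the lower bound keeps enough capacity to survive the loss of one instance, and the upper bound limits cost if a metric misbehaves.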
Course 3 Project: Deployment Roulette
Students will play the role of an engineer who has just started at a growing consulting firm called Casa de
mi Padre. Due to some unfavorable company policies, the team they were supposed to have joined has left
the company. Due to their rush to leave, the applications they were working on were left in an
undocumented, unknown state. The company is raring to get back on pace, and students are tasked with
deploying them to the cloud. Some of the microservices have scaling or availability issues, and some don’t
have a deployment strategy in place. It’s up to the students to identify failing applications and implement
fixes to resolve the problems. Students will also create an architecture diagram that communicates the
status of the cloud environment to improve the onboarding of future developers.
LEARNING OUTCOMES
Implement Scaling and Failover Automation
• Describe cloud automation for scaling and failover
• Automate microservices scaling
LESSON THREE: Strategies for High-Availability Applications
• Automate virtual machines scaling
• Automate microservice cluster scaling
KNOWLEDGE
Find answers to your questions with Knowledge, our
proprietary wiki. Search questions asked by other students
and discover in real time how to solve the challenges that
you encounter.
WORKSPACES
See your code in action. Check the output and quality of
your code by running it in the workspaces that are part
of our classroom.
QUIZZES
Check your understanding of concepts learned in the
program by answering simple, auto-graded quizzes.
Easily go back to the lessons to brush up on concepts any
time you get an answer wrong.
PROGRESS TRACKER
Stay on track to complete your Nanodegree program with
useful milestone reminders.
Nathan is a Certified Six Sigma Black Belt and has 10+ years of experience in IT in multiple industries. He
is also the instructor for two other Udacity courses: Ensuring Quality Releases and Azure Performance.
Travis Scotto has worked in technology for 10 years. He has worked in various infrastructure roles including
virtualization, databases, and monitoring. As an SRE, he employs automation and monitoring daily. He has
also been an adjunct IT instructor.
CAREER SUPPORT
• Resume support
• GitHub portfolio review
• LinkedIn profile optimization
Each project will be reviewed by the Udacity reviewer network. Feedback will
be provided and if you do not pass the project, you will be asked to resubmit
the project until it passes.
WHAT SOFTWARE AND VERSIONS WILL I NEED FOR THIS PROGRAM?