How To Build An Enterprise Kubernetes Strategy
A Definitive Guide For Business Leaders
Contents
Introduction
Where Will You Be Running Kubernetes Over The Next Five Years?
operations teams love Kubernetes because it helps boost productivity, reduce costs and risks,
and move organizations closer to achieving their hybrid cloud goals.
Containers have dramatically risen in popularity because they provide a consistent way to
package application components and their dependencies into a single object that can run in
any environment. By packaging code and its dependencies into containers, a development
team can use standardized units of code as consistent building blocks. The container will run
the same way in any environment and can start and terminate quickly, allowing applications
to scale to any size.
In fact, development teams are using containers to package entire applications and move
them to the cloud without the need to make any code changes. Additionally, containers
make it easier to build workflows for applications that run between on-premises and cloud
environments, enabling the smooth operation of almost any hybrid environment.
Kubernetes is an open source container orchestration platform that allows large numbers of
containers to work together in harmony and reduces operational burdens. In fact, Kubernetes,
originally developed by Google and now managed by the Cloud Native Computing
Foundation (CNCF), has become the standard for cloud container orchestration, providing a
platform for automating deployment, scaling and operations of application containers across
multiple clusters of hosts.
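To make that declarative model concrete, here is a minimal sketch of a Kubernetes Deployment manifest; the application name and container image are hypothetical placeholders, and a production deployment would add resource requests, health probes and similar settings.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend                 # hypothetical application name
spec:
  replicas: 3                        # Kubernetes keeps three copies running and reschedules failed pods
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
        - name: web-frontend
          image: ghcr.io/example/web-frontend:1.0.0   # hypothetical image from your container pipeline
          ports:
            - containerPort: 8080
```

Applying a file like this is the same whether the cluster runs in a public cloud, a company data center or at the edge, which is part of what makes containers such consistent building blocks.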
Kubernetes has moved from development and testing to production environments in many
enterprises. According to the CNCF Survey 2020, 83% of the respondents are running
Kubernetes in production, up from 78% in 2019. There has also been a 50% increase in the use
of all CNCF projects in just one year.
Clearly, Kubernetes is not a flash in the pan — it is here to stay — and its prevalence is likely to
expand dramatically as software complexity moves to more and more parts of the enterprise.
Other organizations have left it to individual departments or DevOps teams to decide for
themselves how and where to use Kubernetes. In these organizations, it isn’t uncommon to
have dozens of clusters deployed across public clouds and company data centers. Over
time, it is possible for tension to develop between individual teams wanting to run Kubernetes
in exactly the way they need it, and an IT organization that wants to maintain security and
control over how Kubernetes gets implemented.
The incentive for the development teams is flexibility: having cluster-level administrative
control allows them to configure the cluster to run exactly how they need it in terms of storage,
security policy or which infrastructure it runs on. IT teams are especially nervous about clusters
that are deployed and left unpatched and unmanaged. They would like to centralize the
operations and policy around clusters and provide access to teams who need it.
If Kubernetes and containers are going to become the primary platform for running
applications across any infrastructure, IT managers must collaborate with DevOps to develop
a plan and a strategy for Kubernetes that satisfies the needs of the development organization,
and meets IT’s own needs, as well.
As you document and understand where Kubernetes is running in your enterprise, be on the
lookout for individuals who show existing expertise in containerization. As you progress in
building your strategy, developing a team of experts who can administer your Kubernetes
clusters and deploy applications to them will be critical to driving adoption.
Building an organization-wide Kubernetes strategy means prioritizing your goals for this new
technology.
If your team sets out to use Kubernetes to reduce infrastructure costs, you’ll probably focus on
building big clusters and trying to get as much density as possible out of them.
If your team focuses instead on using Kubernetes to accelerate innovation, you’ll take a
different approach, emphasizing flexibility and delivering more tooling around Kubernetes,
such as monitoring and CI/CD integration.
To prioritize your goals, try to understand the potential of Kubernetes, and imagine how your
organization may be using it in the future.
During the next five years, for example, you may use Kubernetes to do any of the following:
— Rapidly deploy Kubernetes clusters. Today, every major cloud provider has made it
easy to deploy Kubernetes clusters within minutes. Teams are continuously building
new applications, deploying them to different clouds, and using Kubernetes to run them.
Between clusters used for development, staging, and production, and the need to deploy
Kubernetes clusters across different data centers and cloud providers, it isn’t hard to
imagine that even the most well-organized company is still running dozens of Kubernetes
clusters.
— Move onto the edge. The same modern application architectures that we think of as cloud
native are now beginning to move out of the data center. Teams building software for
factories, hospitals, and stores now want to run applications with rich data analytics and
complex architectures as close to their customers and production facilities as possible.
Running applications this way is referred to as “running on the edge.”
Between clusters running in different clouds, data centers, and the edge, it’s almost certain that
your organization will be running more than one Kubernetes cluster. Unless you know you’ll only
be running a single application in one location, it probably makes sense to build your Kubernetes
strategy with an expectation that you’ll need to be able to easily provision and manage multiple
Kubernetes clusters running in many different places.
New technologies like Kubernetes are exciting to work with, and it isn’t uncommon for multiple teams
to try to take ownership of building a containerization and Kubernetes strategy for their company.
Individual DevOps teams, shared services groups, central IT, cloud platform or
platform-as-a-service (PaaS) groups may each feel that they should be responsible for building a strategy
around Kubernetes.
Two teams that often lead the Kubernetes strategy are the shared services team (responsible for
supporting developers and DevOps) and the central IT team (responsible for computing platforms).
Putting either team in charge of Kubernetes strategy provides the following benefits:
— Shared Services: The shared services team brings key insights on how an organization is
modernizing its approach to application development, as well as the requirements teams have
identified they need in a Kubernetes platform. They often understand other key systems that
have been built for DevOps, such as continuous integration/continuous delivery (CI/CD) tools,
development environments, data services, and application monitoring tools.
— Central IT: The central IT team, focused on cloud computing and other computing
platforms, is also a logical team to lead a Kubernetes strategy. They have a strong
understanding of platform operations, infrastructure, security, multi-tenancy, and existing
IT investments, and they usually have significant experience running critical projects. A
project led by the IT platforms team will benefit from their understanding of the broad
requirements of many different teams across a large, complex organization. Note that
projects coming out of central IT often suffer from too little engagement with end users and
too much influence from existing technology vendors. These teams often have very little
experience with the latest application architectures and benefit enormously from working
closely with teams leading innovation around application development.
With Kubernetes, there is enough flexibility in the platform and the ecosystem to satisfy any
team. Exposing that flexibility is critical to delivering value. Any strategy that abstracts away
Kubernetes will probably face resistance from your most innovative teams. At the same time,
the flexibility of Kubernetes and its ecosystem can be a hindrance to some teams looking for a
platform to just run standard apps.
One of the most exciting developments in the Kubernetes space in the past few years has
been the emergence of lightweight projects that run on Kubernetes but provide frameworks
that simplify application management. These approaches allow containers to “scale to zero”
and provide simple declarative languages to build, connect, scale, and monitor services.
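Knative is one example of such a project (named here only as an illustration). A minimal sketch of a scale-to-zero service might look like the following; the service name and container image are hypothetical.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-analytics                                   # hypothetical service name
spec:
  template:
    spec:
      containers:
        - image: ghcr.io/example/hello-analytics:latest   # hypothetical container image
          ports:
            - containerPort: 8080                          # port the container listens on
```

By default, a service like this scales its pods down to zero when it receives no traffic and scales back up when requests arrive, which is the behavior described above.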
As you build your Kubernetes strategy, consider blending the best of a decentralized approach
with enough controls and management to ensure compliance and remove repetitive
tasks. Try to centralize and automate everyday tasks such as Kubernetes cluster lifecycle
management, role-based access control (RBAC) policies, infrastructure management, and
other day-2 operations. At the same time, give your teams options for where they can get
access to Kubernetes clusters and whether they can use a shared cluster or a dedicated
cluster. Focus primarily on maintaining visibility into all the provisioned clusters, not necessarily
forcing teams to use a set of preapproved clusters in a specified way.
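As an illustration of the kind of policy worth centralizing, the sketch below grants a development team read-only access to workloads in its own namespace; the namespace, role and group names are hypothetical and would come from your own identity provider.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workload-viewer
  namespace: team-alpha                 # hypothetical team namespace
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "services", "deployments"]
    verbs: ["get", "list", "watch"]     # read-only access
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: workload-viewer-binding
  namespace: team-alpha
subjects:
  - kind: Group
    name: team-alpha-developers         # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: workload-viewer
  apiGroup: rbac.authorization.k8s.io
```

Keeping definitions like this in version control and pushing them to every cluster is one way to automate a day-2 task without taking flexibility away from teams.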
Two schools of thought have emerged about how to run Kubernetes across multiple clouds and data centers:
— The first suggests that you should use the cloud providers only for core infrastructure
provisioning, and build a consistent platform based on Kubernetes on top of this
infrastructure. With this approach, teams would develop a consistent implementation of
Kubernetes and any of its dependent services, and then build a common platform on top
of the cloud infrastructure. This approach aims to minimize cloud lock-in and achieve
broad application portability. These teams will try not to use any of the proprietary services
that different cloud providers offer, instead opting for open source or multi-cloud solutions.
The large platform software companies often recommend this approach, suggesting that
using their PaaS platforms across different clouds can alleviate cloud lock-in.
— The second approach suggests that teams standardize policy and management around
Kubernetes, but assume that wherever they run Kubernetes, their developers will probably
want to use other services that might be unique to that computing environment. This
approach suggests that if you are running Kubernetes in AWS, you shouldn’t hesitate to
use other services that might be unique to AWS. These teams worry less about lock-in and
more about giving application teams the flexibility to use the native capabilities of different
platforms. With this approach, the focus needs to be on providing common management
and tooling around different implementations of Kubernetes.
Storage is another area where integrating with hyper-converged infrastructure can add a
lot of value. Many of these platforms have Kubernetes drivers that simplify volume creation
and can provide additional value around backup and recovery of stateful workloads running
on your Kubernetes platform. An excellent persistent block storage solution to investigate is
Longhorn. First developed at Rancher Labs (now part of SUSE), Longhorn is 100% open source
software whose development is now governed by the CNCF.
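As a rough sketch of what this looks like in practice, the claim below requests a replicated Longhorn volume for a stateful workload; the claim name and size are hypothetical, and it assumes the default StorageClass created by a standard Longhorn installation.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orders-db-data            # hypothetical claim for a stateful workload
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn      # default StorageClass from a standard Longhorn install
  resources:
    requests:
      storage: 20Gi
```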
When your familiarity with Kubernetes expands and your team can manage multiple
production clusters running stateless, cloud native applications, that is a good time to start
looking at running legacy applications in containers. An entire paper could be written on best
practices for migrating legacy applications to Kubernetes, but the bottom line is that it almost
always makes sense to run these applications in dedicated clusters with different approaches
to infrastructure management. These applications are architected with an expectation of
stability and infrequent failure scenarios. You can mimic that in Kubernetes and still get a
lot of the other benefits Kubernetes offers, such as consistent security, support of the latest
operating systems, advanced automation, and great monitoring and visibility.
However, try not to orient too much of your business case for containerization toward cost
savings. Most organizations will take years to migrate a significant portion of their existing
application footprint to containers and Kubernetes.
Most of this time will be spent figuring out the right strategy for each application, specifically
whether to replace, rearchitect or just migrate it. Kubernetes certainly can help you get great
infrastructure utilization, but it will take time, and the strategy you are developing now is more
likely to impact your organization by enabling rapid innovation than by cutting infrastructure
spend.
Regardless of your team’s skill level, you’ll almost certainly have team members who need
to be trained on either using or administering Kubernetes. Luckily, there is no shortage of free
Kubernetes training providers and online courses, including the SUSE+Rancher Community
Academy (https://community.suse.com/all-courses).
As you build your core team of early Kubernetes admins and users, consider setting a goal to
train and certify as many members of your team as possible. The tests are rigorous and help
you ensure that you build strong internal knowledge about using containers and Kubernetes.
After you have some initial expertise, you may want to wait to do further training until you’re
out of the design phase of your strategy and bringing on more teams to work with the specific
implementations of Kubernetes your organization is adopting. At this stage, SUSE provides
an array of consulting services that can help you mitigate the risks associated
with large, production-grade deployments. For more information, visit https://www.suse.com/services/.
Some analyst firms describe this class of software as Enterprise Container Management (ECM),
a term that covers tools like Red Hat OpenShift, VMware Tanzu, Google
Anthos and Rancher Prime. While we don’t want to spend too much time in this document
comparing these different offerings, we do want to highlight some capabilities you should be
considering when determining how they can help you implement your Kubernetes strategy.
Some ECM platforms will also include integrated monitoring, etcd backup and recovery, and infrastructure
provisioning and auto-scaling. If you will be using a Kubernetes distribution provided by your
ECM vendor, it is important that the distribution you use is certified by the CNCF. This will ensure
that it is consistent with upstream Kubernetes and quickly supports the latest features being
developed in the community.
Visibility is great, but you will also want to implement policy and controls, automate
operations, provide application catalogs, and possibly offer other shared services. If multi-
cluster management is key to your strategy, be sure that you understand what it means and
how different ECM platforms implement it.
Once you have ensured your platform supports the necessary access control capabilities,
consider what administrative capabilities you can delegate to team leads, cluster owners,
and project owners. Does the platform allow you to dedicate resources to specific teams?
Can you easily define resource quotas and manage utilization of shared platforms? Can
teams collaborate on projects and share application catalogs? However you decide to deliver
Kubernetes to these different teams, be sure you are providing direct access to the Kubernetes
API and kubectl, as this will ensure they can access all of the features of Kubernetes.
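Resource quotas themselves are a standard Kubernetes object, so delegating namespaces to teams does not mean giving up control over shared capacity. A minimal sketch follows; the namespace and the limits are hypothetical.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha            # hypothetical team namespace
spec:
  hard:
    requests.cpu: "20"             # total CPU the team can request across the namespace
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    pods: "100"                    # ceiling on running pods
    persistentvolumeclaims: "20"
```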
For example, the Kubernetes Pod Security Policy is a cluster-level resource that controls
security-sensitive aspects of the pod specification. Pod Security Policy objects
define a set of conditions that a pod must meet to be accepted into the system, as
well as defaults for the related fields. They allow an administrator to control functions
such as the running of privileged containers and the use of host namespaces, host
networking and host ports, to name a few. Policy management can also address
container image scanning, cluster configuration and even application deployment. For
instance, if your organization decides to implement a container security product, policy
management should allow you to ensure that these applications are automatically
installed on any new or imported Kubernetes cluster.
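A hedged sketch of such a policy is shown below; it blocks privileged containers and host namespaces while leaving most other settings permissive, and the policy name is hypothetical. (Newer Kubernetes releases replace Pod Security Policy with the built-in Pod Security admission controller, but the intent is the same.)

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-baseline        # hypothetical policy name
spec:
  privileged: false                # block privileged containers
  hostNetwork: false               # block use of the host network
  hostPID: false                   # block the host process namespace
  hostIPC: false                   # block the host IPC namespace
  allowPrivilegeEscalation: false
  runAsUser:
    rule: MustRunAsNonRoot         # containers must not run as root
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:                         # restrict pods to common, safe volume types
    - configMap
    - secret
    - emptyDir
    - persistentVolumeClaim
```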
One of the biggest risks of an ECM is that it puts too much emphasis on integrated
solutions and ease of use and ends up limiting flexibility. Kubernetes is well architected
for plug-and-play integration with most of its ecosystem, so be sure not to lose that
flexibility. For example, if a platform offers an integrated CI/CD, make sure that your
teams that already have CI/CD tooling can easily connect their existing process to it
as well.
If you don’t want to operate one of these platforms yourself, some vendors offer cloud-based
or managed versions of these ECMs. If you decide to move to a hosted ECM, consider how
much flexibility you’ll have in the future to move off it as your needs change. Is it a shared
implementation of the ECM or is it dedicated to your organization? You’re going to build lots
of policies, templates, best practices, and integrations with an ECM, so make sure that there is
some way for you to extract that logic and move it to a different platform if your requirements
change in the future.
As you set off on this journey, pay special attention to learning from other organizations that
are adopting Kubernetes. Every year, the presentations from KubeCon are recorded and
posted to YouTube. You can find a wealth of real-world advice from teams who have gone
through rolling out Kubernetes at either a project or company-wide level.
— Enhances the ability to deploy, manage and scale services JASMIN offers to the user
community
Products: Rancher Prime
What is STFC?
On behalf of the UK scientific community, STFC is tasked with running national laboratories and
conducting research into major science projects with large-scale infrastructure and resource
requirements. STFC has a broad remit. It provides bespoke technology and resources for
hundreds of research projects, covering a wide range of different science areas. Its UK-based
teams are involved in high-profile projects, and JASMIN is run in partnership between
CEDA, part of STFC’s Rutherford Appleton Laboratory (RAL) Space department, and STFC’s Scientific
Computing Department. No matter the variety and scale of each project, JASMIN has a clear
mission — to provide scalable compute environments that help researchers find meaningful
answers to the big scientific questions in the environmental sciences domain.
Amongst the host of research projects STFC’s compute resources support, a growing number
of teams are focused on pushing boundaries in climate science. The JASMIN Notebooks
Service has become a useful tool to support data
analytics in environmental sciences research. Later, we’ll explore an example from a Ph.D. student
working with STFC’s JASMIN platform to better understand rising inner-city temperatures.
Over the years, Jupyter Notebooks have become a familiar way for academics to consolidate and
analyze volume data. A Jupyter Notebook is an interactive document containing live code and
visualizations, viewed and modified in a web browser. The JASMIN Notebook Service, based on the
open source JupyterHub, provides the ability to run multiple Jupyter Notebook sessions via one
easily accessible platform served over the web.
Part supercomputer, part cloud, STFC’s JASMIN platform was the backbone of a recent COP26
hackathon, which saw over 150 attendees exploring topics ranging from climate change to
oceanography and biogeochemistry. JASMIN gave attendees — even those with no prior
coding experience — easy access to Jupyter Notebooks via web browser where they could
interact with terabytes of data.
Ease of use is particularly critical when managing the complexities of climate data at scale.
STFC’s JASMIN platform allows the processing and analysis of massive data sets, from
multiple origins and in a myriad of formats. By co-locating these data alongside services like
Jupyter, JASMIN gives users more control by removing issues around data movement and
data wrangling, allowing them to get insights more quickly. While STFC’s JASMIN platform is the
secret to potentially world-changing research projects, Kubernetes, underpinned by Rancher
Prime, provides the underlying infrastructure for the Notebook Service.
To fulfil the need to reduce latency and enable real-time analysis, STFC opted for a bare-
metal cluster deployment from Rancher Prime. Once configured with STFC’s storage
capabilities, this allowed researchers access to petabytes of data in real time — impossible in
a standard desktop environment.
In recognition of the growing demand for hyperscale compute resources, STFC wanted to build
greater scale and agility into its Kubernetes estate. Rancher Prime allows technology teams
to flex compute resources faster, making the service resilient for the long term, no matter how
complex or varied projects may be.
A second key factor for STFC was interoperability: not only does the organization
combine its bare-metal setup with data hosted in the cloud, but users are also turning to the
JASMIN Notebook Service for a widening range of use cases. Kubernetes and Rancher Prime
offer those users the freedom to choose what best suits their diverse needs, and the flexibility to
build, scale and transform on demand.
Sheng Liang, former president of engineering and innovation at SUSE says: “The team at STFC
were looking for a vendor-backed solution to help manage its Kubernetes estate. Working with
Rancher Prime, the Kubernetes architecture was easy to deploy, manage and scale.”
Case Study: Sarah Berk Ph.D.: Exploring the Heat Island Effect in Inner Cities
Sarah Berk is a Ph.D. student at the University of East Anglia (UEA) who uses STFC’s JASMIN
Notebook Service to analyze terabytes of data measuring heat from cities all over the world.
She’s exploring the urban heat island effect; a phenomenon in which metropolitan areas are
significantly warmer than surrounding rural areas due to human activity and properties of the
urban environment.
“It’s a really important area for two key reasons,” says Berk. “The first is migration. Now, over
half the world’s population live in cities, and that will grow to 68% by 2050. The second factor
is this backdrop of a changing climate, particularly increasing global temperatures and an
increasing frequency of heatwaves.”
Berk, who learnt a new programming language for this project, says that JASMIN enabled her
to hit the ground running without having to learn a lot of new skills. It allows her to visualize and
analyze huge amounts of data at a speed that otherwise wouldn’t be possible. She’s currently
working with two datasets — land surface temperature and land use data.
Berk explains: “Because I am using 15 years’ worth of data spanning the entire globe in
300-meter resolutions, the sheer volume of data processing is immense. It’s not possible to
use my laptop to analyze this, which is why I use STFC’s JASMIN Notebook Service.”
What’s next?
Since JASMIN was first launched in early 2012, it has grown significantly in scale and complexity
but also in the variety of projects it serves. This growth is likely to continue as the world
increasingly turns to the research community to find innovative ways to combat the climate
crisis.
Products: Rancher Prime
What is Hypergiant?
Hypergiant Industries focuses on solving humanity’s most challenging problems by delivering
best-in-class artificial intelligence solutions and products. The company creates emerging
AI-driven technologies for Fortune 500 and government clients working in a host of sectors,
including space science and exploration, satellite communications, aviation, defense, health
care, transportation and more.
Previously a ‘Cyber Transport Journeyman’ in the U.S. Air National Guard, Bren Briggs is used
to being deployed to the most hostile environments to design and run command and control
communications. Now responsible for DevOps and Cybersecurity at Hypergiant, Briggs draws
on his military background, bringing expertise in designing and building repeatable, secure
deployments of Kubernetes and other infrastructure to support Hypergiant’s AI and machine
learning (ML) applications and customers.
Briggs started working with Kubernetes several years ago to solve the problem of managing
multiple Docker containers. At that time, containers were built and shipped like virtual
machines (VMs) — updated and managed in Puppet. With a growing need for consistency,
Kubernetes became an important way to reliably network the estate of containers and
manage them centrally.
In 2019, Briggs brought this experience to Hypergiant, where he noticed repetition in existing
development methods. He recognized the need for a platform that would allow developers
to deploy repeatable patterns with less manual effort. They needed a cloud that could run
anywhere, and so Briggs suggested Kubernetes to automate the creation of repeatable
workloads. In early 2020, the team started rolling out Kubernetes on several internal systems.
As the team ran on AWS, EKS was initially deployed to orchestrate the first cluster, but K3s soon
caught the team’s attention.
The satellite industry faces several problems. Firstly, space-rated hardware is costly. Software
development and delivery processes are slow. And it’s cold — really cold. On-orbit satellite
software updates are often not possible or incredibly time-consuming and expensive. As a
result, AI/ML capabilities are far behind those currently available on earth.
Satellite connectivity and bandwidth are poor, which makes downloading large images and
other data difficult. The SatelliteONE mission has been designed to solve this problem. The
project will demonstrate DevSecOps in space by leveraging PlatformONE’s CI/CD pipeline
alongside Kubernetes provisioning and deployment. Importantly, it will evaluate the use of
lower-cost hardware on satellite payloads, show how the rapid delivery of software updates in
space can be done and demonstrate the use of AI/ML software in orbit.
The U.S. Department of Defense, along with governments worldwide, is looking for ways to
build longevity and sustainability into satellite fleets. They want to modernize their entire fleets
to be managed, maintained, and, crucially, reconfigured for various use cases for the long
term. This is a massive shift, but investment in modernization will create agile technology
stacks that can be more easily updated and replaced in the future.
Because K3s is packaged as a single <40MB binary, it reduces the dependencies and steps
needed to install, run and auto-update a production Kubernetes cluster. Supporting both
ARM64 and ARMv7, K3s works just as well on a Raspberry Pi as it does with an AWS a1.4xlarge
32GiB server. K3s is enabling Hypergiant to deploy the modern, lightweight software systems
that will drive the next evolution and commoditization of the satellite industry.
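To give a feel for how little configuration a K3s node needs, the sketch below shows the optional YAML config file K3s reads at startup; the keys mirror K3s command-line flags, and the label value is a hypothetical example for an edge device.

```yaml
# /etc/rancher/k3s/config.yaml; keys mirror K3s CLI flags
write-kubeconfig-mode: "0644"     # make the generated kubeconfig readable by operators
node-label:
  - "location=edge-payload"       # hypothetical label used to schedule edge workloads
disable:
  - traefik                       # skip bundled components the device does not need
```

On constrained hardware such as a Raspberry Pi, trimming bundled components and keeping the configuration declarative like this helps make installs reproducible across devices.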
SatelliteONE’s initial mission is an important one—to take a picture of Baby Yoda. While this
doesn’t sound particularly ground-breaking, this exercise’s primary purpose is to demonstrate
the ability to capture images and perform AI and ML workloads at the edge.
This is an important milestone given the bandwidth constraints experienced in space. When
dealing with a link measured in kilobits per second and only available for a few minutes at a time
during a flyover window, transfer efficiency is the ultimate goal. In this use case, the team
wants to transfer the minimum amount of information.
The architecture consists of two Raspberry Pi server nodes, independent of each other, with
one being a warm backup. There are two Raspberry Pi 4 worker nodes, each loaded with a Pi
camera, and another Raspberry Pi 4 with an accelerometer and light sensor. Finally, there is a
Jetson Nano module (developed by NVIDIA), a $99 IoT AI processor mounted on the satellite
and used to capture and process the image of Baby Yoda.
The two server nodes have an additional Ethernet adapter attached to another switch
and have connectivity to the Cygnus device and external network, allowing communication
with the ground (Cygnus is the vehicle the team is hitching a ride on courtesy of Northrop
Grumman). There are two independent K3s server nodes in place. This is critical should
one server device fail in orbit — the team can quickly communicate with the server node,
automatically failover, or run scripts to rebuild the cluster in a matter of minutes. Since the
clusters are logically separated, all the K3s images and the application workloads are stored
locally on each node—less overhead and one less point of failure.
Importantly, this is about sending analysis rather than the raw image itself. Ultimately, a
human can still make decisions as to which photos to downlink. However, the focus here is to
process decisions at the edge, sending analysis and results instead of the raw dataset.
The team is also conducting a temperature experiment to observe how the Jetson behaves
in a vacuum under a heavy computational load. Why, though, would the team put a
whole satellite in space to take a few pictures of Baby Yoda and conduct temperature
measurements?
Deploying novel compute technologies like K3s on orbiting devices helps assure their longevity and
ongoing usability. Working with K3s, Hypergiant, SUSE’s RGS and DOD PlatformONE
are creating a commercial future for data processing in space.
In this latest Buyer’s Guide, we compare Rancher Prime with the three
most competitive Kubernetes management platforms: Red Hat
OpenShift, VMware Tanzu, and Google Anthos.