Related Topics: Cloud Expo, Virtualization

Cloud Expo: Article

Are Humans Really Necessary for Maintaining SLAs in the Cloud?

The role of remote monitoring in Cloud Computing

Eric Novikoff's Blog

Are humans really necessary for maintaining SLAs? In today's cloud computing deployments, especially with systems like Amazon's EC2, the user's application is responsible for both measuring application performance and acting on performance issues. This complicates deployment and coding, and it ties your application to a particular cloud provider. However, I believe that the next generation of cloud deployment frameworks will be able to do this automatically, by integrating general-purpose monitoring applications with policy-based cloud management engines.

When I was watching the recent election returns on CNN, I wasn't sure what was more amazing: Obama's historic victory, or CNN's technology. CNN was able to display up-to-the-minute results of each state's elections at the touch of a news anchor's finger on the screen of an election-reporting system. The anchor could touch a state, then touch a metric, such as various demographics, and instantly slice the election results by age, exit poll answers, or racial composition. It blew me away.

But it also reminded me of trying to manage a complex set of application deployments into the Cloud - a virtual private data center.

When you take into account that a reasonably complex multi-tier application with significant load can consume tens of virtual servers, all of which need to function successfully in a coordinated ballet, you realize you need the kind of information and analytic capabilities that CNN has available, just to tune it and keep it working. Because of this, we've invested in an amazing remote monitoring package, NimBus, which we provide as a service to our hosted customers as well as other customers. NimBus allows measurement of pretty much any parameter inside a virtual server or the applications running inside of it, from simple (but important) aspects such as CPU or memory utilization, to more complex metrics like database queries per second, slow query count, or pages served per unit time from a web server. In addition, NimBus can perform user-experience validation by running synthetic (fake) transactions against an application and reporting what the user experiences in terms of response time and page correctness.
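To make the idea of a synthetic transaction concrete, here is a minimal sketch in Python. It is not NimBus code and does not use any NimBus API. The names `run_synthetic_check`, `CheckResult`, and `fake_login_page` are hypothetical, and the page fetch is a stand-in function rather than a real HTTP request. It only illustrates the two things such a check reports: response time and page correctness.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    response_seconds: float  # how long the scripted request took
    page_correct: bool       # did the page contain the expected text?
    within_target: bool      # was the response time at or under the target?


def run_synthetic_check(
    fetch_page: Callable[[], str],
    expected_text: str,
    max_seconds: float,
    clock: Callable[[], float] = time.perf_counter,
) -> CheckResult:
    """Time one scripted (fake) transaction and validate the returned page."""
    started = clock()
    body = fetch_page()
    elapsed = clock() - started
    return CheckResult(
        response_seconds=elapsed,
        page_correct=expected_text in body,
        within_target=elapsed <= max_seconds,
    )


def fake_login_page() -> str:
    # Assumption: stands in for a real request to the application under test.
    return "<html><body>Welcome back, test user</body></html>"


result = run_synthetic_check(fake_login_page, "Welcome back", max_seconds=2.0)
print(result.page_correct, result.within_target)
```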

All of this is summarized on a customizable dashboard, much like CNN's election status screen:

So, armed with this information - and hopefully not overwhelmed by it - we (or our customers) can tune and adjust applications for appropriate cost/performance tradeoffs, or diagnose performance or efficiency issues. It has produced great results for the customers who implemented remote monitoring, improving their application response time and uptime, as well as reducing costs.

However, the road hasn't been easy. The Cloud, by its very nature, is constantly in flux. This presents an organization with contradictory goals: to optimize something, it needs to be stable so you can measure it and make changes; yet to get the best economies out of the cloud, you need your infrastructure to be elastic, scaling on demand. Because servers can come and go, and IP addresses can change, setting up a monitoring system and keeping it running isn't easy. How can you monitor Apache server #2 if it is only instantiated when the web site's load is too high for one Apache? Luckily, most of our clients' deployments don't change radically over the short term, so the monitoring package can be set up and continue to run for quite a while before it needs reconfiguration. However, for very elastic loads, you need to either observe the results of your cloud deployment instead of its internals (such as by snooping on its communications with customers) or have your automatic instance deployments also request on-demand monitoring.
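The second option, having each automatic instance deployment request its own monitoring, might look roughly like the sketch below. Everything here is an assumption made for illustration: `MonitoringRegistry`, `launch_instance`, and `terminate_instance` are hypothetical names, and the registry is an in-memory stand-in for whatever enrollment interface a real monitoring service offers. The point is that targets are enrolled by instance and role at launch time, so a changing IP address or a short-lived Apache #2 does not require manual reconfiguration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MonitoringRegistry:
    """Hypothetical in-memory stand-in for a monitoring service's enrollment API."""
    targets: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def register(self, instance_id: str, role: str, address: str) -> None:
        self.targets[instance_id] = {"role": role, "address": address}

    def deregister(self, instance_id: str) -> None:
        # Removing an unknown instance is a no-op, so repeated calls are safe.
        self.targets.pop(instance_id, None)

    def instances_for(self, role: str) -> List[str]:
        return sorted(i for i, t in self.targets.items() if t["role"] == role)


def launch_instance(registry: MonitoringRegistry, instance_id: str,
                    role: str, address: str) -> None:
    # A real deployment script would start the virtual server here (omitted).
    registry.register(instance_id, role, address)


def terminate_instance(registry: MonitoringRegistry, instance_id: str) -> None:
    # A real deployment script would stop the virtual server here (omitted).
    registry.deregister(instance_id)


registry = MonitoringRegistry()
launch_instance(registry, "apache-1", "web", "10.0.0.5")
launch_instance(registry, "apache-2", "web", "10.0.0.9")  # added under load
terminate_instance(registry, "apache-2")                   # load has dropped
print(registry.instances_for("web"))
```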

Once you add monitoring to your cloud deployment, you can start to take advantage of the powerful capabilities of Total Quality Management, a management philosophy popularized by W. Edwards Deming. A core principle of TQM is CPI or continuous process improvement, summarized with the following chart:

TQM says you want to set goals for your process (in this case your software deployment), then run the process (deploy the software), measure the results against the goals, and adjust the settings so that the process produces the desired results (in the software deployment world, typically a satisfied SLA). However, the real power comes when you report on the results of this process and then use the report to take another look at your goals. The result is continuous improvement in "quality" - in other words, in your ability to deliver the results of your process successfully.

This is how we use monitoring to get the most out of Cloud deployments.
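The goal, run, measure, adjust, and report cycle described above can be sketched as a small loop. This is an illustration under stated assumptions, not a description of our tooling: `improvement_loop` and `CycleReport` are hypothetical names, the only adjustable setting is the server count, and `simulated_response_ms` is a toy model in which a fixed amount of work is shared evenly across servers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CycleReport:
    servers: int        # setting in effect during this cycle
    measured_ms: float  # measured result for this cycle
    goal_met: bool      # did the measurement satisfy the goal?


def improvement_loop(
    measure: Callable[[int], float],
    goal_ms: float,
    servers: int,
    max_servers: int,
    cycles: int,
) -> List[CycleReport]:
    """Run, measure against the goal, adjust, and keep a report of each cycle."""
    reports: List[CycleReport] = []
    for _ in range(cycles):
        measured = measure(servers)                    # run and measure
        met = measured <= goal_ms                      # compare with the goal
        reports.append(CycleReport(servers, measured, met))
        if not met and servers < max_servers:
            servers += 1                               # adjust the setting
    return reports                                     # input for revisiting goals


def simulated_response_ms(servers: int) -> float:
    # Assumption: toy model, 1200 ms of work split evenly across servers.
    return 1200.0 / servers


history = improvement_loop(simulated_response_ms, goal_ms=350.0,
                           servers=1, max_servers=8, cycles=5)
for report in history:
    print(report)
```

The returned history plays the role of the report: it is what you would review before deciding whether the goal itself should change.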

But then I had this insight: why do we humans have to be in the loop at all with respect to acting on the monitoring? Naturally, if the monitoring detects some sort of application or hardware failure, humans need to get involved. But are humans really necessary for maintaining SLAs? In today's cloud deployments, especially with systems like Amazon's EC2, the user's application is responsible for both measuring application performance and acting on performance issues. This complicates deployment and coding, and it ties your application to a particular cloud provider. However, I believe that the next generation of cloud deployment frameworks will be able to do this automatically, by integrating general-purpose monitoring applications with policy-based cloud management engines. At ENKI, using our monitoring services, we are already able to automate some of this policy-based management without the need for the application to be aware of the details of this process. However, a quick caution is in order: if the application isn't designed from the ground up to be elastic (for example, to have new web servers added dynamically), then all the automation in the world won't allow it to participate in automated SLA assurance.
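As a rough illustration of what integrating monitoring with a policy-based management engine could mean, the sketch below keeps the policy outside the application and talks to the cloud only through a narrow interface. It is not ENKI's implementation. `Policy`, `CloudProvider`, `RecordingProvider`, and `enforce` are hypothetical names, and the thresholds are made-up numbers. The application never sees this logic, and swapping cloud providers means supplying a different object with the same two methods.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class CloudProvider(Protocol):
    """Assumed minimal provider interface; any cloud adapter could implement it."""
    def add_server(self, tier: str) -> None: ...
    def remove_server(self, tier: str) -> None: ...


@dataclass(frozen=True)
class Policy:
    tier: str               # which application tier the policy governs
    metric: str             # name of the monitored metric
    scale_out_above: float  # add a server when the metric exceeds this
    scale_in_below: float   # remove a server when the metric falls below this


class RecordingProvider:
    """Stand-in provider that only records the actions requested of it."""
    def __init__(self) -> None:
        self.actions: List[str] = []

    def add_server(self, tier: str) -> None:
        self.actions.append(f"add:{tier}")

    def remove_server(self, tier: str) -> None:
        self.actions.append(f"remove:{tier}")


def enforce(policies: List[Policy], metrics: Dict[str, float],
            provider: CloudProvider) -> None:
    """Compare the latest monitoring readings with each policy and act on them."""
    for policy in policies:
        value = metrics.get(policy.metric)
        if value is None:
            continue  # no reading for this metric: take no automatic action
        if value > policy.scale_out_above:
            provider.add_server(policy.tier)
        elif value < policy.scale_in_below:
            provider.remove_server(policy.tier)


provider = RecordingProvider()
policies = [Policy("web", "web_response_ms", scale_out_above=500.0, scale_in_below=100.0)]
enforce(policies, {"web_response_ms": 750.0}, provider)
print(provider.actions)
```

As the caution above implies, a call like `add_server` only helps if the application can actually absorb a new web server dynamically.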

More Stories By Eric Novikoff

Eric Novikoff is COO of ENKI, a cloud services vendor. He has over 20 years of experience in the electronics and software industries, in positions ranging from integrated circuit designer to software/hardware project manager to Director of Development at an Internet Software as a Service startup, Netsuite.com. His technical, project, and financial management skills have been honed in multiple positions at Hewlett-Packard and Agilent Technologies on a variety of product lines, including managing the development and roll-out of a worldwide CRM and sales automation application for Agilent's $350 million Automatic Test Equipment business. Novikoff also has a strong interest in SME (Small/Medium Size Enterprise) management, process development, and operations as a consequence of working at a web-based ERP service startup serving SMEs, and through his small-business ERP consulting work.
