The Wayback Machine - https://web.archive.org/web/20120507214615/http://cloudcomputing.sys-con.com/node/2093869

Cloud Expo: Article

Different Approaches to Databases in PaaS: A Round-Up

Most PaaS offerings provide some kind of database services

In an application deployed directly on IaaS, you know and control everything about the database; in a SaaS application you know little and control nothing.

But how does it work in PaaS?

Since a PaaS is essentially a container that runs application code, and virtually every application requires a persistent data store, most PaaS offerings provide some kind of database services. Not surprisingly, Resource PaaS offerings most closely resemble SaaS in that they hide more deployment details, while Server PaaS offerings are more flexible but potentially more complex. (For more on Resource PaaS vs. Server PaaS, see Keys to the PaaSing Game: Multi-Tenancy.) What is surprising is that some Resource PaaS offerings use a proprietary and non-standard database. Let's take a closer look at how several of the leading Platform-as-a-Service offerings handle databases and file system access for applications running on them.

Google's AppEngine
The Google AppEngine PaaS provides permanent storage only through the App Engine Datastore, a highly scalable non-relational data store built on Google's "BigTable" system and queried through a limited SQL-like language called GQL. Because AppEngine only allows applications to use its provided API libraries, using a third-party database or database service is possible but awkward; the code must funnel requests through a URL-fetching API that is subject to various limitations. Although there is no direct access to a filesystem, file-like storage can be achieved through a specialized API built on BigTable.

Recognizing that the lack of a relational database has been slowing the adoption of AppEngine, Google released a preview version of Google Cloud SQL in October 2011. Cloud SQL is a database-as-a-service (DBaaS) based on MySQL, offering nearly all MySQL commands except those involving files. It automatically handles its own variant of geographic replication, but does not allow MySQL replication. During the preview, Cloud SQL databases are limited to 10 GB in size and can only be accessed through AppEngine applications. The former constraint seems likely to be relaxed when the preview period ends; it is not clear if - or when - the latter will be.

Force.com
Salesforce.com's Force.com also provides a built-in non-relational object database, which was recently split out into a DBaaS called database.com. Database.com is accessed through the proprietary Apex language from within Force.com, or via API calls from outside, using a SQL-like language called SOQL. Configuration is limited to tables and fields. Force.com applications can only access external databases or file stores through HTTP callouts, which are subject to certain limitations. It seems unlikely that Salesforce.com will offer a relational data store with Force.com, since the company has long claimed that its approach is superior. Nevertheless, relying on a proprietary, non-relational store limits the customer's ability to port applications in or out, find developers experienced in the approach, or build capabilities that require true relational power.

Microsoft Azure
Microsoft Azure offers access to the SQL Azure DBaaS, a high-performance, scalable, and fully managed service. Azure applications cannot access external databases, and external applications cannot access SQL Azure; however, SQL Azure databases can be synchronized with on-premises SQL Server databases. Developers can only access database-level configuration, and while there is no filesystem access, file-like storage is available in Azure through a blob-based emulation system. Like AppEngine and Force.com, Azure is something of a walled garden, but in Microsoft's garden at least you have all the basic food groups.

Heroku
Heroku treats the database as distinct from the application container, allowing the application to use any database or database service. It also provides a built-in DBaaS, freeing developers from the need to provision and manage the database deployment. The Heroku DBaaS is based on PostgreSQL, can be operated in a shared or dedicated mode, and can be accessed from external applications as well. Developers have unlimited access to the database (via a database client command line), but no access to database deployment configuration settings or versions, in keeping with the Resource PaaS approach. For similar reasons, Heroku does not allow applications to write to the filesystem; a third-party service or a database-mapped approach is required.
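To make the separation between application container and database concrete, here is a minimal Python sketch of an application that learns where its database lives from its environment instead of hard-coding it. It assumes the connection details arrive as a single URL in an environment variable named DATABASE_URL, a convention Heroku uses but which this article does not describe; the function names and the local fallback URL are hypothetical.

```python
import os
from urllib.parse import urlparse


def parse_database_url(url):
    """Split a URL such as postgres://user:secret@host:5432/dbname into parts."""
    parts = urlparse(url)
    return {
        "engine": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "name": parts.path.lstrip("/"),
    }


def database_settings(environ=os.environ):
    """Read connection settings from the environment (assumed variable name)."""
    # Hypothetical fallback for local development; not a real server.
    default = "postgres://dev:dev@localhost:5432/devdb"
    return parse_database_url(environ.get("DATABASE_URL", default))
```

Because the application only ever sees a URL, pointing it at the built-in DBaaS, a dedicated instance, or an external database service is a configuration change and not a code change.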

CloudFoundry.com
CloudFoundry.com has a DBaaS that supports four different databases natively: MySQL and PostgreSQL for relational, and MongoDB and Redis for NoSQL. Developers have access only to the database and not to its configuration settings. External databases can be accessed via APIs using an HTTP proxy; in the future, a Service Broker is promised that will enable more direct access to external databases and other services. With the recently added Caldecott tool, customers can also tunnel into their CloudFoundry.com database service from the outside. Although it is possible to write to the filesystem in CloudFoundry.com, the files are actually ephemeral and should not be relied upon as a data store. Note, by the way, that CloudFoundry.com is a service that uses the Cloud Foundry open source software as one of its foundational technologies, but the two are distinct.
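Where local files are ephemeral or unwritable, one option is the "database-mapped approach" mentioned for Heroku above: keep file contents in a database table keyed by path. The following Python sketch illustrates the idea only; it uses an in-memory SQLite database as a stand-in for the platform's database service, and the class name, table layout, and sample path are all hypothetical.

```python
import sqlite3


class DatabaseFileStore:
    """Keep file contents in a database table instead of on local disk."""

    def __init__(self, connection):
        self.conn = connection
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            "path TEXT PRIMARY KEY, content BLOB NOT NULL)"
        )

    def write(self, path, data):
        # The connection context manager commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO files (path, content) VALUES (?, ?)",
                (path, data),
            )

    def read(self, path):
        row = self.conn.execute(
            "SELECT content FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return bytes(row[0])


# Stand-in for a real database service: an in-memory SQLite database.
store = DatabaseFileStore(sqlite3.connect(":memory:"))
store.write("uploads/logo.txt", b"hello")
```

In a real deployment the connection would point at the platform's persistent database service, so the stored content survives when an application instance is restarted or replaced.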

Server PaaS Options
Server PaaS offerings, like AWS Elastic Beanstalk, RightScale, Engine Yard, and Standing Cloud, generally have fewer constraints on the database to be used by the application. Filesystem access by the application is unimpaired in all of these Server PaaS offerings.

Both Engine Yard and Standing Cloud automate deployment and management of supported databases, primarily specific versions of MySQL or PostgreSQL. Both also allow database-level and server-level configuration changes, within certain limits. Engine Yard relies on Chef for database deployment, so configuration changes must be performed with a Chef recipe or by working around it. Standing Cloud requires that configuration changes be embedded in a post-deployment script and that the basic deployment structure (filenames and directory paths) remain unchanged. Unsupported databases can also be used by applications running in Engine Yard or Standing Cloud, but they must be deployed and managed separately.

Elastic Beanstalk always requires the developer to deploy and manage the database separately. This can be done within Amazon Web Services using their RDS (Relational Database Service), but it is a separate step that is outside of the PaaS proper. RightScale offers tools (server templates and scripts) that make deployment, integration, and management of the database easier and repeatable, but generally you are on your own. As compensation for the extra effort, of course, you gain complete flexibility.

You may have noticed that I covered these offerings in order of increasing flexibility. The point in this article at which the constraints of the PaaS stopped seeming too onerous suggests a good place to start if you are considering building your application in the cloud or moving it there. Keep in mind that you may also have different needs for development and test than you do for production - in some cases, scale is crucial; in others, control - and these needs change over time. Because change is inevitable, flexibility is important. That's why I recommend that you avoid building an application on a database environment where you would be permanently locked in.
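One common way to limit database lock-in at the code level is to have the application depend on a narrow storage interface, not on any one platform's API. The Python sketch below is an illustration under that assumption, not a technique prescribed in this article; the interface, both backends, and the helper function are hypothetical, and SQLite stands in for whatever relational service a given PaaS offers.

```python
import sqlite3
from typing import Optional, Protocol


class NoteStore(Protocol):
    """The only storage operations the application is allowed to rely on."""

    def put(self, key: str, text: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class InMemoryNoteStore:
    """Backend for tests and local development."""

    def __init__(self):
        self._notes = {}

    def put(self, key, text):
        self._notes[key] = text

    def get(self, key):
        return self._notes.get(key)


class SqlNoteStore:
    """Backend on a relational database (SQLite used here as a stand-in)."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS notes (key TEXT PRIMARY KEY, text TEXT)"
        )

    def put(self, key, text):
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO notes (key, text) VALUES (?, ?)",
                (key, text),
            )

    def get(self, key):
        row = self.conn.execute(
            "SELECT text FROM notes WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]


def copy_note(store: NoteStore, source: str, target: str) -> bool:
    """Application logic written against the interface, not a backend."""
    text = store.get(source)
    if text is None:
        return False
    store.put(target, text)
    return True
```

Moving to a different platform then means writing one new backend class while leaving the application logic alone, although an interface like this cannot hide differences in query power or scaling behavior between data stores.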

More Stories By Dave Jilk

Dave Jilk has an extensive business and technical background in both the software industry and the Internet. He currently serves as CEO of Standing Cloud, Inc., a Boulder-based provider of cloud-based application management solutions that he cofounded in 2009.

Dave is a serial software entrepreneur who also founded Wideforce Systems, a service similar to and pre-dating Amazon Mechanical Turk; and eCortex, a University of Colorado licensee that builds neural network brain models for defense and intelligence research programs. He was also CEO of Xaffire, Inc., a developer of web application management software; an Associate Partner at SOFTBANK Venture Capital (now Mobius); and CEO of GO Software, Inc.

Dave earned a Bachelor of Science degree in Computer Science from the Massachusetts Institute of Technology.
