
By David Rosenstrauch | June 1, 2002
Apache Cocoon is one of the most interesting, innovative, and powerful platforms for dynamic content generation. A subproject of the Apache XML project, it is also one of the lesser-known offerings from the folks at the all-open-source Apache Software Foundation, having garnered less attention than more popular cousins like Struts. But Cocoon is worth a look.
It's not just Cocoon's use of XML in content generation that makes it so interesting; it's how it uses XML. Cocoon's authors clearly have a deep experience with and an understanding of XML - what it is and isn't good for - and Cocoon's simple but powerful architecture reflects that experience. XML isn't used here just "because everyone's using it." Rather, Cocoon exploits XML's strength for separating content from presentation. (As we know now, the lack of that separation made it increasingly difficult to do Web page development in straight HTML.) The result is an innovative and powerful tool for content site developers.
This article will familiarize you with Cocoon and some of its related technologies: what it is, what it does, and how to start using it in your own development projects.
A basic understanding of the core concepts behind XML, SAX, and XSL (and, of course, HTML) is helpful when reading this article. Don't worry too much if you haven't worked with these technologies, though. I won't be delving too deeply into them and will try to make any examples easy to understand.
What Is Cocoon?
Although at heart Cocoon is "yet another dynamic content-generation platform" (technically putting it in competition with the many other content-generation technologies out there, such as ASP, JSP, PHP, and Struts), Cocoon adds some new twists to this category that make it stand out.
Cocoon's core difference is its use of XML throughout the content-generation process. Each request sent to the Cocoon framework is processed using the same three steps:
- Generate XML content (either statically or dynamically)
- Optionally transform it
- Format it for output
Another major plus of Cocoon's XML-orientation: it provides for excellent separation of content and presentation - that holy grail of software applications. Content is kept as presentation-free XML data for as long as possible during processing, and then formatted into the appropriate output format just before being returned to the user.
In fact, Cocoon strives for an even greater separation of concerns. Its philosophy is to look upon the process of content generation as three separate realms: content, logic, and style. This type of division makes a great deal of sense, especially when you consider that completely different teams of people are frequently assigned to each of these functions: logic to software developers, content to users and data entry staff, and style to graphic designers.
How Cocoon Was Hatched
Inception
Cocoon began its life in 1999 as a far less ambitious endeavor than it is now. Pioneered by Apache developer Stefano Mazzocchi, Cocoon was initially just a "proof of concept" - a servlet that used XML and XSLT transformations to generate its output.
Cocoon 1.0
By the time Cocoon made it to version 1.0, it had progressed into a full-fledged framework for XML content generation and was starting to receive a good deal of recognition and use by site developers.
With any early-version software, though, it's difficult to foresee the usability problems that will crop up in practice while the software is still being developed. It's also difficult to envision how popular your application will be at such an early stage. Cocoon was no different. Version 1.0, although functional, had its usability hampered by design decisions made early on, most notably its reliance on the memory-intensive XML DOM architecture. (The SAX model, and the APIs and tools needed to use it, were still in their infancy at that point.)
Since Cocoon was proving to be quite popular, demands for new features and improved performance kept coming in. It soon became clear that the initial architecture was not adequate to address these issues.
Cocoon 2.0
Enter Cocoon v2.0. This version (released as alpha in March 2001, with the first production release completed in November) seems to be almost a complete rewrite of the application. It addresses the performance issues of version 1 and has a much cleaner architecture conceptually.
The first notable improvement is the substitution of the event-driven SAX XML standard for the memory-intensive DOM API. In addition to the improvements in memory efficiency and scalability, the SAX model also allows output to be generated incrementally. This provides a faster response time since a response page is returned little by little, rather than waiting until all processing is complete to return a page (as the DOM model required).
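To make the SAX-versus-DOM distinction concrete, here is a minimal sketch that uses only the JDK's standard SAX API; it is not Cocoon code, and the class name and sample document are made up for illustration. The handler is called for each element as the parser reaches it, so no document tree is ever built in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: a SAX handler that sees each element as it streams by.
class SaxStreamingSketch {

    // Returns the element names in the order the parser reported them.
    static List<String> elementNames(String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName); // invoked once per start tag, during parsing
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document
        String xml = "<page><section><row/><row/></section></page>";
        System.out.println(elementNames(xml));
    }
}
```

A DOM-based version would have to parse the whole document into a tree before any of these names could be inspected, which is the memory and latency cost described above.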
The second major improvement concerns the internal architecture of the Cocoon application. It was originally structured using a Reactor design pattern, which apparently caused conceptual as well as implementation difficulties. Version 2.0 substitutes a pipeline architecture (described later) that proved far more flexible to code as well as much clearer conceptually.
The result is a solid, well-tested, powerful, and more efficient framework for just about any type of content generation under the sun. At the same time it manages to elegantly achieve true separation of presentation and content.
In short, Cocoon really rocks!
How Does Cocoon Work?
Let's take a closer look at Cocoon and how it works, and see how you can put it to use in your own development.
A Servlet at Heart
Although Cocoon is a powerful framework for XML processing, it's just a servlet at heart. Its job is just like any other servlet's: to receive requests, process them, and then generate a response. Cocoon accomplishes this by taking each request, finding an appropriate "pipeline" to handle it, executing the pipeline, and returning in its response any output that the pipeline generated. The pipeline's function is to generate the response output for a particular request, using XML processing internally to accomplish this task.
The Pipeline Architecture
The pipeline, a simple and elegant paradigm, fits in extremely well with the XML SAX processing model.
A pipeline at its simplest consists of a sequence of the three core Cocoon components - generators, transformers, and serializers - arranged in a chain (see Figure 1). XML data (SAX events) is passed down the chain, with each component performing its own processing on the data as needed. At the end of the chain the events are serialized out to the response's OutputStream and returned to the client making the request.
Generators, Transformers, and Serializers
The first component in the chain is always the generator. The generator's job is to create the stream of XML events that will be fed through the rest of the pipeline. There are prebuilt generators available to create the XML events from a number of possible sources: an XML file on disk, an HTML file (the HTML is tidied up and turned into XHTML in order to be XML-compatible), a JSP page, an XSP page (more about XSP later), etc. In fact, there are over a dozen varieties of generators included with the Cocoon distribution. And you can easily create new ones if you need to generate events from a nonstandard source.
The last component in the chain is always the serializer. The serializer's job is to turn the stream of XML events into some form of output that will be returned in the response. Prebuilt serializers are available to create output in the most popular formats: XML, HTML, text, WML, an SVG image, and more. Again, over a dozen varieties of serializers are included with the Cocoon distribution and again you can easily roll your own to support just about any output format you like.
As an option, a sequence of one or more transformers can lie in between the generator and the serializer. Transformers allow the developer to manipulate the XML events coming down the pipeline - adding, removing, or modifying events as needed - before the serializer finally sends them back in the response.
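The generator, transformer, and serializer roles can be imitated with plain JDK classes. The sketch below is an analogy, not Cocoon's own component API: a SAX parser stands in for the generator, a hypothetical UpperCaseFilter (an XMLFilterImpl subclass) stands in for a transformer, and an identity TransformerHandler stands in for the serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Illustrative only: generator -> transformer -> serializer over SAX events.
class SaxPipelineSketch {

    // "Transformer": forwards every event unchanged except character data,
    // which it upper-cases before passing it on.
    static class UpperCaseFilter extends XMLFilterImpl {
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            char[] upper = new String(ch, start, length)
                    .toUpperCase(Locale.ROOT).toCharArray();
            super.characters(upper, 0, upper.length);
        }
    }

    static String run(String xml) throws Exception {
        // "Generator": a namespace-aware SAX parser that emits the events.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader generator = spf.newSAXParser().getXMLReader();

        // "Serializer": an identity handler that writes events out as text.
        SAXTransformerFactory stf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = stf.newTransformerHandler();
        serializer.getTransformer()
                .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        // Wire the chain and start the events flowing.
        UpperCaseFilter transformer = new UpperCaseFilter();
        transformer.setParent(generator);
        transformer.setContentHandler(serializer);
        transformer.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<message>Hello World</message>"));
    }
}
```

Each stage only knows about the next handler in the chain, which is what makes it easy to add, remove, or reorder stages.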
The XSLT transformer is the most common - and most powerful - transformer. It runs an XSL stylesheet against the stream of XML events coming down the pipeline, allowing the developer to use the powerful XSLT language to transform the XML from pure data into styled output.
You can place multiple transformers in a row in the pipeline, each of which will operate on the XML events one at a time. This allows you to style the data incrementally, and can help keep your stylesheets smaller and simpler.
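Chaining two stylesheets can be tried outside Cocoon with the JDK's SAXTransformerFactory. In this sketch both stylesheets and the intermediate <greeting> element are invented for illustration; the first stylesheet reshapes the data and the second styles it, with SAX events passed from one to the other.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: two XSLT steps in a row, connected by SAX events.
class ChainedTransformersSketch {
    static final String XSL_NS =
        " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"";

    // Step 1 (hypothetical): turn <message text="..."/> into <greeting>.
    static final String TO_GREETING =
        "<xsl:stylesheet version=\"1.0\"" + XSL_NS + ">"
        + "<xsl:template match=\"message\">"
        + "<greeting><xsl:value-of select=\"@text\"/></greeting>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Step 2 (hypothetical): turn <greeting> into an <h1> heading.
    static final String TO_HEADING =
        "<xsl:stylesheet version=\"1.0\"" + XSL_NS + ">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"greeting\">"
        + "<h1><xsl:value-of select=\".\"/></h1>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String run(String xml) throws Exception {
        SAXTransformerFactory stf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler first = stf.newTransformerHandler(
                new StreamSource(new StringReader(TO_GREETING)));
        TransformerHandler second = stf.newTransformerHandler(
                new StreamSource(new StringReader(TO_HEADING)));

        // The output of the first step is the input of the second.
        StringWriter out = new StringWriter();
        first.setResult(new SAXResult(second));
        second.setResult(new StreamResult(out));

        // An identity transformer parses the input and feeds the chain.
        stf.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new SAXResult(first));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<message text=\"Hello World\"/>"));
    }
}
```

Splitting the work this way keeps each stylesheet focused on one job, which is the benefit the paragraph above describes.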
Although Cocoon uses several other types of components as well (which are beyond the scope of this article), these three components are the core of its architecture. Pretty simple, huh? But it sure is powerful! By assembling combinations of these core components - along with your own custom-built server pages and stylesheets - you can build pipelines to generate content from any data source you like, styled however you like, and rendered in whatever output format you like.
Putting It All Together
Let's look at a sample "Hello World" pipeline and see how this all ties together in practice.
Our "Hello World" pipeline will work as follows:
- Use the file generator to read XML from a file HelloWorld.xml
- Use the XSLT transformer to run a stylesheet Style.xsl against the XML data and translate it into formatted HTML
- Use the HTML serializer to return the resulting HTML page to the user
HelloWorld.xml
<?xml version="1.0"?>
<message text="Hello World"/>
The HelloWorld.xml file is very simple, consisting of a single node (<message>) that contains a single attribute (text).
The Style.xsl stylesheet (see Listing 1) is also very simple, consisting of only two formatting transformations. The first one (xsl:template match="/") is called when the XSLT processor begins processing the document. It generates the skeleton of an HTML page. The body of the page is left empty, however, except for the XSL instruction xsl:apply-templates. This instruction simply commands the XSL processor to begin processing any child nodes here, applying other templates as needed. The net effect of the instruction then is "transform any child nodes here."
In this case there's only one node in the XML file, a <message> node, and only one remaining template in the stylesheet (xsl:template match="message"), which is looking to match <message> nodes. Since the stylesheet's template matches the XML file's node, we'll perform the second transformation:
- Write an <h1> opening tag
- Write the value of the text attribute in the <message> node (in this case "Hello World")
- Write a </h1> closing tag
Once Cocoon executes this pipeline and performs this transformation, the result is the following HTML:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>A Message From Cocoon</title>
</head>
<body>
<h1>Hello World</h1>
</body>
</html>
This HTML is then serialized using the HTML serializer (which takes care of the few incompatibilities between strongly formatted XML and loosely formatted HTML) and the whole thing is sent back in the response to end the request.
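Listing 1 is not reproduced in this text, so the stylesheet in the sketch below is a reconstruction from the description above, not the original listing. The Java harness runs it with the JDK's built-in XSLT processor rather than with Cocoon, and the xsl:output element is an assumption added so the standalone run produces HTML; in Cocoon that job belongs to the HTML serializer. The exact meta tag and whitespace may differ from the output shown earlier.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: the Hello World transformation, run without Cocoon.
class HelloWorldTransformSketch {
    static final String HELLO_XML =
        "<?xml version=\"1.0\"?>"
        + "<message text=\"Hello World\"/>";

    // Reconstructed from the article's description of Style.xsl.
    static final String STYLE_XSL =
        "<?xml version=\"1.0\"?>"
        + "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        // First template: the page skeleton, with the body left to children.
        + "<xsl:template match=\"/\">"
        + "<html><head><title>A Message From Cocoon</title></head>"
        + "<body><xsl:apply-templates/></body></html>"
        + "</xsl:template>"
        // Second template: each <message> becomes an <h1> heading.
        + "<xsl:template match=\"message\">"
        + "<h1><xsl:value-of select=\"@text\"/></h1>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(HELLO_XML, STYLE_XSL));
    }
}
```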
The Sitemap
Great! So how do we write the code for this pipeline?
Cocoon uses a file called the sitemap (sitemap.xmap) to define all the pipelines in your application. The sitemap is just that: a map of your Cocoon Web site. It defines which pipeline will be run in response to each site request, and how exactly each pipeline will generate its response page.
The sitemap is written in, guess what? XML, just like everything else in Cocoon. Let's look at it piece by piece.
First, all sitemaps must contain the <map:sitemap> root element:
<?xml version="1.0"?>
<map:sitemap
xmlns:map="http://apache.org/cocoon/sitemap/1.0">
Then the sitemap lists which Cocoon components your site will use (see Listing 2). In this case we'll be using only three components:
- The file generator
- The XSL transformer
- The HTML serializer
We'll also have to define an additional component, a matcher, to get this sitemap to work. A matcher is used to match the URL that the user enters and route it to the appropriate pipeline. (We won't discuss matchers in this article though.)
Then we define the pipelines used in the site. In this case we have only one, our Hello World pipeline, which we will set up to be executed when a request arrives for page "HelloWorld.html".
The pipeline calls the file generator to read from the HelloWorld.xml file, then calls the XSL transformer to apply the Style.xsl stylesheet, and finally calls the HTML serializer to properly format the XML event stream as HTML.
Since earlier in the sitemap we defined each of these components to be the default of its type (see Listing 2), we can use a shortcut and not explicitly write which component we're using; Cocoon assumes we're using the default. (However, if we were calling a generator other than the file generator, a JSP page, for example, we would need to write something like <map:generate type="jsp" src="HelloWorld.jsp"/>.)
The full pipeline reads like this:
<map:pipelines>
  <map:pipeline>
    <map:match pattern="HelloWorld.html">
      <map:generate src="HelloWorld.xml"/>
      <map:transform src="Style.xsl"/>
      <map:serialize/>
    </map:match>
  </map:pipeline>
</map:pipelines>
Finally, we write a closing tag for the root element:
</map:sitemap>
And that's it. Our complete Hello World sitemap reads like Listing 3.
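To see how the sitemap ties a request to a sequence of steps, the following toy lookup may help. It is not how Cocoon is implemented: it simply parses the pipeline fragment shown above with the JDK's DOM parser and lists the steps under the matching pattern. The components section is left out because Listing 2 is not reproduced here, and the exact string comparison is a simplification of what a real matcher does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: find the pipeline steps a sitemap assigns to a request.
class SitemapLookupSketch {
    static final String MAP_NS = "http://apache.org/cocoon/sitemap/1.0";

    // The article's pipeline, without the components section.
    static final String SITEMAP =
        "<map:sitemap xmlns:map=\"" + MAP_NS + "\">"
        + "<map:pipelines><map:pipeline>"
        + "<map:match pattern=\"HelloWorld.html\">"
        + "<map:generate src=\"HelloWorld.xml\"/>"
        + "<map:transform src=\"Style.xsl\"/>"
        + "<map:serialize/>"
        + "</map:match>"
        + "</map:pipeline></map:pipelines>"
        + "</map:sitemap>";

    // Returns the steps for the first match whose pattern equals the request.
    static List<String> stepsFor(String requestPath) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SITEMAP)));
        NodeList matches = doc.getElementsByTagNameNS(MAP_NS, "match");
        List<String> steps = new ArrayList<>();
        for (int i = 0; i < matches.getLength(); i++) {
            Element match = (Element) matches.item(i);
            if (!match.getAttribute("pattern").equals(requestPath)) {
                continue; // simplification: exact comparison, no wildcards
            }
            for (Node n = match.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element) {
                    Element step = (Element) n;
                    String src = step.getAttribute("src");
                    steps.add(step.getLocalName() + (src.isEmpty() ? "" : " " + src));
                }
            }
            break;
        }
        return steps;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stepsFor("HelloWorld.html"));
    }
}
```

A request for any other path returns an empty list here; what Cocoon itself does with an unmatched request is outside the scope of this sketch.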
Installing and Running Cocoon
How Do We Run This Site?
As mentioned earlier, Cocoon is just a servlet at heart. It can be easily run on any servlet engine that supports version 2.2 or later of the Servlet API. I've used it with Apache Tomcat, but it can also be run on WebLogic, Resin, and many others, even on Microsoft IIS (using ServletExec).
Installing Cocoon onto the server is pretty easy for most servlet engines and usually consists of the following:
More details and instructions for specific servlet engines can be found on the installation page at the Cocoon Web site: http://xml.apache.org/cocoon/installing/.
Once Cocoon has been installed, running it is just a matter of accessing a URL that's handled by the Cocoon servlet. When a request to such a URL is made, it is routed to the Cocoon servlet. Cocoon matches the URL against its sitemap and then executes the appropriate pipeline.
To run our Hello World site, we first need to take the sitemap we just wrote and use it to overwrite the sample sitemap.xmap file that Cocoon provides by default. Then we just point our browser to http://localhost:8080/cocoon/HelloWorld.html and - voilà - Cocoon serves up our dynamically generated "Hello World" page.
The Power of Cocoon
Our "Hello World" pipeline is an extremely simple example. However, it's not hard to see how applying these concepts can enable us to create more complex sites with Cocoon.
Since Cocoon provides complete separation of content from style, you can take the same content and format it in many different ways. There's no need to create new logic or content in order to create different looks for your site. Just create a new stylesheet for each output format, and you can serve up completely different-looking sites from the same content.
How could this be useful in practice? Imagine the following possibilities, all of which can be accomplished with ease using Cocoon. You could create sites that serve out the same content formatted completely differently based on:
Clearly, Cocoon's ability to do dynamically styled page generation is a powerful tool for site designers!
XSP
Another key innovation to come out of the Cocoon project, which I mentioned briefly above, is Extensible Server Pages (XSP). Inspired by JSP, XSP provides all the power of JSP while removing one of that technology's major drawbacks: the intermingling of content and style.
As discussed earlier, Cocoon heavily stresses the separation of content, logic, and presentation. If there's one place that logic and presentation are often intermingled it's in server pages. By definition, both ASP and JSP freely intermix logic and presentation, i.e., source code and HTML. Although the use of beans and taglibs in JSP can minimize this to some extent, there's still inherently some intermingling of logic and presentation, due to the use of HTML.
Cocoon's solution is, once again, elegant: use XML instead of HTML in your server pages. Unlike HTML, XML is presentation-free; it's just data. So writing a server page using XML makes a lot more sense.
An XSP page therefore consists of XML data tags, along with intermingled logic (Java code). As with JSP, the Java logic (through the use of either embedded code or calls to external modules) dynamically creates the page to be output. The difference here is that, once again, presentation-free XML is what the logic will generate, not HTML.
All XSP pages are Cocoon generators - the source of XML events in a pipeline. Once the XSP page has executed and generated the appropriate XML stream, the stream is then typically styled and formatted using a Cocoon transformer (e.g., the XSLT transformer using an XSL stylesheet) into the appropriate output format (such as HTML).
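The idea that logic should emit presentation-free XML, not HTML, can be shown in plain Java. This sketch is neither XSP syntax nor Cocoon's generator API. It is a hypothetical method that computes a value and reports it as SAX events, with an identity TransformerHandler standing in for whatever comes next in the pipeline.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Illustrative only: logic that produces XML data as SAX events.
class LogicGeneratorSketch {

    // The "logic": build a greeting and emit it as a <message> element.
    // No HTML appears here; styling is left to a later pipeline stage.
    static void generate(ContentHandler out, String userName) throws SAXException {
        String greeting = "Hello " + userName;
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "text", "text", "CDATA", greeting);
        out.startDocument();
        out.startElement("", "message", "message", atts);
        out.endElement("", "message", "message");
        out.endDocument();
    }

    // Serializes the generated events so the result can be inspected.
    static String run(String userName) throws Exception {
        SAXTransformerFactory stf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = stf.newTransformerHandler();
        serializer.getTransformer()
                .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));
        generate(serializer, userName);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("Ada"));
    }
}
```

Because the output is the same <message> vocabulary used in the Hello World example, the same stylesheet could style it without any change to the logic.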
Like JSP, XSP pages are compiled into Java code (and then eventually class files), and like JSP, XSP also provides support for tag libraries (often referred to as "logicsheets" in Cocoon). As JSP developers know, calling reusable tag libraries in your pages helps to keep them from becoming too filled up with Java code. Using tag libraries with XSP provides the same benefits.
XSP is too big a topic to discuss in more detail here. (It could easily fill up an entire article on its own.) This should be a good overview though, and you can refer to the Resources section if you'd like to read more about XSP.
A Cocoon Case Study
I recently used Cocoon to develop a site for a New York City law firm. The project, its design, and some of my reasons for choosing Cocoon are described here to help provide some insight as to when you might want to choose Cocoon as a development platform on your own projects.
The law firm was looking for a new piece of software to replace the ancient and inflexible software they were currently using and getting increasingly locked into (Microsoft Works...for DOS!). The system functionality was not terribly complex - a basic CRUD system (functionality for create, read, update, and delete) that would provide a user-friendly front end for the legal case files in their database. The entire application would consist of less than a dozen screens.
Although the head attorney was fairly computer-savvy (he had recently mocked up a prototype for the new system in MS Access), he was looking to me for technology recommendations and was happy to defer to my knowledge and experience.
My first recommendation to him was to choose a Web-based system over an application in MS Access. This was an easy decision for me, as there would be several benefits to be gained from a Web-based system, including ease of development, minimal training required for the rest of the staff, and no software installation or upgrade procedures needed.
But which Web development platform to choose was a bigger question. Off the top of my head JSP and Struts were the leading candidates, but I also wanted to consider some newer, more cutting-edge technologies as well. As I had heard of Cocoon before, and had worked heavily with XML on a recent project, I started reading up on Cocoon to see if it would be a good fit.
Cocoon's technology was intriguing, and my experience with XML enabled me to come up with an idea that I realized could save a good bit of development time. Since the screens were fairly simple and similar (just rows of fields from the database) I realized that I could design the GUI extremely quickly and easily by just mocking up the screens in XML (see Listing 4, a mock-up of a Web page in XML that will later be rendered using HTML tables).
Then switching hats and putting myself in "style" mode, I could turn all the screen mock-ups into Web pages just by writing a single stylesheet that would transform the XML into HTML. Each <page> tag could be transformed into a skeleton for an HTML page, each <section> tag could be transformed into an HTML table, and each <row> tag could become a row in the table. The idea appealed to me.
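Listing 4 is not reproduced in this text, so the mock-up and stylesheet below are hypothetical stand-ins. Only the <page>, <section>, and <row> element names come from the description above; the attributes and sample values are invented. The sketch shows how one small stylesheet could render any screen written in that vocabulary, using the JDK's XSLT processor rather than Cocoon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: one stylesheet turning XML screen mock-ups into HTML.
class ScreenMockupSketch {
    // Hypothetical mock-up; the attribute names are assumptions.
    static final String MOCKUP =
        "<page title=\"Case File\">"
        + "<section name=\"Client\">"
        + "<row label=\"Name\" value=\"Jane Roe\"/>"
        + "<row label=\"Phone\" value=\"555-0100\"/>"
        + "</section>"
        + "</page>";

    static final String STYLE =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        // <page> becomes the skeleton of an HTML page.
        + "<xsl:template match=\"page\">"
        + "<html><head><title><xsl:value-of select=\"@title\"/></title></head>"
        + "<body><xsl:apply-templates/></body></html>"
        + "</xsl:template>"
        // <section> becomes a heading followed by a table.
        + "<xsl:template match=\"section\">"
        + "<h2><xsl:value-of select=\"@name\"/></h2>"
        + "<table><xsl:apply-templates/></table>"
        + "</xsl:template>"
        // <row> becomes one table row with a label cell and a value cell.
        + "<xsl:template match=\"row\">"
        + "<tr><td><xsl:value-of select=\"@label\"/></td>"
        + "<td><xsl:value-of select=\"@value\"/></td></tr>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String render(String mockupXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLE)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(mockupXml)),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(MOCKUP));
    }
}
```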
What finally clinched the decision to use Cocoon, however, was an additional requirement, one that I initially wasn't sure how to accomplish. Law firms, as we all know, generate reams of documents, and one of the reasons this firm had stuck with MS Works all these years was its ability to mail merge the information from the database into a document. If they were going to abandon Works, the new system would need to provide mail-merge functionality as well. At first, I had no idea how I would provide a mail merge in a Web-based system. But as I thought it through I began to formulate a plan.
First of all, I realized the best approach would be for the site to serve up the merged documents as a download from a Web page. This would be simple for the users. Most browsers' "open-attachment" functionality is automatically configured to launch the appropriate application when an attachment is opened, so each time the user generated a mail-merge document, the word processor (MS Word) would automatically launch and open to that document. This would work out quite nicely.
But how to do the mail merge itself? Although MS Word has a mail-merge utility, I felt it would be clumsy for the users. It would be much simpler for them if they could just click on a button labeled "create merged document" and have the document arrive with all the merge substitutions already done behind the scenes. What would be required to make this happen?
Having the site retrieve the appropriate data from the database was certainly easy enough. After that, I reasoned, the template document would need to be read in, the merge fields identified, and the actual data substituted in.
Reading an MS Word document was a tall order - I wasn't aware of any Java libraries that could do that. But what about an RTF (Rich Text Format) file? RTF, unlike the Word .doc format, was text-based and would be much easier to read. In fact, I could probably write a parser to do it. I read through the RTF spec and after spending a couple of hours with the JavaCC parser generator, I was able to successfully read RTF documents and find the mail-merge fields in them. I checked with the head attorney to make sure he didn't mind using RTF format instead of Word, and he didn't. As long as they could still use the MS Word application to edit the documents (which they could), he was fine with it.
That left me with the last bit: How to substitute the data retrieved from the database for the merge fields? Boy, I thought, it would be nice if there was some existing code that could already do this so I wouldn't have to write it from scratch. What type of software could I use to scan through a document for a particular piece of content and change the value of that content before outputting it again? Then it hit me: I could use an XSL stylesheet! XSL was built to easily handle tasks like this.
Suddenly the whole idea began to come together, and my decision to use Cocoon was clinched. I would write a new generator that would parse an RTF file and turn it into a stream of XML events. I would generate a stylesheet in response to each mail-merge request that substituted database values for the merge fields and I'd use the XSL transformer to apply this stylesheet. Then I would use the text serializer to write out the new mail-merged RTF file data, I'd set the appropriate MIME type for RTF documents ("application/rtf"), and the user would get served up a mail-merged RTF document. I put together another proof-of-concept and, sure enough, it worked.
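The substitution step can be approximated in a few lines. This is a rough sketch of the idea, not the code from the project: the <doc>, <text>, and <field> vocabulary is a hypothetical stand-in for whatever the RTF generator emitted, the values are made up, and the stylesheet is built per request with one template for each merge field. Field names are assumed to be simple identifiers without quote characters.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: mail merge done by a stylesheet generated per request.
class MailMergeSketch {
    // Hypothetical XML form of a template document with two merge fields.
    static final String TEMPLATE_XML =
        "<doc><text>Dear </text><field name=\"client\"/>"
        + "<text>, your case number is </text><field name=\"caseNo\"/>"
        + "<text>.</text></doc>";

    // Escapes the characters that are unsafe inside XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Builds a stylesheet with one template per merge field. Ordinary text
    // is copied through by XSLT's built-in template rules.
    static String buildStylesheet(Map<String, String> values) {
        StringBuilder xsl = new StringBuilder();
        xsl.append("<xsl:stylesheet version=\"1.0\"")
           .append(" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">")
           .append("<xsl:output method=\"text\"/>");
        for (Map.Entry<String, String> e : values.entrySet()) {
            xsl.append("<xsl:template match=\"field[@name='")
               .append(e.getKey()).append("']\">")
               .append("<xsl:text>").append(escape(e.getValue()))
               .append("</xsl:text></xsl:template>");
        }
        return xsl.append("</xsl:stylesheet>").toString();
    }

    static String merge(Map<String, String> values) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource(new StringReader(buildStylesheet(values))));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(TEMPLATE_XML)),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("client", "Jane Roe");   // made-up database values
        values.put("caseNo", "2001-042");
        System.out.println(merge(values));
    }
}
```

The text output method plays the role of the text serializer here; setting the "application/rtf" MIME type on the response would be the servlet container's or Cocoon's job and is not shown.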
Writing the application using Cocoon worked out well, though it did have its share of challenges. As the application was not particularly complex, the code was not particularly difficult. The biggest challenge, however, was getting up to speed in some of the new technologies I was using, primarily Cocoon and XSL. I wound up learning them incrementally, as needed, when I hit roadblocks in various pieces of the development. ("Hmmm. How do I do this in XSL?") The online documentation I found for XSL and Cocoon was helpful, as was Michael Kay's book, XSLT Programmer's Reference 2nd Edition. (And, of course, I also posted my share of "Help me!" messages to the Cocoon Users mailing list.)
The system was finally completed and installed in December 2001, and got a big thumbs-up from both the users and the head attorney. It is now used daily by the entire staff.
From my perspective, I give Cocoon a big thumbs-up. I chose it as the core technology for this project, and it accomplished everything I needed. I found that using XML and Cocoon on the project allowed me to deliver it faster, as well as helping out tremendously in the conception and design phase. While designing I was able to focus completely on what type of content I was going to display on each page and how it would be generated, and completely ignore all presentation and style concerns until a later time. I found this separation of concerns during design to be quite a refreshing change!
Cocoon Concerns
Although I found Cocoon to be an excellent technology for site development, it's not without some drawbacks. You should be aware of the following concerns when you're deciding whether to use Cocoon in your development:
- It's still more difficult to find developers experienced in Cocoon and its related technologies than in more mainstream technologies like JSP or Struts.
- If it doesn't catch on, it may be difficult to continue enhancing and supporting applications built with it.
For More Info
We've only scratched the surface of the Cocoon technology here. I've omitted a great deal of material for brevity's sake. There's much more to Cocoon, and a detailed discussion could easily fill a book. (Indeed, the Cocoon Developer's Handbook, by Sue Spielman, is due out this year.)
If you're intrigued by what you've read here, I'd encourage you to start using Cocoon. There's no better way to learn than by hands-on development. The best way to approach Cocoon is to start with a small, simple site and build it up incrementally from there, learning as you go.
There's also loads of additional documentation available about Cocoon online, in fact, probably too much for a novice user. Again, I'd encourage you to approach it incrementally. Read a little at a time, learning more about each of the various components and techniques as you need them.
Resources
At the Cocoon Web Site (http://xml.apache.org/cocoon/):
XSP Tutorials:
Copyright © 2002 SYS-CON Media, Inc. — All Rights Reserved.
More Stories By David Rosenstrauch
David Rosenstrauch has been a software developer for over 13 years, providing development and consulting services to Fortune 500 companies as well as start-ups. Previously specializing in mainframe software, David has focused exclusively on Java for the past 6 years. Some of his other areas of interest and expertise include XML/XSL, the JavaCC parser generator, the Apache Cocoon publishing framework, object-oriented software practices, design patterns, and Extreme Programming.